🤖AI Agents Guide


Example · Engineering · 13 min read

7 AI Agent Coding Examples (Real Projects)

Discover 7 real-world AI coding agent examples covering code review, PR generation, test writing, bug diagnosis, documentation generation, and refactoring automation. Each example includes architecture details and working code for engineering teams.

By AI Agents Guide Team • February 28, 2026

Table of Contents

  1. Example 1: Automated Pull Request Code Reviewer
  2. Example 2: Automated Test Generation Agent
  3. Example 3: Bug Diagnosis and Fix Agent
  4. Example 4: Documentation Generation Agent
  5. Example 5: Automated Refactoring Agent
  6. Example 6: Dependency Vulnerability Scanner
  7. Example 7: Commit Message and Changelog Generator
  8. Choosing the Right Coding Agent Approach
  9. Getting Started
  10. Frequently Asked Questions

AI coding agents are transforming software engineering workflows. Unlike simple code completion, these agents can plan multi-step changes, navigate codebases, run tests, interpret results, and iterate — all autonomously or with minimal human oversight. The gap between a junior developer and a well-configured coding agent is closing rapidly.

These seven examples cover the most impactful coding agent use cases, with architecture details and realistic code. Each one reflects patterns used in production engineering teams today. For understanding the foundational concepts behind these agents, start with What is an AI Agent and review the Engineering AI Agents use case.


Example 1: Automated Pull Request Code Reviewer#

Use Case: Automatically review every pull request for security vulnerabilities, logic errors, and style issues — providing actionable inline comments before a human reviewer sees the PR.

Architecture: GitHub Actions trigger → fetch diff via GitHub API → AI review agent → post inline comments via GitHub Review API.

Key Implementation:

import os
import httpx
from anthropic import Anthropic

client = Anthropic()
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

def get_pr_diff(repo: str, pr_number: int) -> str:
    """Fetch the diff for a pull request."""
    response = httpx.get(
        f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
        headers={"Authorization": f"Bearer {GITHUB_TOKEN}",
                 "Accept": "application/vnd.github.diff"}
    )
    return response.text

def review_diff(diff: str) -> list[dict]:
    """Use Claude to review a PR diff and return structured comments."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4000,
        system="""You are a senior software engineer conducting a code review.
        Analyze the provided diff and return a JSON array of review comments.
        Each comment: {"file": str, "line": int, "severity": "critical|high|medium|low",
        "category": "security|performance|logic|style|documentation",
        "comment": str, "suggestion": str}
        Focus on actionable issues. Skip trivial style nits unless they affect readability.""",
        messages=[{
            "role": "user",
            "content": f"Review this pull request diff:\n\n{diff[:15000]}"
        }]
    )
    import json
    # Extract JSON from response
    text = response.content[0].text
    start = text.find('[')
    end = text.rfind(']') + 1
    return json.loads(text[start:end])

def post_review_comments(repo: str, pr_number: int, commit_sha: str, comments: list):
    """Post review comments to the GitHub PR."""
    review_body = {
        "commit_id": commit_sha,
        "event": "COMMENT",
        "comments": [
            {
                "path": c["file"],
                "line": c["line"],
                "body": f"**[{c['severity'].upper()}] {c['category'].title()}**\n\n{c['comment']}\n\n**Suggestion:** {c['suggestion']}"
            }
            for c in comments if c.get("severity") in ["critical", "high", "medium"]
        ]
    }
    httpx.post(
        f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews",
        json=review_body,
        headers={"Authorization": f"Bearer {GITHUB_TOKEN}"}
    )

# Main review flow (called from GitHub Actions)
repo = "my-org/my-repo"
pr_number = 142
commit_sha = "abc123def456"

diff = get_pr_diff(repo, pr_number)
comments = review_diff(diff)
post_review_comments(repo, pr_number, commit_sha, comments)
posted = [c for c in comments if c.get("severity") in ["critical", "high", "medium"]]
print(f"Posted {len(posted)} of {len(comments)} review comments")

Outcome: Every PR gets consistent security and quality review in under 60 seconds. Human reviewers focus on architecture and business logic while the agent handles the mechanical checks. See the Coding Agent tutorial for a full GitHub Actions workflow setup.


Example 2: Automated Test Generation Agent#

Use Case: Given a source file, generate a comprehensive test suite covering happy paths, edge cases, and error conditions — using the actual function signatures and docstrings as context.

Architecture: File reader → AST parser (for function signatures) → test generation agent → test runner validation loop.

Key Implementation:

import ast
import subprocess
from openai import OpenAI

client = OpenAI()

def extract_functions(source_code: str) -> list[dict]:
    """Extract function signatures and docstrings from Python source."""
    tree = ast.parse(source_code)
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            docstring = ast.get_docstring(node) or "No docstring"
            args = [a.arg for a in node.args.args]
            functions.append({
                "name": node.name,
                "args": args,
                "docstring": docstring,
                "lineno": node.lineno
            })
    return functions

def generate_tests(source_code: str, filename: str) -> str:
    """Generate pytest tests for a Python module."""
    functions = extract_functions(source_code)

    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        messages=[
            {
                "role": "system",
                "content": """You are a senior Python engineer writing pytest test suites.
                Generate comprehensive tests covering:
                1. Happy path (normal inputs)
                2. Edge cases (empty, None, boundary values)
                3. Error conditions (invalid types, missing required args)
                4. Any domain-specific constraints from docstrings
                Use pytest fixtures and parametrize where appropriate.
                Import the module correctly and use descriptive test names."""
            },
            {
                "role": "user",
                "content": f"Write tests for this module ({filename}):\n\n{source_code}\n\nFunctions found: {functions}"
            }
        ]
    )
    return response.choices[0].message.content

def validate_tests(test_code: str, test_file: str) -> dict:
    """Run generated tests and return pass/fail results."""
    with open(test_file, "w") as f:
        f.write(test_code)

    result = subprocess.run(
        ["pytest", test_file, "--tb=short", "-q"],
        capture_output=True,
        text=True,
        timeout=60
    )
    return {
        "passed": result.returncode == 0,
        "output": result.stdout + result.stderr
    }

# Generate and validate tests
with open("src/user_service.py") as f:
    source = f.read()

test_code = generate_tests(source, "user_service.py")
results = validate_tests(test_code, "tests/test_user_service.py")

if results["passed"]:
    print("All generated tests pass!")
else:
    print("Some tests failed — review and fix:", results["output"])

Outcome: Test coverage for a typical 200-line module goes from 0% to 60–80% in under 2 minutes. The validation loop catches tests with incorrect assumptions before they're committed.
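The architecture above calls for a validation loop, but the snippet validates only once. A minimal regenerate-on-failure loop is sketched below; `generate` and `validate` are injected callables standing in for the `generate_tests` and `validate_tests` functions from the example, so the loop itself stays model-agnostic. `generate_with_retries` is a hypothetical helper name, not part of any library.

```python
from typing import Callable

def generate_with_retries(
    generate: Callable[[str], str],   # e.g. a wrapper around generate_tests
    validate: Callable[[str], dict],  # e.g. validate_tests; returns {"passed": bool, "output": str}
    source: str,
    max_attempts: int = 3,
) -> tuple[str, bool]:
    """Regenerate tests until they pass or attempts run out.

    On failure, the test-runner output is appended to the prompt so the
    next generation attempt can correct its own wrong assumptions.
    """
    prompt = source
    test_code = ""
    for _ in range(max_attempts):
        test_code = generate(prompt)
        result = validate(test_code)
        if result["passed"]:
            return test_code, True
        # Feed the failure output back for the next attempt
        prompt = f"{source}\n\n# Previous attempt failed with:\n{result['output']}"
    return test_code, False
```

Capping `max_attempts` matters: a model that keeps misreading the module will otherwise burn tokens in an endless loop.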


Example 3: Bug Diagnosis and Fix Agent#

Use Case: Given a bug report with a stack trace and reproduction steps, the agent locates the relevant code, diagnoses the root cause, and proposes a fix with explanation.

Architecture: LangChain ReAct agent + code reading tools + AST analysis + test execution tool.

Key Implementation:

from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub
import subprocess

@tool
def read_file(file_path: str) -> str:
    """Read the contents of a source code file."""
    try:
        with open(file_path) as f:
            return f.read()
    except FileNotFoundError:
        return f"File not found: {file_path}"

@tool
def search_codebase(pattern: str) -> str:
    """Search the codebase for a pattern using grep."""
    result = subprocess.run(
        ["grep", "-rn", "--include=*.py", pattern, "src/"],
        capture_output=True, text=True, timeout=10
    )
    return result.stdout[:3000] if result.stdout else "No matches found"

@tool
def run_tests(test_file: str) -> str:
    """Run a specific test file and return results."""
    result = subprocess.run(
        ["pytest", test_file, "-v", "--tb=short"],
        capture_output=True, text=True, timeout=60
    )
    return (result.stdout + result.stderr)[:2000]

@tool
def propose_fix(file_path: str, original_code: str, fixed_code: str, explanation: str) -> str:
    """Propose a code fix with explanation (does NOT apply automatically)."""
    return f"""
PROPOSED FIX for {file_path}:
Explanation: {explanation}

BEFORE:
{original_code}

AFTER:
{fixed_code}
"""

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
tools = [read_file, search_codebase, run_tests, propose_fix]
agent = create_react_agent(llm=llm, tools=tools, prompt=hub.pull("hwchase17/react"))
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=10, verbose=True)

bug_report = """
Stack trace:
  File "src/payment_service.py", line 87, in process_refund
    amount = order.total_amount * refund_percentage
TypeError: unsupported operand type(s) for *: 'Decimal' and 'float'

Reproduction: Call process_refund(order_id=1234, percentage=0.5)
Expected: Refund of 50% processed successfully
"""

result = executor.invoke({"input": f"Diagnose and fix this bug:\n{bug_report}"})
print(result["output"])

Outcome: Root cause identified and a specific fix proposed in 1–2 minutes. The agent reads the file, searches for related code, and confirms the fix logic is correct before proposing. The propose_fix tool never applies changes automatically — a human applies the diff.



Example 4: Documentation Generation Agent#

Use Case: Generate comprehensive API documentation from source code, including function descriptions, parameter tables, examples, and edge case notes.

Architecture: AST parser → docstring enrichment agent → Markdown formatter → output to docs directory.

Key Implementation:

import ast
from openai import OpenAI
from pathlib import Path

client = OpenAI()

def generate_api_docs(source_file: str) -> str:
    """Generate Markdown documentation for a Python module."""
    with open(source_file) as f:
        source = f.read()

    tree = ast.parse(source)
    module_docstring = ast.get_docstring(tree) or ""

    functions_info = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("_"):
                continue  # Skip private functions

            # Extract type annotations
            args_with_types = []
            for arg in node.args.args:
                annotation = ""
                if arg.annotation:
                    annotation = ast.unparse(arg.annotation)
                args_with_types.append(f"{arg.arg}: {annotation}" if annotation else arg.arg)

            return_annotation = ""
            if node.returns:
                return_annotation = ast.unparse(node.returns)

            functions_info.append({
                "name": node.name,
                "args": args_with_types,
                "returns": return_annotation,
                "existing_docstring": ast.get_docstring(node) or "",
                "source": ast.unparse(node)[:500]
            })

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """Generate comprehensive Markdown API documentation.
                For each function include:
                - Description (what it does, why you'd use it)
                - Parameters table with name, type, description, default
                - Return value description
                - Example usage (realistic, not trivial)
                - Edge cases and common errors
                Format as clean Markdown suitable for a developer docs site."""
            },
            {
                "role": "user",
                "content": f"Module: {source_file}\n\nFunctions: {functions_info}"
            }
        ]
    )
    return response.choices[0].message.content

# Generate docs for all Python files in src/
for py_file in Path("src/").rglob("*.py"):
    if py_file.name.startswith("_"):
        continue

    docs = generate_api_docs(str(py_file))
    output_path = Path("docs/api") / py_file.with_suffix(".md").name
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(docs)
    print(f"Generated: {output_path}")

Outcome: Complete API documentation for an entire codebase generated in minutes. Documentation stays fresh when run as a pre-commit hook or CI step after significant changes.


Example 5: Automated Refactoring Agent#

Use Case: Identify and refactor code smells — long functions, duplicated logic, poor naming — across a codebase while preserving existing test coverage.

Architecture: Code smell detector → refactoring planner → change generator → test validation loop.

Key Implementation:

import subprocess
from anthropic import Anthropic
from pathlib import Path

client = Anthropic()

def detect_code_smells(file_path: str) -> list[dict]:
    """Identify refactoring opportunities in a Python file."""
    with open(file_path) as f:
        source = f.read()

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Identify refactoring opportunities in this code.
            Return JSON array: [{{"type": str, "line_start": int, "line_end": int,
            "description": str, "priority": "high|medium|low"}}]

            Focus on: functions >40 lines, duplicated blocks >5 lines,
            unclear variable names, deep nesting >4 levels, missing type hints.

            Code:\n{source}"""
        }]
    )
    import json
    text = response.content[0].text
    start, end = text.find('['), text.rfind(']') + 1
    return json.loads(text[start:end]) if start != -1 else []

def generate_refactoring(file_path: str, smell: dict, source: str) -> dict:
    """Generate a specific refactoring for a code smell."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=3000,
        messages=[{
            "role": "user",
            "content": f"""Refactor this specific issue in {file_path}:
            Issue: {smell['description']} at lines {smell['line_start']}-{smell['line_end']}

            Provide: {{"original": str, "refactored": str, "explanation": str}}
            Preserve all existing behavior exactly.

            Full source:\n{source}"""
        }]
    )
    import json
    text = response.content[0].text
    start, end = text.find('{'), text.rfind('}') + 1
    return json.loads(text[start:end])

def validate_refactoring(file_path: str) -> bool:
    """Run tests to validate refactoring didn't break anything."""
    result = subprocess.run(
        ["pytest", "tests/", "--tb=short", "-q"],
        capture_output=True, text=True, timeout=120
    )
    return result.returncode == 0

# Refactoring pipeline
target_file = "src/order_processor.py"
smells = detect_code_smells(target_file)
high_priority = [s for s in smells if s["priority"] == "high"]

print(f"Found {len(high_priority)} high-priority refactoring opportunities")

with open(target_file) as f:
    source = f.read()

for smell in high_priority[:3]:  # Process top 3 to limit scope
    refactoring = generate_refactoring(target_file, smell, source)
    print(f"\nProposed: {refactoring['explanation']}")
    # Human applies change, then validate:
    # if validate_refactoring(target_file):
    #     print("Tests pass after refactoring")

Outcome: Systematic identification of technical debt with concrete, safe refactoring proposals. The agent never applies changes automatically — the human reviews and applies each one, then the test suite validates correctness.


Example 6: Dependency Vulnerability Scanner#

Use Case: Scan project dependencies for known CVEs, assess exploitability in context, and generate a prioritized remediation plan with specific version upgrade paths.

Architecture: pip audit / npm audit execution → CVE enrichment via NVD API → AI context analysis → remediation report.

Key Implementation:

import subprocess
import json
import httpx
from openai import OpenAI

client = OpenAI()

def run_pip_audit() -> list[dict]:
    """Run pip-audit and return only dependencies with known vulnerabilities."""
    result = subprocess.run(
        ["pip-audit", "--format=json"],
        capture_output=True, text=True
    )
    deps = json.loads(result.stdout).get("dependencies", [])
    # pip-audit lists every dependency; keep only those with vulnerability records
    return [d for d in deps if d.get("vulns")]

def analyze_vulnerabilities(vulns: list[dict], project_context: str) -> str:
    """Use AI to assess exploitability and prioritize remediations."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """You are a security engineer analyzing dependency vulnerabilities.
                For each vulnerability, assess:
                1. Actual exploitability given the project context
                2. Whether the vulnerable code path is exercised
                3. Remediation priority (Critical/High/Medium/Low/Informational)
                4. Specific version to upgrade to
                5. Any breaking changes to watch for in the upgrade
                Output as a structured markdown security report."""
            },
            {
                "role": "user",
                "content": f"Project context: {project_context}\n\nVulnerabilities: {json.dumps(vulns, indent=2)}"
            }
        ]
    )
    return response.choices[0].message.content

# Run audit
vulnerabilities = run_pip_audit()
if vulnerabilities:
    project_context = "Python Flask REST API using SQLAlchemy ORM, deployed on AWS Lambda. Handles user authentication and payment processing."
    report = analyze_vulnerabilities(vulnerabilities, project_context)

    with open("security-report.md", "w") as f:
        f.write(report)
    print("Security report generated: security-report.md")
else:
    print("No vulnerabilities found")

Outcome: Context-aware vulnerability prioritization that avoids false alarms for vulnerabilities in code paths your application never uses. Security teams get actionable reports, not noise.
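The architecture above mentions CVE enrichment via the NVD API, which the snippet skips. A minimal sketch against the public NVD 2.0 REST endpoint (no API key, so heavily rate-limited) could look like the following; the `summarize_cve` output can be merged into the `vulns` payload before calling `analyze_vulnerabilities`. The helper names are illustrative.

```python
import json
from urllib.request import urlopen

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cve(cve_id: str) -> dict:
    """Fetch a CVE record from the public NVD 2.0 API."""
    with urlopen(f"{NVD_URL}?cveId={cve_id}", timeout=30) as resp:
        return json.load(resp)

def summarize_cve(nvd_payload: dict) -> dict:
    """Extract the fields most useful for the AI prioritization prompt."""
    vulns = nvd_payload.get("vulnerabilities", [])
    if not vulns:
        return {}
    cve = vulns[0]["cve"]
    description = next(
        (d["value"] for d in cve.get("descriptions", []) if d["lang"] == "en"), ""
    )
    # CVSS v3.1 base score, when the record carries one
    metrics = cve.get("metrics", {}).get("cvssMetricV31", [])
    base_score = metrics[0]["cvssData"]["baseScore"] if metrics else None
    return {"id": cve["id"], "description": description, "base_score": base_score}
```

Giving the model the CVSS score plus the prose description lets it weigh severity against whether the vulnerable code path is actually reachable in your project.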


Example 7: Commit Message and Changelog Generator#

Use Case: Generate descriptive, conventional-commit-formatted commit messages and aggregate them into a structured CHANGELOG.md entry automatically.

Architecture: git diff --staged → AI message generator → git commit wrapper → periodic changelog aggregation.

Key Implementation:

import subprocess
from anthropic import Anthropic

client = Anthropic()

def get_staged_diff() -> str:
    """Get the current staged diff."""
    result = subprocess.run(
        ["git", "diff", "--staged", "--stat", "--diff-algorithm=minimal"],
        capture_output=True, text=True
    )
    diff_result = subprocess.run(
        ["git", "diff", "--staged"],
        capture_output=True, text=True
    )
    return result.stdout + "\n" + diff_result.stdout[:8000]

def generate_commit_message(diff: str) -> str:
    """Generate a conventional commit message for the staged changes."""
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",  # Fast and cost-efficient for this task
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"""Generate a conventional commit message for these staged changes.

            Format: <type>(<scope>): <description>
            [blank line]
            [optional body: what changed and why, max 3 bullet points]

            Types: feat|fix|docs|style|refactor|test|chore|perf|ci
            Rules: imperative mood, ≤72 chars first line, no period at end

            Diff:\n{diff}

            Return ONLY the commit message, nothing else."""
        }]
    )
    return response.content[0].text.strip()

def commit_with_message():
    """Generate message and commit staged changes."""
    diff = get_staged_diff()
    if not diff.strip():
        print("No staged changes found.")
        return

    message = generate_commit_message(diff)
    print(f"\nGenerated commit message:\n{message}\n")

    confirm = input("Use this message? [Y/n/e(dit)]: ").strip().lower()
    if confirm in ("", "y"):
        subprocess.run(["git", "commit", "-m", message])
    elif confirm in ("e", "edit"):
        # Open the message in the user's editor (falls back to vi if $EDITOR is unset)
        import tempfile, os
        with tempfile.NamedTemporaryFile(suffix=".txt", mode="w", delete=False) as f:
            f.write(message)
            tmp = f.name
        os.system(f"{os.environ.get('EDITOR', 'vi')} {tmp}")
        with open(tmp) as f:
            edited = f.read()
        subprocess.run(["git", "commit", "-m", edited.strip()])

commit_with_message()

Outcome: Consistent, well-formatted commit history without the mental overhead of writing commit messages manually. The human always reviews before committing, maintaining control.
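The periodic changelog aggregation step from the architecture above isn't shown. A sketch, assuming commit subjects are collected with something like `git log --pretty=%s <last-tag>..HEAD`, groups conventional commits into Markdown sections (the section mapping and helper name are illustrative choices):

```python
import re
from collections import defaultdict

# Commit types worth surfacing in a changelog; chore/ci/style are skipped
SECTION_TITLES = {
    "feat": "Features",
    "fix": "Bug Fixes",
    "perf": "Performance",
    "refactor": "Refactoring",
    "docs": "Documentation",
}

def build_changelog_entry(version: str, subjects: list[str]) -> str:
    """Group conventional-commit subjects into a Markdown changelog entry."""
    pattern = re.compile(r"^(\w+)(?:\(([^)]*)\))?: (.+)$")
    sections = defaultdict(list)
    for subject in subjects:
        match = pattern.match(subject)
        if not match:
            continue  # Skip messages that don't follow the convention
        ctype, scope, desc = match.groups()
        if ctype in SECTION_TITLES:
            prefix = f"**{scope}:** " if scope else ""
            sections[ctype].append(f"- {prefix}{desc}")
    lines = [f"## {version}"]
    for ctype, title in SECTION_TITLES.items():
        if sections[ctype]:
            lines.append(f"\n### {title}\n")
            lines.extend(sections[ctype])
    return "\n".join(lines)
```

Run on a release tag, the resulting entry can be prepended to CHANGELOG.md as a CI step, keeping the changelog in lockstep with the generated commit messages.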


Choosing the Right Coding Agent Approach#

The examples above cover the spectrum from fully automated (documentation generation, test generation) to human-in-the-loop (code review comments, refactoring proposals, commit messages). Start with the fully automated patterns where output quality can be validated programmatically (tests pass/fail, CI checks pass). Add human checkpoints for any pattern where incorrect output would reach production or require significant rework to reverse.

The Coding Agent tutorial provides a complete development environment setup including sandboxing, tool permissions, and CI integration.

Getting Started#

Install the core dependencies: pip install anthropic openai httpx langchain langchain-anthropic langchainhub. For code execution sandboxing, use E2B (pip install e2b) or Docker-based isolation. The Engineering AI Agents use case covers the organizational patterns for deploying these agents at team scale.
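For the Docker route, a hedged sketch of a locked-down runner follows; the image name and resource limits are illustrative choices, not requirements, and `build_sandbox_cmd` / `sandboxed_run` are hypothetical helper names.

```python
import subprocess

def build_sandbox_cmd(command: list[str], workdir: str,
                      image: str = "python:3.12-slim") -> list[str]:
    """Build a docker run invocation that isolates an agent-generated command."""
    return [
        "docker", "run", "--rm",
        "--network=none",             # no outbound network access
        "--memory=512m", "--cpus=1",  # cap resources
        "-v", f"{workdir}:/work:ro",  # project mounted read-only
        "-w", "/work",
        image, *command,
    ]

def sandboxed_run(command: list[str], workdir: str, timeout: int = 60):
    """Execute the command in the container and capture its output."""
    return subprocess.run(
        build_sandbox_cmd(command, workdir),
        capture_output=True, text=True, timeout=timeout,
    )
```

The read-only mount plus `--network=none` means a misbehaving generated command can neither modify your checkout nor exfiltrate anything, which is the minimum bar before letting an agent run arbitrary test or build commands.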

For understanding the agentic patterns that power these examples, ReAct reasoning explains the think-act-observe loop that most coding agents implement.
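That loop reduces to a few lines of control flow. In the sketch below, the LLM is replaced by an injected `policy` callable (a hypothetical interface, not a library API) so the think-act-observe structure stands out:

```python
def react_loop(policy, tools: dict, task: str, max_steps: int = 10) -> str:
    """Minimal think-act-observe loop.

    `policy` maps the transcript so far to either ("act", tool_name, tool_input)
    or ("finish", answer). In a real agent the policy is an LLM call; here it is
    any callable, which keeps the control flow visible and testable.
    """
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = policy(transcript)          # Think
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, tool_input = decision
        observation = tools[tool_name](tool_input)   # Act
        transcript.append(f"Action: {tool_name}({tool_input})")
        transcript.append(f"Observation: {observation}")  # Observe, then think again
    return "Stopped: step limit reached"
```

Every agent in the examples above is some elaboration of this loop: richer tools, structured transcripts, and a step limit to keep a confused model from cycling forever.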

Frequently Asked Questions#


Related Examples

Agentic RAG Examples: 6 Real Workflows

Six agentic RAG examples with working Python code covering query routing, self-correcting retrieval with hallucination detection, multi-document reranking, iterative retrieval with web fallback, conversational RAG with memory, and corrective RAG with grade-and-retry loops.

AI Data Analyst Examples: 6 Real Setups

Explore 6 AI data analyst agent examples covering natural language SQL generation, automated chart creation, anomaly detection, report generation, and business intelligence workflows. Includes Python code for building production-ready data analysis agents.

AI Agent E-Commerce Examples: 6 Workflows

Six practical AI agent examples for e-commerce covering product recommendation, inventory management, customer service returns, dynamic pricing, abandoned cart recovery, and review analysis. Each example includes architecture details and production-ready Python code snippets.
