Advanced · 36 min read

Build an AI Coding Agent in Python

Learn how to build an AI coding agent that generates, reviews, and automatically tests code using LangChain tool use, sandboxed execution, and a self-repair loop that fixes failing tests without human intervention.

By AI Agents Guide Team · February 28, 2026

Table of Contents

  1. What You'll Learn
  2. Prerequisites
  3. Architecture Overview
  4. Step 1: Setup
  5. Step 2: Sandboxed Code Execution Tool
  6. Step 3: LangChain Tools
  7. Step 4: The Coding Agent Loop
  8. Step 5: CLI Interface
  9. Production Considerations
  10. What's Next

The promise of AI coding assistants has moved well past autocomplete. A coding agent can accept a natural language task, write the implementation, generate tests, execute those tests in a sandbox, and iterate on failures until the tests pass — all without manual intervention. This tutorial builds exactly that system.

The agent you will create handles the full coding loop: it generates Python code from a specification, writes a pytest test suite, executes the tests in an isolated subprocess, parses the failures, and repairs the code. It also includes a code review step that catches quality issues before tests even run.

Before building, understand how agent sandboxes work and why code execution isolation is non-negotiable for safety.

What You'll Learn

  • How to define tools for code generation, file writing, and test execution
  • How to build a repair loop that feeds test failures back to the LLM
  • How to sandbox code execution to prevent malicious or runaway code
  • How to add a code review step before execution
  • How to integrate the agent with a simple CLI interface

Prerequisites

  • Python 3.10+
  • OpenAI API key
  • Familiarity with LangChain agent patterns
  • Understanding of AI agent concepts

Architecture Overview

The coding agent follows a five-step loop:

  1. Specification Parser — Extracts function signatures, requirements, and constraints from the user prompt
  2. Code Generator — Writes the implementation file
  3. Test Generator — Writes a pytest test file with edge cases
  4. Code Reviewer — Analyzes the code for quality issues before execution
  5. Test Runner + Repair Loop — Executes tests, parses failures, sends error context back to the generator, and retries up to N times
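Before wiring in any LLM calls, the control flow of the five steps can be sketched as plain Python. The callables here are hypothetical stand-ins for the real tools built in Steps 2 through 4:

```python
def coding_loop(spec, generate, make_tests, review, run, repair, max_repairs=5):
    """Illustrative five-step control flow. `generate`, `make_tests`,
    `review`, `run`, and `repair` are placeholders for the tools
    built later in this tutorial."""
    code = generate(spec)                       # steps 1-2: parse spec, write code
    findings = review(code)                     # step 4: review before any execution
    if findings != "LGTM":
        code = repair(code, findings, spec)
    tests = make_tests(code, spec)              # step 3: generate a pytest suite
    report = {"passed": False, "output": ""}
    for attempt in range(max_repairs + 1):
        report = run(code, tests)               # step 5: sandboxed test run
        if report["passed"] or attempt == max_repairs:
            break
        code = repair(code, report["output"], spec)  # feed failures back
    return code, report
```

Everything that follows fills in these placeholders: the sandboxed test runner in Step 2, the LLM-backed generation, review, and repair tools in Step 3, and the full orchestration in Step 4.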

Step 1: Setup

Install the dependencies, then create a .env file with your API key and loop limits:

pip install langchain==0.3.0 langchain-openai==0.2.0 python-dotenv==1.0.1 \
    pytest==8.3.0 black==24.8.0 ruff==0.6.9
# .env
OPENAI_API_KEY=sk-proj-...
MAX_REPAIR_ITERATIONS=5
EXECUTION_TIMEOUT_SECONDS=30

Step 2: Sandboxed Code Execution Tool

Never execute LLM-generated code directly in your main process. Use a subprocess with strict resource limits.

# tools/executor.py
import subprocess
import tempfile
import os
import sys
from pathlib import Path

class SandboxedExecutor:
    """Run Python code in an isolated subprocess with timeout."""

    def __init__(self, timeout: int = 30):
        self.timeout = timeout

    def run_pytest(self, code: str, test_code: str) -> dict:
        """Write code and tests to temp files, run pytest, return results."""
        with tempfile.TemporaryDirectory() as tmpdir:
            tmp_path = Path(tmpdir)

            # Write implementation
            impl_file = tmp_path / "implementation.py"
            impl_file.write_text(code)

            # Write tests
            test_file = tmp_path / "test_implementation.py"
            test_file.write_text(test_code)

            # Run pytest in a subprocess with timeout
            try:
                result = subprocess.run(
                    [sys.executable, "-m", "pytest", str(test_file), "-v",
                     "--tb=short", "--no-header", "-q"],
                    capture_output=True,
                    text=True,
                    timeout=self.timeout,
                    cwd=tmpdir,
                    env={**os.environ, "PYTHONPATH": tmpdir},
                )
                return {
                    "returncode": result.returncode,
                    "stdout": result.stdout,
                    "stderr": result.stderr,
                    "passed": result.returncode == 0,
                }
            except subprocess.TimeoutExpired:
                return {
                    "returncode": -1,
                    "stdout": "",
                    "stderr": f"Execution timed out after {self.timeout}s",
                    "passed": False,
                }
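The timeout above stops runaway loops, but it does not cap CPU or memory. On POSIX systems you can apply hard limits in the child process with the standard-library resource module before pytest starts. A sketch (the limit values are illustrative, and preexec_fn is POSIX-only):

```python
import resource
import subprocess
import sys

def limit_resources():
    """Runs in the child process via preexec_fn, before the command starts.
    POSIX-only; not available on Windows."""
    resource.setrlimit(resource.RLIMIT_CPU, (10, 10))             # 10 CPU-seconds
    resource.setrlimit(resource.RLIMIT_AS, (1024 * 2**20,) * 2)   # 1 GiB address space

def run_limited(cmd: list[str], timeout: int = 30) -> subprocess.CompletedProcess:
    # Same subprocess.run call as SandboxedExecutor, plus hard resource caps
    return subprocess.run(
        cmd, capture_output=True, text=True,
        timeout=timeout, preexec_fn=limit_resources,
    )
```

Wiring this into SandboxedExecutor.run_pytest is a one-line change (add the preexec_fn argument); the Production Considerations section below covers further hardening.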

Step 3: LangChain Tools

Define the tools the agent can call:

# tools/coding_tools.py
from langchain.tools import tool
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from tools.executor import SandboxedExecutor

executor = SandboxedExecutor(timeout=30)
llm = ChatOpenAI(model="gpt-4o", temperature=0.1)

class CodeOutput(BaseModel):
    code: str = Field(description="The complete Python implementation code")
    explanation: str = Field(description="Brief explanation of the implementation approach")

class TestOutput(BaseModel):
    test_code: str = Field(description="Complete pytest test file content")
    test_count: int = Field(description="Number of test cases written")

@tool
def generate_code(specification: str) -> dict:
    """Generate Python code from a natural language specification."""
    chain = ChatPromptTemplate.from_messages([
        ("system", """You are an expert Python developer.
Write clean, well-typed Python code following the specification.
Include type hints, docstrings, and handle edge cases.
Return only the implementation — no test code."""),
        ("human", "Specification:\n{spec}"),
    ]) | llm.with_structured_output(CodeOutput)

    result = chain.invoke({"spec": specification})
    return {"code": result.code, "explanation": result.explanation}

@tool
def generate_tests(code: str, specification: str) -> dict:
    """Generate pytest tests for the given implementation."""
    chain = ChatPromptTemplate.from_messages([
        ("system", """You are a Python test engineer.
Write comprehensive pytest tests for this code.
Cover: happy path, edge cases, error cases, boundary values.
Import the implementation from 'implementation' module."""),
        ("human", "Code:\n{code}\n\nOriginal specification:\n{spec}"),
    ]) | llm.with_structured_output(TestOutput)

    result = chain.invoke({"code": code, "spec": specification})
    return {"test_code": result.test_code, "test_count": result.test_count}

@tool
def run_tests(code: str, test_code: str) -> dict:
    """Run pytest tests against the implementation and return results."""
    return executor.run_pytest(code, test_code)

@tool
def review_code(code: str) -> str:
    """Review Python code for quality issues, bugs, and style problems."""
    chain = ChatPromptTemplate.from_messages([
        ("system", """You are a senior Python code reviewer.
Identify: bugs, security issues, performance problems, missing error handling.
Be concise — list specific issues with line references if possible.
If code is clean, say 'LGTM'."""),
        ("human", "{code}"),
    ]) | llm

    result = chain.invoke({"code": code})
    return result.content

@tool
def repair_code(code: str, test_output: str, specification: str) -> str:
    """Fix Python code based on failing test output."""
    chain = ChatPromptTemplate.from_messages([
        ("system", """You are debugging Python code.
Read the failing test output carefully.
Fix the implementation to make all tests pass.
Return only the corrected implementation code, no explanations."""),
        ("human", "Original specification:\n{spec}\n\nFailing code:\n{code}\n\nTest failures:\n{failures}"),
    ]) | llm

    result = chain.invoke({
        "spec": specification,
        "code": code,
        "failures": test_output,
    })
    # Models often wrap code in markdown fences despite instructions; strip them
    text = result.content.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        if lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines[1:])
    return text

[Screenshot: terminal showing the coding agent's repair loop, with failing tests on iteration 1 and passing tests on iteration 3]

Step 4: The Coding Agent Loop

# agent.py
import os
from dotenv import load_dotenv

load_dotenv()

MAX_REPAIRS = int(os.getenv("MAX_REPAIR_ITERATIONS", 5))

def run_coding_agent(specification: str) -> dict:
    """
    Full coding agent loop: generate → review → test → repair.
    Returns the final code, tests, and execution report.
    """
    print(f"\n[Coding Agent] Starting for: {specification[:80]}...")

    # Step 1: Generate initial implementation
    print("[1/5] Generating implementation...")
    gen_result = generate_code.invoke({"specification": specification})
    code = gen_result["code"]
    print(f"      {gen_result['explanation']}")

    # Step 2: Code review
    print("[2/5] Running code review...")
    review = review_code.invoke({"code": code})
    if "LGTM" not in review:  # models rarely return the bare string 'LGTM'
        print(f"      Review findings: {review[:200]}")
        # Repair based on review before even running tests
        code = repair_code.invoke({
            "code": code,
            "test_output": f"Code review findings:\n{review}",
            "specification": specification,
        })

    # Step 3: Generate tests
    print("[3/5] Generating test suite...")
    test_result = generate_tests.invoke({"code": code, "specification": specification})
    test_code = test_result["test_code"]
    print(f"      Generated {test_result['test_count']} test cases")

    # Step 4: Run tests with repair loop
    print("[4/5] Running tests...")
    for iteration in range(MAX_REPAIRS + 1):
        test_run = run_tests.invoke({"code": code, "test_code": test_code})

        if test_run["passed"]:
            print(f"      All tests passed on iteration {iteration + 1}")
            break

        if iteration == MAX_REPAIRS:
            print("      Max repair iterations reached. Returning best effort.")
            break

        print(f"      Tests failed (iteration {iteration + 1}), repairing...")
        failure_context = test_run["stdout"] + "\n" + test_run["stderr"]
        code = repair_code.invoke({
            "code": code,
            "test_output": failure_context,
            "specification": specification,
        })

    return {
        "specification": specification,
        "final_code": code,
        "test_code": test_code,
        "tests_passed": test_run.get("passed", False),
        "test_output": test_run.get("stdout", ""),
        "iterations": iteration + 1,
    }

# Tool imports (module-level, so they resolve before run_coding_agent is called;
# in a real project these would sit at the top of the file)
from tools.coding_tools import generate_code, generate_tests, run_tests, review_code, repair_code
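Raw pytest output can run to hundreds of lines of traceback, which wastes context tokens in the repair prompt. An optional helper (not part of the files above, and illustrative only) that trims failure_context to the lines a model actually needs:

```python
def extract_failures(pytest_output: str, max_chars: int = 2000) -> str:
    """Keep only failure-relevant lines from `pytest -q --tb=short` output:
    FAILED/ERROR lines, assertion ('E ') lines, and the summary line."""
    keep = []
    for line in pytest_output.splitlines():
        stripped = line.strip()
        if ("FAILED" in stripped or "ERROR" in stripped
                or stripped.startswith(("E ", "E\t"))
                or " failed" in stripped):
            keep.append(line)
    trimmed = "\n".join(keep)
    # Fall back to truncated raw output if nothing matched
    return trimmed[:max_chars] if trimmed else pytest_output[:max_chars]
```

Passing the trimmed context to the repair tool keeps prompts short and focuses the model on the actual assertions that failed.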

Step 5: CLI Interface

# cli.py
import argparse
import json
from agent import run_coding_agent

def main():
    parser = argparse.ArgumentParser(description="AI Coding Agent")
    parser.add_argument("spec", help="Coding specification (natural language)")
    parser.add_argument("--output", help="Write final code to this file")
    parser.add_argument("--json", action="store_true", help="Output JSON report")
    args = parser.parse_args()

    result = run_coding_agent(args.spec)

    if args.json:
        print(json.dumps(result, indent=2))
    else:
        print("\n" + "="*60)
        print("FINAL IMPLEMENTATION")
        print("="*60)
        print(result["final_code"])
        print(f"\nTests {'PASSED' if result['tests_passed'] else 'FAILED'} "
              f"after {result['iterations']} iteration(s)")

    if args.output and result["final_code"]:
        with open(args.output, "w") as f:
            f.write(result["final_code"])
        print(f"\nCode written to {args.output}")

if __name__ == "__main__":
    main()

Run it:

python cli.py "Write a Python function that takes a list of integers and returns \
  the top-k most frequent elements. Handle edge cases for empty lists and k > len(list)."

Production Considerations

For production deployments, review the AI agent security best practices guide — code execution agents carry significant risk if not properly sandboxed. Key hardening steps:

  • Run the subprocess executor inside a Docker container with --network none and read-only filesystem mounts
  • Set CPU and memory limits on the execution subprocess using resource.setrlimit
  • Block dangerous imports (os, sys, subprocess) by scanning the generated code before execution
  • Add Langfuse observability to track repair loop counts and identify specifications that consistently fail
  • Use human-in-the-loop approval before the generated code is merged into a real repository
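The import-blocking step can be sketched with the standard-library ast module. The blocklist below is illustrative; tune it to your threat model, and treat static scanning as a defense-in-depth layer rather than a substitute for OS-level sandboxing:

```python
import ast

# Illustrative blocklist; extend to match your threat model
BLOCKED_MODULES = {"os", "sys", "subprocess", "socket", "shutil"}

def find_blocked_imports(source: str) -> set[str]:
    """Statically find blocked top-level modules imported by generated
    code, without executing it."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module] if node.module else []
        else:
            continue
        found.update(
            name.split(".")[0] for name in names
            if name.split(".")[0] in BLOCKED_MODULES
        )
    return found
```

Running this check before SandboxedExecutor.run_pytest lets the agent reject or regenerate code that reaches for blocked modules before anything executes.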

What's Next

  • Add this agent to a larger LangGraph multi-agent system
  • Deploy the coding agent as a service using the Docker deployment guide
  • Review AI agent testing patterns to test the agent itself
  • Explore engineering AI agent use cases for real-world applications
  • Read about agent sandboxes for deeper security design patterns
