## Introduction
An AI agent is a program that uses a large language model to reason about a task and take actions — calling tools, retrieving information, and producing outputs — until the task is complete. Unlike a simple chatbot that only responds with text, an agent can search the web, write and execute code, send emails, query databases, and more.
This guide walks you through building your first AI agent from scratch. You will learn how agents think, how to give them tools to act, how memory works, and how to go from a script running on your laptop to a deployed agent you can use in the real world.
This is a practical, hands-on tutorial. Code examples use Python, but the concepts apply to any language or framework.
For a broader map of the space, start with the tutorials index. If you want to understand how agents are structured theoretically before building one, the agent loop glossary entry and tool use glossary entry are good primers.
## Why Build an Agent Instead of Using a Chatbot?
A standard chatbot takes a message and returns a response. It cannot take actions in the world. An agent can:
- Search the web and retrieve current information
- Read and write files on a computer
- Query APIs and databases
- Call external services (send emails, create calendar events)
- Execute code and return results
- Chain multiple actions together to complete complex tasks
The key difference is autonomy: an agent decides which tools to use, in what order, and keeps working until the task is done. You give it a goal; it figures out the steps.
## Prerequisites
Before you start, you need:
- Python 3.10 or higher installed
- Basic Python knowledge (functions, classes, dictionaries)
- An OpenAI API key (or another LLM provider key)
- pip for installing packages
You do not need prior AI experience. This guide explains every concept as it is introduced.
## Step 1: Understand the Agent Loop
Before writing a single line of code, understand what an agent actually does when it runs. This is called the agent loop:
1. **Receive task:** The agent receives a user instruction or goal.
2. **Reason:** The LLM thinks about what to do next and decides whether to call a tool or produce a final answer.
3. **Act:** If the LLM decides to use a tool, the agent calls that tool with the chosen parameters.
4. **Observe:** The agent receives the tool's output and adds it to the conversation context.
5. **Repeat:** Steps 2-4 repeat until the LLM decides it has enough information to produce a final answer.
6. **Respond:** The agent returns its final answer to the user.
This loop — Reason → Act → Observe → Reason — is the foundation of every AI agent, whether it runs for two steps or twenty. Understanding it will help you debug agents when they get stuck and design better prompts that guide the reasoning effectively.
## Step 2: Set Up Your Environment
Install the dependencies you need:
```shell
pip install openai python-dotenv
```

(This tutorial builds the agent with plain Python and the OpenAI SDK; frameworks like LangChain are optional and come up again in the conclusion.)
Create a .env file in your project directory:
```
OPENAI_API_KEY=your-api-key-here
```
Create your main script:
```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Verify the connection
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
Run this to confirm your API key works before building the agent.
## Step 3: Define Your Tools
Tools are the actions your agent can take. Each tool is a Python function with a clear description that tells the LLM when and how to use it.
Function calling is the mechanism that lets the LLM request a tool call — it outputs a structured request specifying which tool to call and with what arguments, rather than free-form text.
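Concretely, when the model decides to use a tool, the API response carries a structured request instead of prose. With OpenAI's chat completions format, a request for a `calculate` tool looks roughly like this (the `id` and argument values are illustrative):

```python
import json

# A tool call as it appears in an API response (illustrative values).
# Note that "arguments" is a JSON-encoded string, not a Python dict.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "calculate",
        "arguments": '{"expression": "100 * 0.15"}',
    },
}

# The agent loop parses the arguments before invoking the function
args = json.loads(tool_call["function"]["arguments"])
print(args["expression"])  # 100 * 0.15
```

Because the arguments arrive as a JSON string, your loop must decode them with `json.loads` before passing them to the actual Python function, as Step 4 does.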
Here are three practical tools for a research agent:
```python
from datetime import datetime


def search_web(query: str) -> str:
    """
    Search the web for information about a topic.
    Use this when you need current information or facts.

    Args:
        query: The search query string

    Returns:
        A string containing search results
    """
    # In production, use a real search API (Tavily, Serper, etc.)
    # This is a placeholder for demonstration
    return f"Search results for '{query}': [Result 1: ...] [Result 2: ...]"


def calculate(expression: str) -> str:
    """
    Evaluate a mathematical expression safely.
    Use this for any numerical calculations.

    Args:
        expression: A valid mathematical expression like '2 + 2' or '100 * 0.15'

    Returns:
        The result of the calculation as a string
    """
    try:
        # Only allow safe mathematical operations
        allowed = set("0123456789+-*/()., ")
        if not all(c in allowed for c in expression):
            return "Error: Invalid characters in expression"
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"


def get_current_date() -> str:
    """
    Get today's date and time.
    Use this when the user asks about dates or when you need the current date.

    Returns:
        Current date and time as a formatted string
    """
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
```
Good tool design follows three rules:
- One tool, one purpose. Do not build a tool that does five things. Build five focused tools.
- Clear docstrings. The LLM reads your docstrings to decide when to use each tool. Write them for the LLM, not just for human developers.
- Always return strings. Tool outputs go back into the LLM's context as text. Return strings (or JSON strings for structured data), not Python objects.
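The third rule in practice: a hypothetical tool that produces structured data should serialize it to a JSON string before handing it back to the loop (the weather values below are stub data for illustration):

```python
import json


def get_weather(city: str) -> str:
    """Return weather data for a city as a JSON string (stub data)."""
    # In production this would call a real weather API
    data = {"city": city, "temp_c": 18, "conditions": "partly cloudy"}
    return json.dumps(data)  # a string, safe to place in the message context


result = get_weather("Berlin")
print(type(result).__name__)  # str
```

The LLM can read JSON inside its context just fine; what it cannot receive is a raw Python object.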
For more on tool design philosophy, see tool use in the glossary.
## Step 4: Build the Agent Loop
Now wire together the LLM and tools into a working agent loop. This is the core of your agent.
```python
import json

from openai import OpenAI

client = OpenAI()

# Define tools in OpenAI's function calling format
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information about a topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query",
                    }
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The math expression to evaluate",
                    }
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_date",
            "description": "Get the current date and time.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]

# Map tool names to Python functions
tool_functions = {
    "search_web": search_web,
    "calculate": calculate,
    "get_current_date": get_current_date,
}


def run_agent(task: str, max_iterations: int = 10) -> str:
    """Run the agent loop for a given task."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful AI assistant with access to tools. "
                "Use the tools available to you to complete the user's task. "
                "Think step by step. When you have enough information, "
                "provide a clear final answer."
            ),
        },
        {"role": "user", "content": task},
    ]

    for iteration in range(max_iterations):
        # Call the LLM
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        message = response.choices[0].message

        # If no tool calls, we have a final answer
        if not message.tool_calls:
            return message.content

        # Add the assistant's message to history
        messages.append(message)

        # Execute each tool call
        for tool_call in message.tool_calls:
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            print(f"Calling tool: {tool_name} with args: {tool_args}")

            # Call the tool
            tool_fn = tool_functions.get(tool_name)
            if tool_fn:
                result = tool_fn(**tool_args)
            else:
                result = f"Error: Unknown tool '{tool_name}'"

            # Add tool result to messages
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "Agent reached maximum iterations without completing the task."


# Run the agent
result = run_agent("What is 15% of 840, and what day of the week is it today?")
print(f"\nFinal answer: {result}")
```

Run this and you will see the agent call calculate to compute the percentage, call get_current_date to check the day, and then combine the results into a final answer. You have built a working agent.
## Step 5: Add Memory
The basic agent above has no memory — it starts fresh every run. For most real-world use cases, you need agents that remember previous interactions or maintain state across a session.
There are two types of memory to understand:
**Short-term memory** is the conversation history stored in the messages list. Everything the agent has seen in the current session is available. This is limited by the model's context window.
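Managing short-term memory often just means trimming the history when it grows too long. A minimal sketch (the window size of six is an arbitrary illustration; real systems often summarize dropped messages instead of discarding them):

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the system message(s) plus the last `keep_last` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]


history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(20)]
print(len(trim_history(history)))  # 7: the system message plus the last 6
```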
**Long-term memory** persists between sessions. This is typically implemented by storing summaries or key facts in a database and retrieving relevant memories at the start of each run.
Here is a simple long-term memory implementation:
```python
import json
from pathlib import Path


class SimpleMemory:
    def __init__(self, memory_file: str = "agent_memory.json"):
        self.memory_file = Path(memory_file)
        self.memories = self._load()

    def _load(self) -> list:
        if self.memory_file.exists():
            return json.loads(self.memory_file.read_text())
        return []

    def save(self, fact: str):
        """Store a fact in long-term memory."""
        self.memories.append(fact)
        self.memory_file.write_text(json.dumps(self.memories, indent=2))

    def recall(self, limit: int = 5) -> str:
        """Retrieve recent memories as context."""
        recent = self.memories[-limit:] if self.memories else []
        if not recent:
            return "No previous memories."
        return "Previous context:\n" + "\n".join(f"- {m}" for m in recent)


# Use memory in your agent
memory = SimpleMemory()


def run_agent_with_memory(task: str) -> str:
    memory_context = memory.recall()
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful AI assistant with memory and access to tools.\n"
                f"{memory_context}\n\n"
                "Use tools as needed to complete the user's task."
            ),
        },
        {"role": "user", "content": task},
    ]
    # ... rest of agent loop
```
For production memory systems, look at vector databases for semantic retrieval — the vector database glossary entry covers the core concepts.
## Step 6: Test Locally Before Deploying
Before deploying, run your agent through a set of test cases that cover:
- The primary task it was designed for
- Edge cases and unusual inputs
- Failure conditions (what happens when a tool returns an error?)
- Multi-step tasks that require several tool calls
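A lightweight harness is enough at this stage. The sketch below assumes an agent function with the same signature as `run_agent` from Step 4; the stub agent and keyword checks are illustrative, not a full evaluation suite:

```python
def run_test_cases(agent_fn, cases):
    """Run (task, check) pairs; check is a predicate on the agent's answer."""
    results = []
    for task, check in cases:
        try:
            answer = agent_fn(task)
            results.append((task, bool(check(answer))))
        except Exception:
            # A crash counts as a failure, not a test-suite abort
            results.append((task, False))
    return results


# Stub agent for demonstration; swap in run_agent for real tests
def stub_agent(task: str) -> str:
    return "126.0" if "15%" in task else "unknown"


cases = [
    ("What is 15% of 840?", lambda a: "126" in a),
    ("What day is it today?", lambda a: a != ""),
]
for task, passed in run_test_cases(stub_agent, cases):
    print(f"{'PASS' if passed else 'FAIL'}: {task}")
```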
Add print statements or logging to trace the agent's reasoning:
```python
print(f"Iteration {iteration + 1}")
print(f"LLM response: {message.content or '[tool call]'}")
if message.tool_calls:
    for tc in message.tool_calls:
        print(f"  Tool: {tc.function.name}, Args: {tc.function.arguments}")
```
Tracing lets you see exactly what the agent decides at each step, which is essential for debugging. Once you are satisfied with local behavior, consult the how to deploy AI agents in your company guide for production deployment patterns.
## Common Mistakes When Building Your First Agent
**Writing vague tool descriptions.** The LLM decides which tools to use based entirely on their descriptions. Vague descriptions lead to wrong tool choices. Write precise, specific descriptions that tell the LLM exactly when to use each tool and what it returns.

**Not limiting iterations.** Without a max_iterations cap, a confused agent can loop indefinitely, burning tokens and money. Always set a maximum.

**Returning Python objects from tools.** When a tool returns a dictionary or list instead of a string, it will cause errors when the runtime tries to inject it into the message context. Always return strings.

**Not handling tool errors.** Tools fail. APIs time out. Data is missing. Every tool should return an error string on failure rather than raising an exception that crashes the agent.

**Starting with too many tools.** More tools create more opportunities for the agent to get confused. Start with the minimum set of tools needed for your core use case, test thoroughly, then add more.

**Skipping the system prompt.** The system prompt is your primary lever for controlling agent behavior. A well-written system prompt is the difference between an agent that reliably does what you want and one that goes off in unpredictable directions.
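The tool-error mistake is easy to guard against with a wrapper. A minimal sketch (the decorator name `safe_tool` and the flaky tool are illustrations):

```python
import functools


def safe_tool(fn):
    """Wrap a tool so failures come back as error strings, not exceptions."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            return f"Error in {fn.__name__}: {e}"
    return wrapper


@safe_tool
def flaky_lookup(key: str) -> str:
    raise TimeoutError("upstream API timed out")


print(flaky_lookup("x"))  # Error in flaky_lookup: upstream API timed out
```

The error string goes back into the agent's context like any other tool result, so the LLM can decide to retry, use a different tool, or report the failure.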
## Best Practices for Building AI Agents
- Define one tool per action. Narrow, focused tools are more reliable than broad multi-purpose tools.
- Version your prompts. Store system prompts in version control and track which version produced which results.
- Set explicit stopping conditions. Tell the agent in the system prompt when it has enough information to stop.
- Log every run. Store inputs, tool calls, and outputs for debugging and future evaluation.
- Build evaluation before scaling. Before building more features, set up evaluation so you can measure whether changes help or hurt.
- Use structured output where possible. For tasks with defined output schemas, use structured output to enforce consistency.
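As one concrete pattern for the last point, OpenAI's chat completions API accepts a JSON Schema via the `response_format` parameter. The schema below is an illustrative example (field names are invented; verify the exact parameter shape against the current API documentation):

```python
# Illustrative JSON Schema for structured output; field names are examples
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "research_summary",
        "schema": {
            "type": "object",
            "properties": {
                "topic": {"type": "string"},
                "key_findings": {"type": "array", "items": {"type": "string"}},
                "confidence": {"type": "number"},
            },
            "required": ["topic", "key_findings", "confidence"],
            "additionalProperties": False,
        },
    },
}

# Passed alongside messages, e.g.:
# response = client.chat.completions.create(
#     model="gpt-4o", messages=messages, response_format=response_format
# )
```

With a schema in place, downstream code can parse the agent's answer with `json.loads` instead of scraping free-form text.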
For framework comparisons that help you decide where to build, see the open-source vs commercial AI agent frameworks comparison.
## Conclusion
You have now built a working AI agent from scratch. The core architecture — an LLM that reasons, calls tools, observes results, and repeats — scales from this simple example to production systems handling thousands of tasks per day. The fundamentals do not change; the complexity increases.
Your next steps: add more tools specific to your use case, build a proper evaluation suite using the AI agent evaluation guide, and explore a framework like LangChain or AutoGen (see LangChain vs AutoGen) to handle the orchestration boilerplate as your agent grows.
Start small, test early, and build incrementally.
## Frequently Asked Questions
### Do I need to use a framework like LangChain to build an AI agent?
No. As this tutorial shows, you can build a working agent with just the OpenAI SDK and plain Python. Frameworks like LangChain add useful abstractions for memory management, tool integration, and multi-agent orchestration — they are worth adopting as your needs grow, but they are not required to get started. Building from scratch first helps you understand what the frameworks are doing for you.
### Which LLM should I use for my first agent?
GPT-4o is the most reliable choice for beginners because its function calling support is mature and well-documented. Claude Sonnet and Gemini Pro are strong alternatives. Avoid smaller or older models for your first agent — weaker reasoning capabilities make debugging harder. Once you understand how agents work, you can experiment with smaller models for cost optimization.
### How do I prevent my agent from running in infinite loops?
Always set a max_iterations parameter and enforce it in your agent loop. In addition, add a loop detection mechanism: if the agent calls the same tool with the same arguments twice in a row, that is a sign it is stuck. Break the loop and return an error explaining what happened. You can also instruct the agent in the system prompt to stop and ask for clarification rather than retrying indefinitely.
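A minimal version of that detection tracks the previous tool call across loop iterations (the function and variable names are illustrative):

```python
def is_repeat_call(last_call, tool_name: str, tool_args: str) -> bool:
    """Return True if this tool call exactly repeats the previous one."""
    return last_call == (tool_name, tool_args)


# Inside the agent loop you would carry `last_call` across iterations:
last_call = None
for tool_name, tool_args in [
    ("search_web", '{"query": "ai agents"}'),
    ("search_web", '{"query": "ai agents"}'),
]:
    if is_repeat_call(last_call, tool_name, tool_args):
        print("Stuck: repeated tool call detected")
        break
    last_call = (tool_name, tool_args)
```

Comparing the raw arguments string is deliberately strict; a fuzzier check (same tool within the last N calls) catches more loops at the cost of occasional false positives.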
### How much does it cost to run an AI agent?
Costs depend on the model, the length of your conversations, and how many tool calls the agent makes. A simple agent run using GPT-4o typically costs between $0.005 and $0.05 depending on task complexity. Multi-step agents with long conversation histories cost more. Track cost per run from the start — it tends to grow as agents become more capable and take on more complex tasks.
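You can compute a per-run estimate from the token counts the API returns (the OpenAI SDK exposes them on `response.usage`). The rates below are placeholders, not current prices; check your provider's pricing page:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate run cost in dollars; rates are dollars per million tokens."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


# Example with placeholder rates of $2.50/M input and $10.00/M output tokens
cost = estimate_cost(prompt_tokens=12_000, completion_tokens=1_500,
                     input_rate=2.50, output_rate=10.00)
print(f"${cost:.4f}")  # $0.0450
```

Summing this estimate over every LLM call in a run, not just the last one, gives the true cost of a multi-step agent.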