What Is Tool Use in AI Agents?

Tool use is the capability that lets AI agents extend beyond text generation by calling external functions, APIs, and services. Learn how agents define, invoke, and handle tools — and how OpenAI and Anthropic implement tool use differently.

Term Snapshot

Also known as: Tool Calling, Function Calling, External Tool Integration

Related terms: What Is Function Calling in AI?, What Is the Agent Loop?, What Is Structured Output in AI Agents?, What Is an Agentic Workflow?

Quick Definition

Tool use is the capability that allows an AI agent to call external functions, APIs, and services — extending its behavior far beyond text generation. Without tools, a language model can only produce text responses. With tools, an agent can search the web, execute code, read and write files, query databases, send messages, call external APIs, retrieve real-time data, and take actions in the world. Tool use is the defining capability that separates a language model from an action-taking agent.

For related concepts, see Function Calling for the underlying mechanism and Agentic Workflow for how tools are used in production pipelines. Browse the full AI Agents Glossary for all foundational terms.

Why Tool Use Matters

Language models have enormous reasoning and generation capabilities, but they are bounded by their training data and context window. They cannot look up what happened yesterday, check a live price, send an email, write code and run it, or interact with any external system. These limitations matter enormously for real-world automation.

Tool use removes these limitations. Rather than a model that knows a great deal about the world up to its training cutoff, you get an agent that can act in the current world: look things up, take measurements, trigger workflows, retrieve user-specific data, and execute decisions rather than just recommend them.

This is why AI Agent Examples in Business almost always involve tool use — the value of agents comes from their ability to act, not merely to reason.

Built-In Tools vs. Custom Tools

Built-In Tools

Built-in tools are provided natively by the model provider and do not require the developer to implement the execution logic:

Code Interpreter (OpenAI): Executes Python code in a sandboxed environment. The agent can write code, run it, observe output, write more code, and iterate. Used for data analysis, calculation, chart generation, and file processing.

Web Search: Enables the agent to retrieve current information from the web. Available natively from some providers (for example, OpenAI's browsing tools or Perplexity's search-grounded models) or as a built-in tool in certain deployment configurations.

File I/O: Read and write files within a sandboxed context. The agent can process uploaded documents, generate files as output, and manage data between conversation turns.

Image analysis: Pass images to the model and receive structured descriptions, classifications, or analysis as part of a tool-augmented workflow.

Custom Tools

Custom tools are any external function or API the developer exposes to the agent. There is no inherent limit on what can be a tool:

  • Search a company's internal knowledge base
  • Query a CRM for customer records
  • Send email or Slack messages
  • Create or update database records
  • Call any REST or GraphQL API
  • Execute shell commands
  • Trigger webhooks
  • Scrape web pages
  • Call specialized ML models

The developer defines the tool, the model decides when to call it and with what arguments, and the developer's code executes it and returns the result.
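The execution side of this contract can be sketched as a simple dispatch table. This is a minimal illustration, not a prescribed pattern; the tool names and functions below are hypothetical stand-ins for real integrations:

```python
# Minimal sketch of the developer's half of custom tool use: a registry
# mapping tool names to functions. The model chooses the name and
# arguments; this code executes the call and returns a structured result.

def search_crm(customer_name: str) -> dict:
    """Stand-in for a real CRM lookup."""
    return {"customer": customer_name, "id": "CUST-000123"}

def send_slack_message(channel: str, text: str) -> dict:
    """Stand-in for a real Slack API call."""
    return {"channel": channel, "sent": True}

TOOL_REGISTRY = {
    "search_crm": search_crm,
    "send_slack_message": send_slack_message,
}

def execute_tool(name: str, arguments: dict) -> dict:
    """Dispatch a model-requested tool call to the matching function."""
    if name not in TOOL_REGISTRY:
        # Unknown names get a structured error the model can reason about.
        return {"status": "error", "message": f"Unknown tool: {name}"}
    return TOOL_REGISTRY[name](**arguments)

result = execute_tool("search_crm", {"customer_name": "Acme Corp"})
```

The registry keeps tool execution in one place, which also makes it a natural spot to attach validation, logging, and access control later.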

Tool Definition with JSON Schema

Every tool is defined using a JSON Schema that tells the model:

  1. What the tool is named — the model uses this name to invoke it
  2. What it does — a natural language description that the model uses to decide when to call it
  3. What parameters it expects — types, descriptions, and which are required

The quality of the description is critical. The model selects tools based on its understanding of what they do. A vague description leads to incorrect tool selection or missing required arguments.

{
  "type": "function",
  "function": {
    "name": "get_customer_orders",
    "description": "Retrieves the order history for a specific customer by their customer ID. Returns a list of orders including order date, total amount, status, and line items. Use this when the user asks about past purchases, order status, or order history.",
    "parameters": {
      "type": "object",
      "properties": {
        "customer_id": {
          "type": "string",
          "description": "The unique customer identifier from the CRM system (format: CUST-XXXXXX)"
        },
        "limit": {
          "type": "integer",
          "description": "Maximum number of orders to return. Defaults to 10 if not specified.",
          "minimum": 1,
          "maximum": 100
        },
        "status_filter": {
          "type": "string",
          "enum": ["all", "pending", "shipped", "delivered", "cancelled"],
          "description": "Filter orders by status. Use 'all' to return all orders regardless of status."
        }
      },
      "required": ["customer_id"]
    }
  }
}

Notice that the description explains not just what the tool does but when to use it. This is the most impactful improvement most developers can make to tool performance.

OpenAI Tool Use API

OpenAI's chat completions API accepts tools as a list of function definitions. The model returns a tool_calls array when it decides to invoke a tool:

from openai import OpenAI
import json

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Search the company knowledge base for articles related to the query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {"type": "integer", "description": "Max articles to return", "default": 5}
            },
            "required": ["query"]
        }
    }
}]

messages = [{"role": "user", "content": "How do I reset my password?"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

# Check if model wants to call a tool
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]

    # Execute the tool
    args = json.loads(tool_call.function.arguments)
    result = search_knowledge_base(**args)  # Your implementation

    # Return result to model
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    })

    # Get final response
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )

Anthropic Claude Tool Use API

Anthropic's Claude API uses the same conceptual model but different syntax:

import anthropic
import json

client = anthropic.Anthropic()

tools = [{
    "name": "search_knowledge_base",
    "description": "Search the company knowledge base for articles related to the query",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "max_results": {"type": "integer", "description": "Max articles to return"}
        },
        "required": ["query"]
    }
}]

messages = [{"role": "user", "content": "How do I reset my password?"}]

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=messages
)

# Check for tool use (handle the first tool_use block, mirroring the OpenAI example)
tool_use_blocks = [block for block in response.content if block.type == "tool_use"]
if tool_use_blocks:
    block = tool_use_blocks[0]

    # Execute the tool
    result = search_knowledge_base(**block.input)

    # Return result to continue the conversation
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(result)
        }]
    })

    # Get final response
    final_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

The key differences: Anthropic uses input_schema instead of parameters, tool results go in user messages as tool_result blocks rather than in tool role messages, and the response structure uses content blocks rather than choices.

Tool Chains

In multi-step agent workflows, tools are chained: the output of one tool becomes the input to the reasoning step that decides whether to call another tool. This is the foundation of the Agent Loop:

User request
    ↓
Model reasons → calls search_tool("current weather in Austin")
    ↓
Search returns weather data
    ↓
Model reasons → calls calendar_tool("today's schedule")
    ↓
Calendar returns schedule
    ↓
Model reasons → no more tools needed → generates response combining both results

Each tool result feeds the model's next reasoning step. Complex agents can chain dozens of tool calls across multiple turns to complete sophisticated tasks.
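The chain above can be made concrete with a runnable schematic. Here call_model() is a scripted stand-in for a real LLM call, so only the control flow (reason, act, observe, repeat) is realistic; the tools and their outputs are fakes:

```python
# Schematic agent loop. call_model() is scripted to mimic a model that
# requests the weather, then the calendar, then produces a final answer.

def search_tool(query: str) -> str:
    return "72°F and sunny"           # stand-in for a live weather lookup

def calendar_tool(query: str) -> str:
    return "9am standup, 2pm review"  # stand-in for a calendar API

TOOLS = {"search_tool": search_tool, "calendar_tool": calendar_tool}

def call_model(messages: list) -> dict:
    """Scripted 'model': decides the next step from tool results seen so far."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if len(tool_results) == 0:
        return {"tool": "search_tool", "args": "current weather in Austin"}
    if len(tool_results) == 1:
        return {"tool": "calendar_tool", "args": "today's schedule"}
    return {"answer": f"Weather: {tool_results[0]['content']}. "
                      f"Schedule: {tool_results[1]['content']}."}

def run_agent(user_request: str) -> str:
    messages = [{"role": "user", "content": user_request}]
    while True:  # the agent loop: reason -> act -> observe -> repeat
        step = call_model(messages)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](step["args"])
        messages.append({"role": "tool", "content": result})

summary = run_agent("What's the weather, and what's on my calendar?")
```

In production the while loop also needs a maximum-iteration cap, since a real model is not guaranteed to terminate on its own.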

Tool Error Handling

Tool errors are a primary source of agent reliability failures. Best practices:

Return structured errors: When a tool call fails, return a JSON object with an error code and message rather than raising an exception or returning an empty string. The model can reason about structured errors and adapt — an empty response produces undefined behavior.

def search_knowledge_base(query: str, max_results: int = 5) -> dict:
    try:
        results = kb_client.search(query, limit=max_results)
        return {"status": "success", "results": results, "count": len(results)}
    except ConnectionError:
        return {"status": "error", "error_code": "KB_UNAVAILABLE",
                "message": "Knowledge base is currently unavailable. Try again in 60 seconds."}
    except Exception as e:
        return {"status": "error", "error_code": "UNKNOWN", "message": str(e)}

Set retry limits: Allow the model to retry a tool call if the first attempt fails, but limit retries. An agent that retries indefinitely on a broken tool will exhaust its token budget without completing the task.
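One way to enforce such a cap is a small wrapper around tool execution. This is a sketch under assumptions: tools return the structured {"status": ...} results shown above, and the retry count of 2 and flaky_tool are illustrative:

```python
# Retry cap sketch: re-run a tool that returns an error result, but give
# up after max_retries extra attempts and hand the structured error back
# to the model instead of looping forever.

def execute_with_retries(tool_fn, args: dict, max_retries: int = 2) -> dict:
    attempts = 0
    while True:
        result = tool_fn(**args)
        if result.get("status") != "error" or attempts >= max_retries:
            return result
        attempts += 1

calls = {"n": 0}

def flaky_tool(query: str) -> dict:
    """Always-failing tool used to demonstrate the cap."""
    calls["n"] += 1
    return {"status": "error", "error_code": "KB_UNAVAILABLE",
            "message": "Knowledge base is currently unavailable."}

# The broken tool is attempted 1 + max_retries = 3 times, then the
# structured error is returned so the model can adapt or report failure.
result = execute_with_retries(flaky_tool, {"query": "reset password"})
```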

Design for graceful degradation: If a tool is unavailable, the agent should either fall back to an alternative tool or return a partial result with clear status rather than failing entirely.

Validate arguments before execution: Before passing model-generated arguments to a tool, validate them against the schema. This catches model errors early and prevents bad data from reaching downstream systems.
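A minimal hand-rolled validator shows the idea; in practice you might use a schema library instead. The checks below cover only required fields, basic types, and enums, mirroring the get_customer_orders schema from earlier:

```python
# Pre-execution argument validation sketch against a (simplified) copy of
# the earlier tool schema. Returns a list of errors; empty means safe.

SCHEMA = {
    "properties": {
        "customer_id": {"type": "string"},
        "limit": {"type": "integer"},
        "status_filter": {
            "type": "string",
            "enum": ["all", "pending", "shipped", "delivered", "cancelled"],
        },
    },
    "required": ["customer_id"],
}

PY_TYPES = {"string": str, "integer": int}

def validate_args(args: dict, schema: dict) -> list:
    errors = []
    for name in schema["required"]:
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected argument: {name}")
            continue
        if not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors

# Model passed a wrong type and omitted the required field: two errors.
errs = validate_args({"limit": "ten"}, SCHEMA)
```

If validation fails, return the error list to the model as a structured error result rather than executing the tool; models usually correct their arguments on the next attempt.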

Tool Best Practices

Keep tool counts manageable: Presenting an agent with more than 10-15 tools at once degrades performance — the model struggles to select the right tool from a long list. For agents with many tools, use dynamic tool loading: present only the tools relevant to the current workflow step.
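Dynamic tool loading can be as simple as tagging each definition with the workflow steps it serves and filtering before each model call. The tags and step names below are hypothetical:

```python
# Dynamic tool loading sketch: only definitions tagged for the current
# workflow step are passed to the model, keeping the tool list short.

ALL_TOOLS = [
    {"name": "search_orders",   "tags": {"support"}},
    {"name": "refund_order",    "tags": {"support"}},
    {"name": "query_analytics", "tags": {"reporting"}},
    {"name": "send_report",     "tags": {"reporting"}},
]

def tools_for_step(step: str) -> list:
    """Return only the tool definitions relevant to the current step."""
    return [tool for tool in ALL_TOOLS if step in tool["tags"]]

# A support-step model call sees two tools instead of the full catalog.
support_tools = tools_for_step("support")
```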

Write tool descriptions for the model, not the user: The description is what the model reads to decide whether to use a tool. Explain the use case, not just the mechanics. "Use this when the user asks about their order history or past purchases" outperforms "Returns order data."

Make tools composable: Design tools that do one thing well rather than broad tools that try to handle many cases. Composable tools are easier for the model to chain correctly and easier for developers to maintain.

Log all tool calls: Record every tool invocation, argument set, and result for debugging, auditing, and observability. See Agent Observability for monitoring architecture.
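One lightweight way to capture every invocation is a decorator that records the tool name, arguments, result, and latency. The logger configuration and example tool are illustrative, not a prescribed setup:

```python
# Tool-call logging sketch: a decorator that emits one structured JSON
# log line per invocation, without changing the tool's return value.

import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool_calls")

def logged_tool(fn):
    @functools.wraps(fn)
    def wrapper(**kwargs):
        start = time.monotonic()
        result = fn(**kwargs)
        log.info(json.dumps({
            "tool": fn.__name__,
            "arguments": kwargs,
            "result": result,
            "duration_ms": round((time.monotonic() - start) * 1000, 1),
        }))
        return result
    return wrapper

@logged_tool
def get_order_status(order_id: str) -> dict:
    """Hypothetical tool used to demonstrate the wrapper."""
    return {"order_id": order_id, "status": "shipped"}

status = get_order_status(order_id="ORD-42")
```

Because the decorator wraps execution rather than the model call, the same log line covers tools invoked by any provider's API.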

Use least privilege: Grant each tool only the permissions it needs. A read-only search tool should not have write access. See AI Agent Guardrails for access control patterns.

Frequently Asked Questions

What is tool use in AI agents?

Tool use is the capability that allows an AI agent to invoke external functions, APIs, and services — going beyond text generation to take actions in the world. It is what transforms a language model into an action-taking agent capable of searching the web, executing code, querying databases, sending messages, and calling any API that has been exposed to it as a tool.

What is the difference between tool use and function calling?

Function calling refers specifically to the LLM mechanism of outputting a structured JSON request to invoke a named function. Tool use is the complete cycle: defining tools, presenting them to the model, handling the function call output, executing the actual function, and returning results back to the model. Function calling is the model's half; tool use describes the full interaction.

How do OpenAI and Anthropic implement tool use differently?

Both provide the same fundamental capability but with different API syntax. OpenAI uses parameters in tool definitions and tool role messages for results. Anthropic uses input_schema in tool definitions and tool_result blocks within user messages for results. Frameworks like LangChain abstract these differences so the same agent code works across both providers.