Managing multiple AI agents unlocks complex workflows like research pipelines or product builds, but coordination challenges like state conflicts and resource contention arise quickly. This tutorial guides you from single-agent basics to orchestrating swarms of 20+ agents, using practical LangGraph implementations and real-world strategies. You'll build a production-ready multi-agent system step-by-step.
## Prerequisites and Background
Before diving in, ensure familiarity with AI agents and frameworks like LangChain or LangGraph. Basic Python knowledge, including async programming and type hints, is required. Install the dependencies (note that `asyncio` ships with Python and does not need installing):

```bash
pip install langgraph langchain-openai
```
Understand key concepts: orchestration (centralized task routing), communication (message passing or shared state), and parallelism (concurrent execution). Review multi-agent use cases for inspiration, like swarms building products via parallel subtasks.
Single agents excel at isolated tasks, but multiples handle decomposition: a manager routes work to specialists (e.g., researcher, coder, tester). Challenges include context overflow, race conditions, and monitoring.
## Core Concepts in Multi-Agent Management
### Agent Roles and Hierarchy
Define clear roles: Manager Agent delegates, monitors, and synthesizes; Worker Agents execute subtasks. Use a hierarchical model where managers spawn sub-agents dynamically.
In swarms (10-20+ agents), apply "rules" like context playbooks: limit each agent's context window, use summaries for handoffs. From real experiments, maintain a CLAUDE.md-style playbook documenting agent behaviors.
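A minimal sketch of the summary-handoff idea, using character count as a rough proxy for the token budget (`compress_for_handoff` and the limits are illustrative names, not part of any framework API):

```python
def compress_for_handoff(messages: list[str], max_chars: int = 4000) -> list[str]:
    """Bound the context handed to the next agent: keep the most recent
    messages verbatim and collapse older ones into a summary stub."""
    total = sum(len(m) for m in messages)
    if total <= max_chars:
        return list(messages)
    kept, used = [], 0
    for m in reversed(messages):  # walk newest-first until the budget is spent
        if used + len(m) > max_chars:
            break
        kept.append(m)
        used += len(m)
    dropped = len(messages) - len(kept)
    # In a real system this stub would be an LLM-generated summary.
    summary = f"[summary of {dropped} earlier message(s) elided per playbook]"
    return [summary] + list(reversed(kept))
```

A production version would replace the stub with an actual LLM summarization call, but the budget-enforcing shape stays the same.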
### State Management and Communication
Shared state prevents silos. LangGraph's StateGraph uses TypedDict for schemas:
```python
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    tasks: list[str]
    results: dict[str, str]
    next_agent: str
```
Each node returns a state update that LangGraph merges using the annotated reducers (here, `operator.add` appends new messages rather than overwriting them). For unstructured comms, implement pub-sub via Redis or in-memory queues.
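A minimal in-memory version of that pub-sub pattern, built on asyncio queues (`InMemoryBus` is an illustrative name, not a LangGraph or Redis API):

```python
import asyncio
from collections import defaultdict

class InMemoryBus:
    """Minimal pub-sub: each topic fans messages out to every subscriber queue."""

    def __init__(self):
        self._subs: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subs[topic].append(q)
        return q

    async def publish(self, topic: str, message: dict) -> None:
        for q in self._subs[topic]:
            await q.put(message)

async def demo():
    bus = InMemoryBus()
    inbox = bus.subscribe("results")  # e.g., the manager listening for workers
    await bus.publish("results", {"agent": "researcher", "payload": "findings"})
    return await inbox.get()
```

Swapping the queue for a Redis channel gives the same interface across processes.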
### Parallel Execution
Run agents concurrently to cut latency. LangGraph supports async nodes; custom tools parallelize via multiprocessing or asyncio.gather.
Security note: Apply API management like rate limits and RBAC (integrations/api-gateways/) to prevent token exhaustion or unauthorized actions.
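One coarse, in-process approximation of gateway rate limiting is a semaphore that caps concurrent calls (`RateLimiter` and `fake_llm_call` are illustrative, not part of any gateway SDK):

```python
import asyncio

class RateLimiter:
    """Cap concurrent LLM calls across an agent pool; a coarse stand-in
    for gateway-level rate limiting."""

    def __init__(self, max_concurrent: int):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_fn, *args):
        async with self._sem:  # blocks when max_concurrent calls are in flight
            return await coro_fn(*args)

async def fake_llm_call(i: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return i * 2

async def demo():
    limiter = RateLimiter(max_concurrent=3)
    return await asyncio.gather(*(limiter.run(fake_llm_call, i) for i in range(10)))
```

A real gateway also enforces per-key quotas and auth; the semaphore only bounds concurrency from this process.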
## Step-by-Step: Building a Multi-Agent Research Swarm
We'll create a system where a manager decomposes a research topic into parallel subtasks (search, summarize, analyze) and then aggregates the results.
### Step 1: Define Agents
Create specialist agents using LangChain tools.
```python
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

llm = ChatOpenAI(model="gpt-4o")  # shared model for all specialists

def researcher(state: AgentState) -> AgentState:
    # Tool calls for web search would go here; stubbed for brevity
    state["results"]["research"] = "Sample findings..."
    state["next_agent"] = "summarizer"
    return state

def summarizer(state: AgentState) -> AgentState:
    # Condense the research findings
    state["results"]["summary"] = "Key insights..."
    return state

async def analyzer(state: AgentState) -> AgentState:
    # Async node: LangGraph awaits coroutine nodes automatically
    state["results"]["analysis"] = "Deep dive..."
    return state
```
### Step 2: Build the Graph with Manager
Construct a StateGraph with conditional routing.
```python
workflow = StateGraph(AgentState)

def manager(state: AgentState) -> AgentState:
    topic = state["messages"][-1].content  # latest message carries the topic
    state["tasks"] = ["research", "analyze"]  # decompose into subtasks
    state["next_agent"] = "researcher"
    return state

workflow.add_node("manager", manager)
workflow.add_node("researcher", researcher)
workflow.add_node("summarizer", summarizer)
workflow.add_node("analyzer", analyzer)

# Edges: manager -> researcher -> summarizer -> conditional analyzer
workflow.set_entry_point("manager")
workflow.add_edge("manager", "researcher")
workflow.add_edge("researcher", "summarizer")
workflow.add_conditional_edges(
    "summarizer",
    # The router must return a single key from the mapping below
    lambda state: "analyzer" if "analyze" in state["tasks"] else END,
    {"analyzer": "analyzer", END: END},
)
workflow.add_edge("analyzer", END)

app = workflow.compile()
```
### Step 3: Invoke with Parallelism
Run the graph asynchronously. The initial state must seed `messages` with the topic, since the manager reads `messages[-1]`:

```python
import asyncio
from langchain_core.messages import HumanMessage

async def run_swarm(topic: str):
    initial_state = {
        "messages": [HumanMessage(content=topic)],  # manager reads the topic here
        "tasks": [],
        "results": {},
        "next_agent": "manager",
    }
    result = await app.ainvoke(initial_state)
    return result["results"]

# Usage
results = asyncio.run(run_swarm("AI agent trends"))
print(results)
```
### Step 4: Scale to Swarms
For 20+ agents, integrate task trackers:
- Use Trello/Asana APIs: the manager creates a card per agent.
- Custom parallelizer: `asyncio.gather(*[agent_task(i) for i in range(20)])` (note the `*`: `gather` takes awaitables as separate arguments, not a list).
- Playbook: enforce self-improving loops where agents critique each other's outputs.
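Assuming each worker is an independent coroutine (a real one would invoke a compiled sub-graph rather than this stub), the custom parallelizer can be sketched as:

```python
import asyncio

async def worker(task_id: int) -> dict:
    """Stand-in worker; in practice this would call app.ainvoke on a sub-graph."""
    await asyncio.sleep(0.01)  # simulated work
    return {"task": task_id, "result": f"done-{task_id}"}

async def run_parallel_swarm(n_agents: int = 20) -> dict[int, str]:
    # All workers start concurrently; gather preserves submission order
    results = await asyncio.gather(*(worker(i) for i in range(n_agents)))
    return {r["task"]: r["result"] for r in results}
```

Wall-clock time stays near one worker's latency instead of twenty times that, which is the whole point of the parallel decomposition.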
Deploy via Docker with Redis for state; attach a LangGraph checkpointer so the graph persists checkpoints between runs.
### Step 5: Monitoring and Observability
Log with LangSmith. Track metrics: tokens used, latency per agent. Use dashboards for swarm health.
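As a minimal stand-in for full LangSmith tracing, per-node call counts and latency can be collected with a decorator (`METRICS` and `instrument` are illustrative names, not LangSmith APIs):

```python
import time
from functools import wraps

METRICS: dict[str, dict] = {}

def instrument(agent_name: str):
    """Record call count and cumulative latency for one agent node."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(state):
            start = time.perf_counter()
            out = fn(state)
            m = METRICS.setdefault(agent_name, {"calls": 0, "latency_s": 0.0})
            m["calls"] += 1
            m["latency_s"] += time.perf_counter() - start
            return out
        return wrapper
    return decorator

@instrument("researcher")
def researcher_node(state: dict) -> dict:
    state["results"] = "sample"  # stubbed work
    return state
```

A dashboard can then poll `METRICS` for swarm health; token counts would be added the same way from the LLM response metadata.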
## Common Pitfalls and Best Practices
Pitfall 1: Context Explosion. Agents hoard tokens; fix with summarization handoffs and per-agent limits (e.g., 8k tokens).
Pitfall 2: Race Conditions. Shared-state updates conflict; rely on LangGraph's annotated reducers to merge concurrent writes, or add explicit locks.
Pitfall 3: Over-Delegation. Managers micromanage; steer them with prompts like "Delegate only; intervene on failure."
Best Practices:
- Start Small: prototype with 3-5 agents before scaling to swarms.
- Idempotency: make agents retry-safe so failed runs can be replayed without side effects.
- Cost Control: rate-limit APIs; prefer cheaper models for worker agents.
- Testing: simulate agent failures and degraded tools before production.
- Production: add RBAC so each agent gets scoped keys; front APIs with a gateway such as Gravitee.
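The idempotency-plus-retry practice can be sketched as a generic wrapper around any async agent call (`retry_safe` and `flaky_agent` are hypothetical helpers, not framework APIs):

```python
import asyncio

async def retry_safe(fn, *args, retries: int = 3, backoff_s: float = 0.01):
    """Retry a flaky agent call with exponential backoff.
    Only safe if fn is idempotent: re-running must not duplicate side effects."""
    last_exc: Exception | None = None
    for attempt in range(retries):
        try:
            return await fn(*args)
        except Exception as exc:
            last_exc = exc
            await asyncio.sleep(backoff_s * (2 ** attempt))
    raise last_exc

calls = {"n": 0}

async def flaky_agent(task: str) -> str:
    """Fails twice, then succeeds; simulates transient API errors."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return f"ok:{task}"
```

Because the wrapper is oblivious to what `fn` does, the idempotency guarantee has to come from the agent itself, e.g. writing results under a deterministic key.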
From real swarm experiments, recurring rules include "parallelize everything," "isolate context per agent," and "keep a human in the loop for final synthesis."
## Conclusion and Next Steps
You've now built and scaled a multi-agent swarm, handling parallelism, state, and orchestration. This foundation powers applications like automated product dev.
Next: explore agent integrations for tools like vector stores. Dive into advanced use cases or compare frameworks in LangGraph vs. CrewAI. Experiment with your code: scale to 20 agents and share results!