# Multi-Agent Systems Guide: Orchestration, Communication & Scaling Patterns
Single agents hit their limits fast. When tasks require multiple domains of expertise, parallel processing, or complex decision-making, you need a multi-agent system, and you need to orchestrate it well. This advanced guide covers the patterns, protocols, and production strategies for building reliable multi-agent architectures.
## What You'll Learn
- Five orchestration patterns and when to use each
- Communication protocols between agents
- Conflict resolution when agents disagree
- Scaling multi-agent systems for production
- Monitoring, debugging, and cost optimization
## Prerequisites

- Completion of at least one agent framework tutorial
- Understanding of AI agent architecture
## Orchestration Pattern 1: Sequential Pipeline

Agents execute one after another, each receiving the previous agent's output.

```
Agent A → Agent B → Agent C → Final Output
```
### When to Use

- Tasks with clear, linear stages (research → write → edit)
- Each step depends on the previous step's output
- Quality gates between stages
### Implementation Pattern

```python
class QualityError(Exception):
    """Raised when a stage's output fails the quality gate."""

class SequentialPipeline:
    def __init__(self, agents: list):
        self.agents = agents

    def run(self, initial_input: str) -> str:
        current_output = initial_input
        for agent in self.agents:
            current_output = agent.execute(current_output)
            # Optional: quality check between stages
            if not self.quality_check(current_output):
                raise QualityError(
                    f"Agent {agent.name} produced low-quality output"
                )
        return current_output

    def quality_check(self, output: str) -> bool:
        # Placeholder gate; replace with a real validator or LLM check
        return bool(output.strip())
```
**Pros:** Simple, predictable, easy to debug

**Cons:** Slow (no parallelism), single point of failure per stage
## Orchestration Pattern 2: Parallel Fan-Out / Fan-In

Multiple agents work simultaneously, then results are aggregated.

```
          ┌─ Agent A ─┐
Input ────┼─ Agent B ─┼──→ Aggregator → Output
          └─ Agent C ─┘
```
### When to Use
- Tasks divisible into independent sub-tasks
- When speed matters (parallel = faster)
- Multi-perspective analysis (get diverse viewpoints)
### Implementation Pattern

```python
import asyncio

class ParallelOrchestrator:
    def __init__(self, agents: list, aggregator):
        self.agents = agents
        self.aggregator = aggregator

    async def run(self, input_data: str) -> str:
        # Fan-out: run all agents in parallel
        tasks = [
            agent.execute_async(input_data)
            for agent in self.agents
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Filter out failures
        successful = [
            r for r in results
            if not isinstance(r, Exception)
        ]
        if not successful:
            raise RuntimeError("All agents failed")

        # Fan-in: aggregate results
        return self.aggregator.merge(successful)
```
**Pros:** Fast, fault-tolerant (one agent failing doesn't block others)

**Cons:** Complex aggregation logic, potential inconsistencies
## Orchestration Pattern 3: Hierarchical Delegation

A supervisor agent delegates tasks to specialist agents and coordinates their work.

```
              ┌────────────┐
              │ Supervisor │
              └─────┬──────┘
        ┌───────────┼───────────┐
  ┌─────┴────┐ ┌────┴─────┐ ┌───┴──────┐
  │Specialist│ │Specialist│ │Specialist│
  │    A     │ │    B     │ │    C     │
  └──────────┘ └──────────┘ └──────────┘
```
### When to Use
- Dynamic task routing based on input type
- Quality requires centralized oversight
- Different agents for different domains
### Implementation Pattern

```python
class SupervisorAgent:
    def __init__(self, specialists: dict, llm):
        self.specialists = specialists  # name -> agent
        self.llm = llm

    def route(self, task: str) -> str:
        """Decide which specialist should handle this task."""
        routing_prompt = f"""Given this task: {task}

Available specialists:
{self._describe_specialists()}

Which specialist should handle this? Respond with just
the specialist name."""
        return self.llm.invoke(routing_prompt).strip()

    def execute(self, task: str) -> str:
        specialist_name = self.route(task)
        specialist = self.specialists[specialist_name]
        result = specialist.execute(task)

        # Supervisor verifies quality
        if self.verify_quality(task, result):
            return result
        # Reassign or request revision
        return specialist.execute(
            f"Revise this output: {result}\n"
            f"Original task: {task}"
        )

    def _describe_specialists(self) -> str:
        return "\n".join(f"- {name}" for name in self.specialists)

    def verify_quality(self, task: str, result: str) -> bool:
        # Placeholder; replace with an LLM-based quality check
        return bool(result.strip())
```
**Pros:** Flexible routing, centralized quality control

**Cons:** Supervisor is a bottleneck, additional LLM calls for routing
## Orchestration Pattern 4: Consensus / Voting

Multiple agents independently solve the same problem, then vote on the best answer.

```
          ┌─ Agent A (Answer 1) ─┐
Input ────┼─ Agent B (Answer 2) ─┼──→ Voter → Best Answer
          └─ Agent C (Answer 3) ─┘
```
### When to Use
- High-stakes decisions requiring reliability
- Different models/approaches for the same task
- When no single agent is trusted enough
### Implementation Pattern

```python
import asyncio

class ConsensusOrchestrator:
    def __init__(self, agents: list, judge_llm):
        self.agents = agents
        self.judge = judge_llm

    async def run(self, task: str) -> str:
        # Get independent answers
        answers = await asyncio.gather(*[
            agent.execute_async(task)
            for agent in self.agents
        ])

        # Judge selects the best answer
        judge_prompt = f"""Task: {task}

Candidate answers:
{self._format_answers(answers)}

Select the best answer. Consider accuracy, completeness,
and clarity. Explain your choice briefly, then output
the selected answer."""
        return self.judge.invoke(judge_prompt)

    def _format_answers(self, answers: list) -> str:
        return "\n".join(
            f"{i + 1}. {a}" for i, a in enumerate(answers)
        )
```
**Pros:** Higher accuracy than a single agent, error detection

**Cons:** 3x cost (or more), some tasks don't have a "best" answer
## Orchestration Pattern 5: Event-Driven Reactive

Agents react to events and messages on a shared bus rather than following a fixed sequence.

```
┌───────────────────────────────────────┐
│               Event Bus               │
└────┬──────────┬──────────┬───────┬────┘
     ▼          ▼          ▼       ▼
  Agent A    Agent B    Agent C  Agent D
  (listens   (listens   (listens (listens
   for X)     for Y)     for Z)   for X,Z)
```
### When to Use
- Complex workflows with conditional branching
- Real-time systems responding to external events
- Agents that need to react to each other's outputs
### Implementation Pattern

```python
class EventBus:
    def __init__(self):
        self.subscribers = {}  # event_type -> [callbacks]

    def subscribe(self, event_type: str, callback):
        self.subscribers.setdefault(event_type, []).append(callback)

    async def publish(self, event_type: str, data: dict):
        for callback in self.subscribers.get(event_type, []):
            await callback(data)

# Usage (inside an async context)
bus = EventBus()
bus.subscribe("new_lead", enrichment_agent.handle)
bus.subscribe("lead_enriched", scoring_agent.handle)
bus.subscribe("lead_scored", routing_agent.handle)
bus.subscribe("high_priority_lead", notification_agent.handle)

# Trigger the chain
await bus.publish("new_lead", {"email": "jane@company.com"})
```
**Pros:** Highly flexible, loosely coupled, easy to add new agents

**Cons:** Hard to debug (no linear flow), potential event storms
## Communication Between Agents

### Message Types

| Type | Purpose | Example |
|------|---------|---------|
| Task assignment | Request agent to do work | "Research these 3 companies" |
| Result | Return completed work | "Here are the research findings..." |
| Query | Ask another agent a question | "What's the lead score for X?" |
| Feedback | Provide quality assessment | "The report needs more data on..." |
| Status | Report progress | "Completed 3 of 5 companies" |
| Escalation | Request human/senior help | "I can't determine industry type" |
### Structured Message Format

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    type: str  # task, result, query, feedback, status, escalation
    content: str
    metadata: dict
    # default_factory so each message gets its own timestamp
    # (a bare datetime.now() would be evaluated once at class definition)
    timestamp: datetime = field(default_factory=datetime.now)
    priority: str = "normal"  # low, normal, high, critical
    parent_id: Optional[str] = None  # for threading replies
```
## Conflict Resolution

When agents disagree (e.g., different scores or contradictory recommendations):
### Strategy 1: Supervisor Override
A supervisor agent reviews conflicting outputs and makes the final decision.
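A minimal sketch of this strategy, assuming each conflicting output is a dict with `agent` and `answer` keys and that the supervisor exposes an `invoke` method (both are illustrative assumptions, not a fixed API):

```python
def supervisor_override(task: str, conflicting: list, judge_llm) -> str:
    """Ask a supervisor LLM to pick between conflicting agent outputs."""
    options = "\n".join(
        f"- {c['agent']}: {c['answer']}" for c in conflicting
    )
    prompt = (
        f"Task: {task}\n"
        f"These agents disagree:\n{options}\n"
        "Pick the single best answer and return it verbatim."
    )
    return judge_llm.invoke(prompt)
```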
### Strategy 2: Weighted Voting
Agents with higher expertise in the relevant domain get more weight:
```python
def weighted_vote(responses: list, weights: dict) -> str:
    scores = {}
    for response in responses:
        agent = response["agent"]
        answer = response["answer"]
        weight = weights.get(agent, 1.0)
        scores[answer] = scores.get(answer, 0) + weight
    # Answer with the highest total weight wins
    return max(scores, key=scores.get)
```
### Strategy 3: Confidence-Based Selection
Each agent reports a confidence score; the most confident (and historically accurate) agent wins.
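A sketch of this strategy; the response shape and the per-agent historical-accuracy dict are assumptions for illustration:

```python
def confidence_select(responses: list, accuracy: dict) -> str:
    """Pick the answer whose agent has the highest
    confidence * historical-accuracy score.

    responses: [{"agent": ..., "answer": ..., "confidence": 0.0-1.0}]
    accuracy:  per-agent historical accuracy, defaulting to 0.5.
    """
    best = max(
        responses,
        key=lambda r: r["confidence"] * accuracy.get(r["agent"], 0.5),
    )
    return best["answer"]
```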
## Scaling for Production

### Cost Optimization
| Strategy | Impact | Implementation |
|----------|--------|----------------|
| Model tiering | 60-80% savings | Use GPT-4o-mini for simple agents, GPT-4o for complex reasoning |
| Caching | 30-50% savings | Cache tool results and repeated queries |
| Batch processing | 20-40% savings | Group similar tasks and process together |
| Short-circuit | Variable | Skip unnecessary agents when the answer is clear early |
### Model Tiering Example

```python
agents_config = {
    "classifier": {
        "model": "gpt-4o-mini",  # simple classification
        "max_tokens": 100,
    },
    "researcher": {
        "model": "gpt-4o",  # complex reasoning
        "max_tokens": 2000,
    },
    "formatter": {
        "model": "gpt-4o-mini",  # text formatting
        "max_tokens": 500,
    },
}
```
## Monitoring Multi-Agent Systems

Track these metrics:

| Metric | What to monitor | Alert threshold |
|--------|-----------------|-----------------|
| Task completion rate | % of tasks completed successfully | < 95% |
| Average latency | End-to-end time per request | > 30 seconds |
| Agent error rate | Per-agent failure frequency | > 5% for any agent |
| Token usage | Cost per request | > 2x expected average |
| Escalation rate | % of tasks requiring human help | > 10% |
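A minimal in-process tracker for the per-agent error-rate metric above; the 5% threshold mirrors the table, while the 20-call minimum and the class itself are illustrative (a real deployment would export these counters to a metrics backend):

```python
from collections import defaultdict

class AgentMetrics:
    """Track per-agent call counts, failures, and cumulative latency."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.latency = defaultdict(float)

    def record(self, agent: str, seconds: float, ok: bool):
        self.calls[agent] += 1
        self.latency[agent] += seconds
        if not ok:
            self.errors[agent] += 1

    def error_rate(self, agent: str) -> float:
        calls = self.calls[agent]
        return self.errors[agent] / calls if calls else 0.0

    def should_alert(self, agent: str) -> bool:
        # Mirrors the table: alert when any agent exceeds a 5% error rate
        # (after a minimum sample size to avoid noisy early alerts)
        return self.calls[agent] >= 20 and self.error_rate(agent) > 0.05
```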
## Choosing the Right Pattern

| Scenario | Recommended Pattern |
|----------|---------------------|
| Content pipeline (research → write → edit) | Sequential |
| Multi-source data enrichment | Parallel fan-out/fan-in |
| Customer support with specialists | Hierarchical delegation |
| Critical decisions (medical, legal) | Consensus/voting |
| Real-time event processing | Event-driven reactive |
| Complex workflows with branching | Hierarchical + event hybrid |
## Common Mistakes to Avoid
- Over-engineering: Don't build a 10-agent system when 2 agents would suffice
- Tight coupling: Agents should communicate through well-defined interfaces, not shared state
- No fallback plan: Always have a degraded mode when agents fail
- Ignoring latency: Multi-agent systems are inherently slower; optimize the critical path
- No cost budgets: Set per-request token budgets and enforce them
## Next Steps

- AI Agent for Sales Automation: apply multi-agent patterns to sales
- AI Agent for Customer Service: multi-agent support systems
- Prompt Engineering for AI Agents: optimize agent prompts
## Frequently Asked Questions

### How many agents should a multi-agent system have?
Start with 2-3 agents and add more only when you can clearly articulate why a new agent is needed. Most production systems use 3-5 agents. Each additional agent adds latency, cost, and complexity. If you can't explain an agent's unique role in one sentence, you don't need it.
### Do all agents need to use the same LLM?
No. In fact, using different models is a best practice. Use powerful models (GPT-4o, Claude) for complex reasoning agents and cheaper models (GPT-4o-mini) for simple classification or formatting agents. This can reduce costs by 60% or more.
### How do I debug a multi-agent system?
Log every agent message with timestamps and agent names. Use structured logging to trace a request through the entire pipeline. Implement a "replay" mode where you can re-run a specific conversation for debugging. Tools like LangSmith and Weights & Biases offer tracing for multi-agent systems.
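One way to get structured, traceable logs like those described above, using only the standard library (the `trace_id` field is an assumption about how you correlate messages belonging to one request):

```python
import json
import logging
import time

logger = logging.getLogger("agents")

def log_message(trace_id: str, sender: str, recipient: str,
                msg_type: str, content: str) -> str:
    """Emit one agent message as a single JSON log line so a whole
    request can be reconstructed by filtering on trace_id."""
    record = {
        "ts": time.time(),
        "trace_id": trace_id,
        "sender": sender,
        "recipient": recipient,
        "type": msg_type,
        "content": content,
    }
    line = json.dumps(record)
    logger.info(line)
    return line
```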
### What's the latency overhead of multi-agent systems?
Sequential systems multiply latency by the number of agents. Parallel systems are limited by the slowest agent. Typical overhead is 2-10x compared to a single-agent approach. Optimize by running independent agents in parallel and using faster models where accuracy isn't critical.
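The sum-vs-max behavior described above can be checked with a tiny asyncio experiment; the delays are illustrative stand-ins for agent response times:

```python
import asyncio
import time

async def agent(seconds: float) -> float:
    # Stand-in for an agent call that takes `seconds` to respond
    await asyncio.sleep(seconds)
    return seconds

async def compare(delays):
    start = time.perf_counter()
    for d in delays:  # sequential: total latency is the sum
        await agent(d)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    await asyncio.gather(*(agent(d) for d in delays))  # parallel: ~max
    parallel = time.perf_counter() - start
    return sequential, parallel

sequential, parallel = asyncio.run(compare([0.05, 0.1, 0.15]))
```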