Build Conversational AI Agents with AutoGen: Human-in-the-Loop & Group Chat
Microsoft AutoGen takes a unique approach to multi-agent AI: agents collaborate through natural conversations. Instead of rigid task pipelines, AutoGen agents chat with each other — and with humans — to solve complex problems. This tutorial teaches you to build conversational agent systems with human oversight.
What You'll Learn#
- AutoGen's conversational agent paradigm
- Setting up AssistantAgent and UserProxyAgent
- Human-in-the-loop patterns for safe automation
- Group chat: coordinating multiple specialized agents
- Code generation and execution with safety guardrails
Prerequisites#
- Python 3.10+ installed
- An OpenAI API key (GPT-4 recommended)
- Basic familiarity with AI agent concepts and architecture
How AutoGen Differs#
| Feature | LangChain | CrewAI | AutoGen |
|---------|-----------|--------|---------|
| Paradigm | Tool-based chains | Role-based crews | Conversational |
| Agent interaction | Sequential tool calls | Task delegation | Chat messages |
| Human-in-the-loop | Manual | Optional | First-class |
| Code execution | Via tools | Via tools | Native sandbox |
| Best for | Single-agent tasks | Team workflows | Collaborative reasoning |
Step 1: Setup#
```bash
mkdir autogen-project && cd autogen-project
python -m venv venv
source venv/bin/activate
pip install pyautogen python-dotenv
```
Create a `.env` file:

```
OPENAI_API_KEY=your-api-key-here
```
Configure the LLM:
```python
# config.py
import os
from dotenv import load_dotenv

load_dotenv()

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
    "timeout": 120,
}
```
Step 2: Your First Agent Pair#
AutoGen's core pattern is a two-agent conversation:
```python
# basic_agent.py
from autogen import AssistantAgent, UserProxyAgent
from config import llm_config

# The AI assistant — generates responses and code
assistant = AssistantAgent(
    name="AI_Assistant",
    system_message="""You are a helpful AI assistant.
    Solve tasks step by step. When you need to perform
    calculations or data processing, write Python code.
    When the task is complete, reply with TERMINATE.""",
    llm_config=llm_config,
)

# The user proxy — executes code and provides human input
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="TERMINATE",  # Ask human when a TERMINATE message arrives
    max_consecutive_auto_reply=5,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False,  # Set True in production
    },
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Analyze the last 5 years of S&P 500 returns. "
    "Calculate the average annual return and volatility.",
)
```
Human Input Modes#
| Mode | Behavior | Use When |
|------|----------|----------|
| ALWAYS | Ask human before every reply | High-risk tasks |
| TERMINATE | Ask human only when agent says TERMINATE | Standard workflows |
| NEVER | Fully autonomous, no human input | Low-risk automation |
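In practice, the input mode pairs naturally with `max_consecutive_auto_reply`: the stricter the oversight, the fewer automatic replies you should allow. A minimal sketch of the `UserProxyAgent` keyword arguments per risk tier (the tier names and exact limits are this tutorial's suggestion, not an AutoGen concept):

```python
# Illustrative UserProxyAgent kwargs per risk tier -- the tier names
# and limits are conventions for this tutorial, not AutoGen defaults.
high_risk = {"human_input_mode": "ALWAYS", "max_consecutive_auto_reply": 0}
standard = {"human_input_mode": "TERMINATE", "max_consecutive_auto_reply": 5}
low_risk = {"human_input_mode": "NEVER", "max_consecutive_auto_reply": 10}

# Usage: UserProxyAgent(name="User", **standard, code_execution_config=False)
```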
Step 3: Human-in-the-Loop Workflows#
AutoGen excels at human-AI collaboration. The human can intervene at any point:
```python
# human_in_loop.py
from autogen import AssistantAgent, UserProxyAgent
from config import llm_config

# Assistant drafts emails
email_drafter = AssistantAgent(
    name="Email_Drafter",
    system_message="""You are a professional email writer.
    Draft emails based on the user's instructions.
    Always present the draft for review before sending.
    Format: Subject: ... \n\n Body: ...
    After the user approves, reply TERMINATE.""",
    llm_config=llm_config,
)

# Human reviews and provides feedback
human_reviewer = UserProxyAgent(
    name="Human_Reviewer",
    human_input_mode="ALWAYS",  # Human reviews every draft
    max_consecutive_auto_reply=0,  # Always wait for human
    code_execution_config=False,  # No code execution needed
)

# Conversation flow:
# 1. Human provides instructions
# 2. AI drafts email
# 3. Human reviews and gives feedback
# 4. AI revises
# 5. Human approves → TERMINATE
human_reviewer.initiate_chat(
    email_drafter,
    message="Write a follow-up email to a prospect who attended "
    "our webinar on AI agents yesterday. Mention our "
    "enterprise plan and offer a demo.",
)
```
Why this pattern matters: For sensitive communications (emails, contracts, public posts), human-in-the-loop review prevents costly mistakes while still saving most of the drafting effort.
Step 4: Group Chat — Multiple Agents Collaborating#
Group chat lets multiple agents discuss and collaborate on complex tasks:
```python
# group_chat.py
from autogen import (
    AssistantAgent,
    UserProxyAgent,
    GroupChat,
    GroupChatManager,
)
from config import llm_config

# Specialized agents
planner = AssistantAgent(
    name="Planner",
    system_message="""You are a project planner. When given a
    project goal, break it down into specific tasks with clear
    requirements. Assign tasks to the right specialist.
    Do not write code.""",
    llm_config=llm_config,
)

coder = AssistantAgent(
    name="Coder",
    system_message="""You are a Python developer. Write clean,
    well-documented code to accomplish assigned tasks. Include
    error handling and type hints. Only write code when asked.""",
    llm_config=llm_config,
)

reviewer = AssistantAgent(
    name="Reviewer",
    system_message="""You are a code reviewer. Review code for:
    1. Correctness — does it solve the problem?
    2. Security — any vulnerabilities?
    3. Performance — any obvious improvements?
    4. Readability — is it clear and well-documented?
    Provide specific, actionable feedback.""",
    llm_config=llm_config,
)

# User proxy for code execution
executor = UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False,
    },
)

# Group chat setup
group_chat = GroupChat(
    agents=[planner, coder, reviewer, executor],
    messages=[],
    max_round=15,
    speaker_selection_method="auto",  # LLM decides who speaks
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
)

# Kick off the project
executor.initiate_chat(
    manager,
    message="Build a Python script that fetches the top 10 "
    "trending repositories on GitHub today and saves "
    "the results as a formatted markdown table.",
)
```
Speaker Selection Methods#
| Method | How it works | Best for |
|--------|-------------|----------|
| auto | LLM decides who speaks next | Most cases |
| round_robin | Agents take turns | Structured workflows |
| random | Random selection | Brainstorming |
| Custom function | Your logic decides | Domain-specific routing |
Custom Speaker Selection#
```python
def custom_speaker_selection(last_speaker, group_chat):
    """Route based on message content."""
    messages = group_chat.messages
    if not messages:
        return planner  # Always start with planner
    last_message = messages[-1]["content"].lower()
    if "write code" in last_message or "implement" in last_message:
        return coder
    elif "review" in last_message or "check" in last_message:
        return reviewer
    elif "execute" in last_message or "run" in last_message:
        return executor
    else:
        return planner  # Default to planner

group_chat = GroupChat(
    agents=[planner, coder, reviewer, executor],
    messages=[],
    max_round=15,
    speaker_selection_method=custom_speaker_selection,
)
```
Step 5: Safe Code Execution#
AutoGen can generate and execute code — but this needs guardrails:
```python
# Safe code execution configuration
code_execution_config = {
    "work_dir": "sandbox",   # Isolated directory
    "use_docker": True,      # Run in a Docker container
    "timeout": 60,           # 60-second timeout
    "last_n_messages": 3,    # Only use recent context
}

executor = UserProxyAgent(
    name="Executor",
    human_input_mode="TERMINATE",  # Human approves before exit
    max_consecutive_auto_reply=3,
    code_execution_config=code_execution_config,
    # Guard against a None content field before the substring check
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)
```
Code Execution Safety Checklist#
| Guard | Config | Default |
|-------|--------|---------|
| Docker isolation | use_docker: True | Recommended |
| Timeout | timeout: 60 | 60 seconds |
| Iteration limit | max_consecutive_auto_reply: 3 | Set explicitly |
| Human review | human_input_mode: "TERMINATE" | For sensitive tasks |
| Working directory | work_dir: "sandbox" | Isolated folder |
Step 6: Termination Strategy#
Proper termination prevents endless loops:
```python
def is_termination_message(message):
    """Check if the conversation should end."""
    content = message.get("content", "")
    if content is None:
        return False
    # End conditions
    if "TERMINATE" in content:
        return True
    if "task complete" in content.lower():
        return True
    return False

assistant = AssistantAgent(
    name="Assistant",
    system_message="... Reply TERMINATE when the task is done.",
    llm_config=llm_config,
    is_termination_msg=is_termination_message,
)
```
Practical Example: Research Report Generator#
"""Complete AutoGen example: Multi-agent research report."""
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from config import llm_config
# Research agent
researcher = AssistantAgent(
name="Researcher",
system_message="""Find and summarize information about the
given topic. Provide facts with source attribution.
When research is complete, say 'RESEARCH COMPLETE'.""",
llm_config=llm_config,
)
# Writer agent
writer = AssistantAgent(
name="Writer",
system_message="""Write a structured report based on the
research findings. Include: executive summary, key findings,
analysis, and recommendations. Use markdown formatting.
When done, say 'DRAFT COMPLETE'.""",
llm_config=llm_config,
)
# Quality checker
quality_checker = AssistantAgent(
name="Quality_Checker",
system_message="""Review the report for accuracy, clarity,
and completeness. Provide specific suggestions for improvement.
If quality is satisfactory, reply TERMINATE.""",
llm_config=llm_config,
)
# Human proxy
user = UserProxyAgent(
name="User",
human_input_mode="TERMINATE",
code_execution_config=False,
)
# Group chat
chat = GroupChat(
agents=[user, researcher, writer, quality_checker],
messages=[],
max_round=12,
)
manager = GroupChatManager(groupchat=chat, llm_config=llm_config)
# Run
user.initiate_chat(
manager,
message="Write a report on how AI agents are transforming "
"customer service in the financial services industry."
)
Common Mistakes to Avoid#
- No termination condition: Always define when conversations should end
- Docker disabled in production: Enable Docker for code execution safety
- Too many agents in group chat: 3-5 agents is optimal; more causes confusion
- Missing human oversight for risky tasks: Use `ALWAYS` or `TERMINATE` input modes
- Ignoring cost control: Set `max_consecutive_auto_reply` and group chat round limits
Next Steps#
- Multi-Agent Systems Guide — advanced orchestration patterns
- Build an AI Agent with LangChain — comparison approach
- Build Multi-Agent Systems with CrewAI — role-based alternative
Frequently Asked Questions#
AutoGen vs. CrewAI — when should I use each?#
AutoGen is best for collaborative reasoning with human-in-the-loop requirements. CrewAI is better for structured workflows with clear role delegation. If you need humans to intervene freely during agent execution, choose AutoGen. If you need a predictable pipeline, choose CrewAI. See our CrewAI tutorial for a direct comparison.
Is AutoGen safe for production use?#
Yes, with proper configuration. Enable Docker isolation for code execution, set timeouts, limit iterations, and use `human_input_mode="ALWAYS"` for high-risk tasks. Microsoft actively maintains the project and it's used in production by many enterprises.
Can AutoGen agents use external tools like web search?#
Yes. You can register functions as tools for any agent using `register_function()`. AutoGen also integrates with LangChain tools. For web browsing, AutoGen provides a built-in `WebSurferAgent` for research tasks.
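As a sketch of the tool pattern: you write a plain Python function with annotated parameters, then register it with a caller agent (which decides when to invoke it) and an executor agent (which runs it). The `get_stock_price` function and its toy data below are illustrative, not part of AutoGen; the registration call is shown commented out because it needs the configured agent pair from Step 2:

```python
from typing import Annotated

def get_stock_price(
    symbol: Annotated[str, "Ticker symbol, e.g. 'MSFT'"]
) -> str:
    """Toy price lookup; a real tool would call a market-data API."""
    prices = {"MSFT": 420.0, "AAPL": 190.0}  # placeholder data
    price = prices.get(symbol.upper())
    return f"{symbol.upper()}: ${price}" if price else f"No data for {symbol}"

# Registration sketch -- requires the assistant/user_proxy pair from Step 2:
# from autogen import register_function
# register_function(
#     get_stock_price,
#     caller=assistant,     # the LLM agent that decides to call the tool
#     executor=user_proxy,  # the agent that actually executes it
#     description="Look up the latest price for a stock ticker.",
# )
```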
How do I reduce the cost of AutoGen conversations?#
Limit conversation rounds with `max_round`, set `max_consecutive_auto_reply`, use cheaper models (such as GPT-4o-mini) for non-critical agents, and cache frequently used responses. A typical four-agent group chat costs roughly $0.20-$1.00 per run with GPT-4o.
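One way to apply the tiered-model advice is to keep two LLM configs and give the cheaper one to routine agents. A sketch, where the split between "strong" and "cheap" agents is illustrative:

```python
import os

api_key = os.environ.get("OPENAI_API_KEY", "")

# Full-strength config for agents doing the hard reasoning (planner, reviewer).
strong_config = {
    "config_list": [{"model": "gpt-4o", "api_key": api_key}],
    "temperature": 0,
}

# Cheaper config for routine agents (formatting, summarizing).
cheap_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": api_key}],
    "temperature": 0,
}

# planner = AssistantAgent(name="Planner", llm_config=strong_config)
# formatter = AssistantAgent(name="Formatter", llm_config=cheap_config)
```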