Build Conversational AI Agents with AutoGen: Human-in-the-Loop & Group Chat

Learn to build conversational multi-agent systems with Microsoft AutoGen. Covers agent types, human-in-the-loop workflows, group chat, and code execution patterns.


Microsoft AutoGen takes a unique approach to multi-agent AI: agents collaborate through natural conversations. Instead of rigid task pipelines, AutoGen agents chat with each other — and with humans — to solve complex problems. This tutorial teaches you to build conversational agent systems with human oversight.

What You'll Learn#

  • AutoGen's conversational agent paradigm
  • Setting up AssistantAgent and UserProxyAgent
  • Human-in-the-loop patterns for safe automation
  • Group chat: coordinating multiple specialized agents
  • Code generation and execution with safety guardrails

Prerequisites#

  • Python 3.8 or later
  • An OpenAI API key (exported as OPENAI_API_KEY)
  • Basic familiarity with Python and the command line

How AutoGen Differs#

| Feature | LangChain | CrewAI | AutoGen |
|---------|-----------|--------|---------|
| Paradigm | Tool-based chains | Role-based crews | Conversational |
| Agent interaction | Sequential tool calls | Task delegation | Chat messages |
| Human-in-the-loop | Manual | Optional | First-class |
| Code execution | Via tools | Via tools | Native sandbox |
| Best for | Single-agent tasks | Team workflows | Collaborative reasoning |

Step 1: Setup#

mkdir autogen-project && cd autogen-project
python -m venv venv
source venv/bin/activate

pip install pyautogen python-dotenv

Create .env:

OPENAI_API_KEY=your-api-key-here

Configure the LLM:

# config.py
import os
from dotenv import load_dotenv

load_dotenv()

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
    "timeout": 120,
}

Step 2: Your First Agent Pair#

AutoGen's core pattern is a two-agent conversation:

# basic_agent.py
from autogen import AssistantAgent, UserProxyAgent
from config import llm_config

# The AI assistant — generates responses and code
assistant = AssistantAgent(
    name="AI_Assistant",
    system_message="""You are a helpful AI assistant.
    Solve tasks step by step. When you need to perform
    calculations or data processing, write Python code.
    When the task is complete, reply with TERMINATE.""",
    llm_config=llm_config,
)

# The user proxy — executes code and provides human input
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="TERMINATE",  # Human prompted when assistant says TERMINATE
    max_consecutive_auto_reply=5,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False,  # Set True in production
    },
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Analyze the last 5 years of S&P 500 returns. "
            "Calculate the average annual return and volatility."
)

Human Input Modes#

| Mode | Behavior | Use When |
|------|----------|----------|
| ALWAYS | Ask human before every reply | High-risk tasks |
| TERMINATE | Ask human only when agent says TERMINATE | Standard workflows |
| NEVER | Fully autonomous, no human input | Low-risk automation |
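The mode choice can be centralized in a small helper so each agent picks its mode from a declared risk level. This is a sketch: the risk labels and the helper name are assumptions, not an AutoGen API — only the returned strings are real `human_input_mode` values.

```python
def input_mode_for(risk: str) -> str:
    """Map a task risk level to an AutoGen human_input_mode string."""
    modes = {
        "high": "ALWAYS",         # human approves every reply
        "standard": "TERMINATE",  # human approves only at termination
        "low": "NEVER",           # fully autonomous
    }
    # Fail safe: unknown risk levels get full human review.
    return modes.get(risk, "ALWAYS")
```

You would then pass `human_input_mode=input_mode_for("high")` when constructing a UserProxyAgent, keeping the risk policy in one place.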

Step 3: Human-in-the-Loop Workflows#

AutoGen excels at human-AI collaboration. The human can intervene at any point:

# human_in_loop.py
from autogen import AssistantAgent, UserProxyAgent
from config import llm_config

# Assistant drafts emails
email_drafter = AssistantAgent(
    name="Email_Drafter",
    system_message="""You are a professional email writer.
    Draft emails based on the user's instructions.

    Always present the draft for review before sending.
    Format: Subject: ... \n\n Body: ...

    After the user approves, reply TERMINATE.""",
    llm_config=llm_config,
)

# Human reviews and provides feedback
human_reviewer = UserProxyAgent(
    name="Human_Reviewer",
    human_input_mode="ALWAYS",  # Human reviews every draft
    max_consecutive_auto_reply=0,  # Always wait for human
    code_execution_config=False,  # No code execution needed
)

# Conversation flow:
# 1. Human provides instructions
# 2. AI drafts email
# 3. Human reviews and gives feedback
# 4. AI revises
# 5. Human approves → TERMINATE
human_reviewer.initiate_chat(
    email_drafter,
    message="Write a follow-up email to a prospect who attended "
            "our webinar on AI agents yesterday. Mention our "
            "enterprise plan and offer a demo."
)

Why this pattern matters: For sensitive communications (emails, contracts, public posts), human-in-the-loop review prevents costly mistakes while still eliminating most of the drafting effort.

Step 4: Group Chat — Multiple Agents Collaborating#

Group chat lets multiple agents discuss and collaborate on complex tasks:

# group_chat.py
from autogen import (
    AssistantAgent,
    UserProxyAgent,
    GroupChat,
    GroupChatManager,
)
from config import llm_config

# Specialized agents
planner = AssistantAgent(
    name="Planner",
    system_message="""You are a project planner. When given a
    project goal, break it down into specific tasks with clear
    requirements. Assign tasks to the right specialist.
    Do not write code.""",
    llm_config=llm_config,
)

coder = AssistantAgent(
    name="Coder",
    system_message="""You are a Python developer. Write clean,
    well-documented code to accomplish assigned tasks. Include
    error handling and type hints. Only write code when asked.""",
    llm_config=llm_config,
)

reviewer = AssistantAgent(
    name="Reviewer",
    system_message="""You are a code reviewer. Review code for:
    1. Correctness — does it solve the problem?
    2. Security — any vulnerabilities?
    3. Performance — any obvious improvements?
    4. Readability — is it clear and well-documented?

    Provide specific, actionable feedback.""",
    llm_config=llm_config,
)

# User proxy for code execution
executor = UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False,
    },
)

# Group chat setup
group_chat = GroupChat(
    agents=[planner, coder, reviewer, executor],
    messages=[],
    max_round=15,
    speaker_selection_method="auto",  # LLM decides who speaks
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
)

# Kick off the project
executor.initiate_chat(
    manager,
    message="Build a Python script that fetches the top 10 "
            "trending repositories on GitHub today and saves "
            "the results as a formatted markdown table."
)

Speaker Selection Methods#

| Method | How it works | Best for |
|--------|--------------|----------|
| auto | LLM decides who speaks next | Most cases |
| round_robin | Agents take turns | Structured workflows |
| random | Random selection | Brainstorming |
| Custom function | Your logic decides | Domain-specific routing |

Custom Speaker Selection#

def custom_speaker_selection(last_speaker, group_chat):
    """Route based on message content."""
    messages = group_chat.messages
    if not messages:
        return planner  # Always start with planner

    last_message = messages[-1]["content"].lower()

    if "write code" in last_message or "implement" in last_message:
        return coder
    elif "review" in last_message or "check" in last_message:
        return reviewer
    elif "execute" in last_message or "run" in last_message:
        return executor
    else:
        return planner  # Default to planner

group_chat = GroupChat(
    agents=[planner, coder, reviewer, executor],
    messages=[],
    max_round=15,
    speaker_selection_method=custom_speaker_selection,
)

Step 5: Safe Code Execution#

AutoGen can generate and execute code — but this needs guardrails:

# Safe code execution configuration
code_execution_config = {
    "work_dir": "sandbox",         # Isolated directory
    "use_docker": True,            # Run in Docker container
    "timeout": 60,                 # 60-second timeout
    "last_n_messages": 3,          # Only use recent context
}

executor = UserProxyAgent(
    name="Executor",
    human_input_mode="TERMINATE",  # Human approves before exit
    max_consecutive_auto_reply=3,
    code_execution_config=code_execution_config,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)

Code Execution Safety Checklist#

| Guard | Config | Notes |
|-------|--------|-------|
| Docker isolation | use_docker: True | Recommended for production |
| Timeout | timeout: 60 | 60 seconds |
| Iteration limit | max_consecutive_auto_reply: 3 | Set explicitly |
| Human review | human_input_mode: "TERMINATE" | For sensitive tasks |
| Working directory | work_dir: "sandbox" | Isolated folder |
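One way to keep the Docker guard from being silently skipped is to build the execution config through a helper that refuses local execution unless explicitly allowed. The helper below is a sketch — its name and behavior are assumptions, not part of AutoGen — but the dict it returns matches the config keys used above.

```python
import shutil


def safe_code_execution_config(work_dir: str = "sandbox",
                               allow_local: bool = False) -> dict:
    """Build an AutoGen code_execution_config that prefers Docker isolation.

    Raises instead of silently falling back to local execution, so a
    missing Docker daemon cannot weaken the sandbox unnoticed.
    """
    docker_available = shutil.which("docker") is not None
    if not docker_available and not allow_local:
        raise RuntimeError(
            "Docker not found; install Docker or pass allow_local=True"
        )
    return {
        "work_dir": work_dir,
        "use_docker": docker_available,
        "timeout": 60,          # kill runaway scripts
        "last_n_messages": 3,   # limit context fed to the executor
    }
```

Pass the result as `code_execution_config` when constructing the UserProxyAgent; in development you opt in with `allow_local=True`, and production deployments fail loudly instead of degrading.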

Step 6: Termination Strategy#

Proper termination prevents endless loops:

def is_termination_message(message):
    """Check if the conversation should end."""
    content = message.get("content", "")
    if content is None:
        return False

    # End conditions
    if "TERMINATE" in content:
        return True
    if "task complete" in content.lower():
        return True
    return False

assistant = AssistantAgent(
    name="Assistant",
    system_message="... Reply TERMINATE when the task is done.",
    llm_config=llm_config,
    is_termination_msg=is_termination_message,
)

Practical Example: Research Report Generator#

"""Complete AutoGen example: Multi-agent research report."""
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from config import llm_config

# Research agent
researcher = AssistantAgent(
    name="Researcher",
    system_message="""Find and summarize information about the
    given topic. Provide facts with source attribution.
    When research is complete, say 'RESEARCH COMPLETE'.""",
    llm_config=llm_config,
)

# Writer agent
writer = AssistantAgent(
    name="Writer",
    system_message="""Write a structured report based on the
    research findings. Include: executive summary, key findings,
    analysis, and recommendations. Use markdown formatting.
    When done, say 'DRAFT COMPLETE'.""",
    llm_config=llm_config,
)

# Quality checker
quality_checker = AssistantAgent(
    name="Quality_Checker",
    system_message="""Review the report for accuracy, clarity,
    and completeness. Provide specific suggestions for improvement.
    If quality is satisfactory, reply TERMINATE.""",
    llm_config=llm_config,
)

# Human proxy
user = UserProxyAgent(
    name="User",
    human_input_mode="TERMINATE",
    code_execution_config=False,
)

# Group chat
chat = GroupChat(
    agents=[user, researcher, writer, quality_checker],
    messages=[],
    max_round=12,
)
manager = GroupChatManager(groupchat=chat, llm_config=llm_config)

# Run
user.initiate_chat(
    manager,
    message="Write a report on how AI agents are transforming "
            "customer service in the financial services industry."
)

Common Mistakes to Avoid#

  1. No termination condition: Always define when conversations should end
  2. Docker disabled in production: Enable Docker for code execution safety
  3. Too many agents in group chat: 3-5 agents is optimal; more causes confusion
  4. Missing human oversight for risky tasks: Use ALWAYS or TERMINATE input modes
  5. Ignoring cost control: Set max_consecutive_auto_reply and group chat round limits


Frequently Asked Questions#

AutoGen vs. CrewAI — when should I use each?#

AutoGen is best for collaborative reasoning with human-in-the-loop requirements. CrewAI is better for structured workflows with clear role delegation. If you need humans to intervene freely during agent execution, choose AutoGen. If you need a predictable pipeline, choose CrewAI. See our CrewAI tutorial for a direct comparison.

Is AutoGen safe for production use?#

Yes, with proper configuration. Enable Docker isolation for code execution, set timeouts, limit iterations, and use human_input_mode="ALWAYS" for high-risk tasks. Microsoft actively maintains the project and it's used in production by many enterprises.

Can AutoGen agents use custom tools?#

Yes. You can register functions as tools for any agent using register_function(). AutoGen also integrates with LangChain tools. For web browsing, AutoGen provides a built-in WebSurferAgent for research tasks.
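As an illustration, a tool is just a typed Python function. The function below and its cached prices are hypothetical demo values, and the commented-out registration sketches the register_function() pattern mentioned above (it needs pyautogen plus already-constructed assistant and user_proxy agents).

```python
from typing import Annotated


def get_stock_price(symbol: Annotated[str, "Ticker symbol, e.g. MSFT"]) -> str:
    """Hypothetical tool: return a cached price for a ticker symbol."""
    prices = {"MSFT": 430.0, "AAPL": 225.0}  # made-up demo values
    price = prices.get(symbol.upper())
    return f"{symbol.upper()}: {price}" if price else f"{symbol.upper()}: unknown"


# Registration sketch (AutoGen 0.2 API):
# from autogen import register_function
# register_function(
#     get_stock_price,
#     caller=assistant,      # the agent that proposes the call
#     executor=user_proxy,   # the agent that actually runs it
#     description="Look up a stock price by ticker symbol",
# )
```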

How do I reduce the cost of AutoGen conversations?#

Limit conversation rounds with max_round, set max_consecutive_auto_reply, use cheaper models (GPT-4o-mini) for non-critical agents, and cache frequently used responses. A typical group chat with 4 agents costs $0.20-1.00 per execution with GPT-4o.
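A minimal sketch of per-agent model selection, assuming OPENAI_API_KEY is set in the environment; the helper name is mine, not an AutoGen API, but the dict shape matches the llm_config used throughout this tutorial.

```python
import os


def llm_config_for(model: str, temperature: float = 0) -> dict:
    """Build an AutoGen llm_config for a specific OpenAI model."""
    return {
        "config_list": [{
            "model": model,
            "api_key": os.environ.get("OPENAI_API_KEY", ""),
        }],
        "temperature": temperature,
        "timeout": 120,
    }


# Reasoning-heavy agents get the stronger model; routine agents get the
# cheaper one.
planner_config = llm_config_for("gpt-4o")
drafter_config = llm_config_for("gpt-4o-mini")
```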