
Glossary · 7 min read

What Is Agent Self-Reflection?

Agent self-reflection is the ability of an AI agent to evaluate and critique its own outputs, identify errors or gaps in its reasoning, and revise its response before finalizing — reducing mistakes, improving output quality, and enabling the agent to learn from its own errors within a single task.

By AI Agents Guide Team • February 28, 2026

Term Snapshot

Also known as: Agent Self-Critique, AI Self-Correction, LLM Introspection

Related terms: What Is AI Agent Planning?, What Is ReAct (Reasoning + Acting)?, What Is Inner Monologue in AI Agents?, What Is the Agent Loop?

Table of Contents

  1. Quick Definition
  2. Why Self-Reflection Matters
  3. Self-Reflection Patterns
  4. Pattern 1: Simple Self-Critique
  5. Pattern 2: Reflexion (Memory-Based Reflection)
  6. Pattern 3: Constitutional Self-Critique
  7. When Self-Reflection Helps vs Hurts
  8. Self-reflection helps:
  9. Self-reflection may hurt:
  10. Self-Reflection vs External Evaluation
  11. Common Misconceptions
  12. Related Terms
  13. Frequently Asked Questions
  14. What is agent self-reflection in AI?
  15. How does agent self-reflection work?
  16. What are the main self-reflection patterns?
  17. Does self-reflection always improve output quality?

Quick Definition

Agent self-reflection is the ability of an AI agent to evaluate and critique its own outputs or reasoning, identify weaknesses or errors, and revise before producing a final result. Rather than accepting the first output from an LLM call, a self-reflective agent runs an internal review loop — generating a draft, critiquing it against explicit criteria, and refining based on its own assessment. This reduces errors that would otherwise reach users or propagate through downstream workflow steps.
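This generate-critique-revise loop is framework-agnostic. As a minimal sketch, with `generate`, `critique`, and `revise` as illustrative stand-ins for whatever model calls your stack uses:

```python
from typing import Callable

def reflect_loop(
    generate: Callable[[], str],
    critique: Callable[[str], str],
    revise: Callable[[str, str], str],
    max_revisions: int = 2,
) -> str:
    """Generic generate-critique-revise loop with pluggable steps."""
    draft = generate()                   # initial attempt
    for _ in range(max_revisions):
        feedback = critique(draft)       # internal review
        if feedback == "APPROVED":       # critic found no issues
            break
        draft = revise(draft, feedback)  # refine based on the critique
    return draft
```

In practice each callable wraps an LLM call; the patterns later in this article show concrete versions.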

Browse all AI agent terms in the AI Agent Glossary. For multi-branch reasoning that explores alternatives before committing, see Tree of Thought. For action-based reasoning with tool use, see ReAct (Reasoning + Acting).

Why Self-Reflection Matters

A single LLM call produces output probabilistically — the model's first attempt is often good but not optimal. Common failure modes self-reflection catches:

  • Factual errors: The initial response contains an incorrect date, statistic, or technical claim
  • Incomplete coverage: The answer misses important aspects of the question
  • Logical inconsistency: The reasoning contains a contradiction that is not obvious from the prompt
  • Code bugs: Generated code has an off-by-one error or edge case not handled
  • Format violations: The output does not match the requested schema or structure

Without self-reflection, these errors reach the user or become inputs to the next step in a multi-agent pipeline, where they compound. With self-reflection, the agent catches and corrects them before the output leaves its control.

Self-Reflection Patterns

Pattern 1: Simple Self-Critique

The most basic pattern: the agent generates an output, then critiques it against explicit criteria:

from anthropic import Anthropic

client = Anthropic()

def generate_with_reflection(task: str, criteria: list[str], max_revisions: int = 2) -> str:
    """Generate output, critique against criteria, and revise."""

    # Step 1: Generate initial draft
    draft_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1500,
        messages=[{"role": "user", "content": task}]
    )
    draft = draft_response.content[0].text

    for revision in range(max_revisions):
        # Step 2: Critique the draft
        criteria_text = "\n".join(f"- {c}" for c in criteria)
        critique_response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": f"""Review this output against the following criteria:

{criteria_text}

Output to review:
{draft}

Identify specific issues. If the output satisfies all criteria, respond with "APPROVED".
Otherwise, list the specific problems found."""
            }]
        )
        critique = critique_response.content[0].text

        if "APPROVED" in critique.upper():
            break  # Output meets criteria — stop revising

        # Step 3: Revise based on critique
        revision_response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1500,
            messages=[{
                "role": "user",
                "content": f"""Original task: {task}

Your previous output:
{draft}

Critique of your output:
{critique}

Please revise your output to address all the identified issues."""
            }]
        )
        draft = revision_response.content[0].text

    return draft

# Usage
result = generate_with_reflection(
    task="Write a Python function that validates email addresses",
    criteria=[
        "Handles edge cases (empty string, None input)",
        "Includes docstring explaining behavior",
        "Does not use external libraries",
        "Returns True/False consistently"
    ]
)

Pattern 2: Reflexion (Memory-Based Reflection)

The Reflexion framework (Shinn et al., 2023) extends self-critique by storing reflections in memory to guide future attempts:

class ReflexionAgent:
    def __init__(self, task: str):
        self.task = task
        self.reflections = []  # Memory of past failures and learnings
        self.client = Anthropic()

    def run(self, max_attempts: int = 3) -> str:
        for attempt in range(max_attempts):
            # Generate with context from past reflections
            reflection_context = "\n".join(self.reflections) if self.reflections else "No prior attempts."

            response = self.client.messages.create(
                model="claude-opus-4-6",
                max_tokens=2000,
                messages=[{
                    "role": "user",
                    "content": f"""Task: {self.task}

Lessons from previous attempts:
{reflection_context}

Complete the task, incorporating lessons learned."""
                }]
            )
            output = response.content[0].text

            # Evaluate the output
            is_success, feedback = self._evaluate(output)
            if is_success:
                return output

            # Generate reflection for memory
            reflection = self._reflect(output, feedback)
            self.reflections.append(f"Attempt {attempt + 1}: {reflection}")

        return output  # Return the last attempt when none passed evaluation

    def _evaluate(self, output: str) -> tuple[bool, str]:
        """Evaluate output quality. Returns (success, feedback)."""
        # Task-specific checks go here, e.g. running tests or validating a schema
        raise NotImplementedError

    def _reflect(self, output: str, feedback: str) -> str:
        """Generate a concise lesson from the failed attempt."""
        response = self.client.messages.create(
            model="claude-opus-4-6",
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": f"""Given this failed attempt and feedback, write one concise lesson to remember:

Output: {output}
Feedback: {feedback}

Lesson (1-2 sentences):"""
            }]
        )
        return response.content[0].text
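The reliability of Reflexion hinges on `_evaluate`: the more deterministic the success signal, the less the loop chases its own noise. A hypothetical `_evaluate` implementation for an agent that must emit JSON:

```python
import json

def evaluate_json_output(output: str, required_keys: list[str]) -> tuple[bool, str]:
    """Example evaluator: output must be valid JSON containing required keys.
    Returns (success, feedback) in the shape ReflexionAgent expects."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return False, f"Output is not valid JSON: {exc}"
    missing = [key for key in required_keys if key not in data]
    if missing:
        return False, f"Missing required keys: {missing}"
    return True, "All checks passed"
```

The feedback string doubles as input to `_reflect`, so making it specific ("missing key `age`") produces more useful stored lessons than a bare pass/fail.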

Pattern 3: Constitutional Self-Critique

Evaluate outputs against a set of principles rather than task-specific criteria:

AGENT_CONSTITUTION = [
    "Be factually accurate — do not state things you are uncertain about as fact",
    "Be helpful — provide actionable, specific information rather than vague advice",
    "Be honest — acknowledge limitations and uncertainties explicitly",
    "Be safe — do not provide information that could cause harm"
]

def constitutional_check(output: str) -> tuple[bool, str]:
    """Check output against constitutional principles."""
    principles_text = "\n".join(f"{i+1}. {p}" for i, p in enumerate(AGENT_CONSTITUTION))

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Evaluate whether this output violates any of the following principles:

{principles_text}

Output to evaluate:
{output}

Does this output violate any principles? If yes, identify which ones and explain.
If no violations, respond with "COMPLIANT"."""
        }]
    )
    critique = response.content[0].text
    # Only an explicit approval counts as compliant; an explanation of
    # violations could itself contain the word "compliant"
    return critique.strip().upper().startswith("COMPLIANT"), critique

When Self-Reflection Helps vs Hurts

Self-reflection helps:

  • Code generation: Catching bugs, missing error handling, and edge cases before running
  • Factual research: Identifying unsupported claims or logical gaps
  • Structured data extraction: Ensuring outputs match required schemas
  • Multi-step agent pipelines: Preventing errors from compounding in downstream steps

Self-reflection may hurt:

  • Creative tasks: Critique criteria applied to creative writing can make outputs more generic and less distinctive
  • Conversational responses: Over-refinement produces stilted, less natural dialogue
  • Time-sensitive tasks: Each critique-revise cycle adds latency; for real-time applications this may be unacceptable
  • Cost-sensitive contexts: Multiple LLM calls for each output increases per-query costs proportionally
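To make the cost point concrete: in the simple self-critique pattern shown earlier, each revision cycle adds a critique call and, unless the draft is approved, a revision call, so worst-case model calls grow linearly with the revision budget:

```python
def max_llm_calls(max_revisions: int) -> int:
    """Worst-case model calls for one generate-critique-revise run:
    1 initial generation, plus a critique and a revision per cycle."""
    return 1 + 2 * max_revisions

# With the default budget of 2 revisions, one task can cost up to 5 model calls
```

Latency scales the same way, since the calls are sequential — each critique needs the draft, and each revision needs the critique.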

Self-Reflection vs External Evaluation

| Dimension | Self-Reflection | External Evaluator |
| --- | --- | --- |
| Evaluation source | Same model as generator | Separate model or system |
| Setup cost | Low (just prompting) | Higher (separate agent/model) |
| Evaluation quality | Limited by same-model biases | Independent perspective |
| Speed | Faster (single model) | Slower (second model call) |
| Best for | General quality improvement | High-stakes verification |

Common Misconceptions

Misconception: Self-reflection always improves output quality.
Self-reflection improves measurable quality for tasks with clear correctness criteria. For tasks where quality is subjective, critique cycles often make outputs more conservative and generic. Match the use of self-reflection to tasks where evaluation criteria are explicit.

Misconception: Self-reflection eliminates hallucinations.
Self-reflection reduces certain types of errors but cannot eliminate hallucinations — if the model's initial generation contains a confident factual error, the same model critiquing that output may not catch it. For hallucination reduction, external verification against authoritative sources (CRITIC, tool-grounded evaluation) is more reliable.

Misconception: More revision cycles produce better results.
Diminishing returns set in quickly. Research suggests that one or two critique-revise cycles capture most of the quality improvement. Beyond that, additional cycles add cost and latency with minimal gains — and can actually introduce new errors through over-editing.

Related Terms

  • Tree of Thought — Explores multiple reasoning paths before committing, complementary to self-reflection
  • ReAct (Reasoning + Acting) — The action-based reasoning pattern self-reflection can enhance
  • Agent Planning — Planning agents benefit from self-reflection before executing plans
  • Agent Loop — Self-reflection adds inner loops within the agent's outer execution loop
  • Inner Monologue — The internal reasoning chain that self-reflection examines
  • Build Your First AI Agent — Tutorial including agent reasoning and quality improvement patterns
  • LangChain vs AutoGen — Comparing framework support for self-reflection and iterative reasoning

Frequently Asked Questions

What is agent self-reflection in AI?

Agent self-reflection is a reasoning pattern where an AI agent evaluates its own draft outputs, identifies errors or gaps, and revises accordingly. The agent acts as both producer and critic — generating an initial response, then using a critique prompt to find issues, then revising. This internal review loop catches mistakes before they reach users or propagate through multi-step pipelines.

How does agent self-reflection work?

Self-reflection typically involves at least two LLM calls: a generation call producing an initial output, and an evaluation call that critiques that output against specific criteria. The critique is fed back with instructions to revise. This generate-critique-revise cycle can repeat multiple times. More sophisticated implementations use separate evaluator agents or tool-grounded verification to make critique more reliable.

What are the main self-reflection patterns?

Key patterns include Reflexion (stores verbal reflections in memory to guide future attempts), Self-Critique (prompts the agent to find flaws in its own output), Constitutional AI (evaluates outputs against a set of principles), and CRITIC (uses external tool verification to ground evaluation in factual checks).

Does self-reflection always improve output quality?

Self-reflection improves quality for tasks with clear correctness criteria — code generation, factual research, structured extraction. For subjective tasks like creative writing, critique cycles can reduce diversity and make outputs more generic. Effectiveness depends on evaluation prompt quality, and each cycle adds proportional cost and latency.

Tags: architecture, reasoning, fundamentals

Related Glossary Terms

What Is Inner Monologue in AI Agents?

Inner monologue is an AI agent's explicit internal chain of reasoning — the step-by-step thinking process the model generates before producing a final response. Making reasoning visible improves answer quality, enables debugging, and allows the agent to "think through" complex problems before committing to an answer.

What Is ReAct (Reasoning + Acting)?

ReAct is a prompting and agent design pattern that interleaves reasoning traces (Thought) with environment interactions (Action and Observation), enabling AI agents to solve multi-step tasks more accurately than either chain-of-thought reasoning or action-only approaches alone.

What Is AI Agent Planning?

A practical guide to AI agent planning — how agents decompose goals into subtasks, the difference between plan-and-execute and ReAct approaches, Tree of Thought planning, and how to recover from planning failures.

What Is Few-Shot Prompting?

Few-shot prompting is a technique where a small number of input-output examples are included in a prompt to guide an LLM to produce responses in a specific format, style, or reasoning pattern — enabling rapid adaptation to new tasks without fine-tuning or retraining.
