# What Is Agent Self-Reflection?
## Quick Definition
Agent self-reflection is the ability of an AI agent to evaluate and critique its own outputs or reasoning, identify weaknesses or errors, and revise before producing a final result. Rather than accepting the first output from an LLM call, a self-reflective agent runs an internal review loop — generating a draft, critiquing it against explicit criteria, and refining based on its own assessment. This reduces errors that would otherwise reach users or propagate through downstream workflow steps.
Browse all AI agent terms in the AI Agent Glossary. For multi-branch reasoning that explores alternatives before committing, see Tree of Thought. For action-based reasoning with tool use, see ReAct (Reasoning + Acting).
## Why Self-Reflection Matters
A single LLM call produces output probabilistically — the model's first attempt is often good but not optimal. Common failure modes self-reflection catches:
- Factual errors: The initial response contains an incorrect date, statistic, or technical claim
- Incomplete coverage: The answer misses important aspects of the question
- Logical inconsistency: The reasoning contains a contradiction that is not obvious from the prompt
- Code bugs: Generated code has an off-by-one error or edge case not handled
- Format violations: The output does not match the requested schema or structure
Without self-reflection, these errors reach the user or become inputs to the next step in a multi-agent pipeline, where they compound. With self-reflection, the agent catches and corrects them before the output leaves its control.
## Self-Reflection Patterns
### Pattern 1: Simple Self-Critique
The most basic pattern: the agent generates an output, then critiques it against explicit criteria:
```python
from anthropic import Anthropic

client = Anthropic()

def generate_with_reflection(task: str, criteria: list[str], max_revisions: int = 2) -> str:
    """Generate output, critique against criteria, and revise."""
    # Step 1: Generate initial draft
    draft_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1500,
        messages=[{"role": "user", "content": task}],
    )
    draft = draft_response.content[0].text

    for revision in range(max_revisions):
        # Step 2: Critique the draft
        criteria_text = "\n".join(f"- {c}" for c in criteria)
        critique_response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": f"""Review this output against the following criteria:
{criteria_text}

Output to review:
{draft}

Identify specific issues. If the output satisfies all criteria, respond with "APPROVED".
Otherwise, list the specific problems found.""",
            }],
        )
        critique = critique_response.content[0].text
        if "APPROVED" in critique.upper():
            break  # Output meets criteria — stop revising

        # Step 3: Revise based on critique
        revision_response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1500,
            messages=[{
                "role": "user",
                "content": f"""Original task: {task}

Your previous output:
{draft}

Critique of your output:
{critique}

Please revise your output to address all the identified issues.""",
            }],
        )
        draft = revision_response.content[0].text

    return draft

# Usage
result = generate_with_reflection(
    task="Write a Python function that validates email addresses",
    criteria=[
        "Handles edge cases (empty string, None input)",
        "Includes docstring explaining behavior",
        "Does not use external libraries",
        "Returns True/False consistently",
    ],
)
```
### Pattern 2: Reflexion (Memory-Based Reflection)
The Reflexion framework (Shinn et al., 2023) extends self-critique by storing reflections in memory to guide future attempts:
```python
from anthropic import Anthropic

class ReflexionAgent:
    def __init__(self, task: str):
        self.task = task
        self.reflections = []  # Memory of past failures and learnings
        self.client = Anthropic()

    def run(self, max_attempts: int = 3) -> str:
        for attempt in range(max_attempts):
            # Generate with context from past reflections
            reflection_context = "\n".join(self.reflections) if self.reflections else "No prior attempts."
            response = self.client.messages.create(
                model="claude-opus-4-6",
                max_tokens=2000,
                messages=[{
                    "role": "user",
                    "content": f"""Task: {self.task}

Lessons from previous attempts:
{reflection_context}

Complete the task, incorporating lessons learned.""",
                }],
            )
            output = response.content[0].text

            # Evaluate the output
            is_success, feedback = self._evaluate(output)
            if is_success:
                return output

            # Generate reflection for memory
            reflection = self._reflect(output, feedback)
            self.reflections.append(f"Attempt {attempt + 1}: {reflection}")
        return output  # Budget exhausted: return the last attempt

    def _evaluate(self, output: str) -> tuple[bool, str]:
        """Evaluate output quality. Returns (success, feedback)."""
        # Placeholder: supply task-specific evaluation logic in a subclass
        raise NotImplementedError

    def _reflect(self, output: str, feedback: str) -> str:
        """Generate a concise lesson from the failed attempt."""
        response = self.client.messages.create(
            model="claude-opus-4-6",
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": f"""Given this failed attempt and feedback, write one concise lesson to remember:

Output: {output}

Feedback: {feedback}

Lesson (1-2 sentences):""",
            }],
        )
        return response.content[0].text
```
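The `_evaluate` hook is left abstract above; in practice it can be an LLM-as-judge call or, where possible, a deterministic check. A minimal sketch of a deterministic evaluator — the rules here are hypothetical, loosely matched to the email-validator criteria from Pattern 1:

```python
def evaluate_code_output(output: str) -> tuple[bool, str]:
    """Toy evaluator: pass only if the draft has a docstring and handles None."""
    issues = []
    if '"""' not in output and "'''" not in output:
        issues.append("missing docstring")
    if "None" not in output:
        issues.append("no explicit None handling")
    return (not issues, "; ".join(issues) or "all checks passed")
```

A subclass of `ReflexionAgent` would override `_evaluate` with logic like this. Deterministic checks make the success signal reliable; LLM-based checks cover criteria that cannot be expressed as rules.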
### Pattern 3: Constitutional Self-Critique
Evaluate outputs against a set of principles rather than task-specific criteria:
```python
AGENT_CONSTITUTION = [
    "Be factually accurate — do not state things you are uncertain about as fact",
    "Be helpful — provide actionable, specific information rather than vague advice",
    "Be honest — acknowledge limitations and uncertainties explicitly",
    "Be safe — do not provide information that could cause harm",
]

def constitutional_check(output: str) -> tuple[bool, str]:
    """Check output against constitutional principles."""
    principles_text = "\n".join(f"{i+1}. {p}" for i, p in enumerate(AGENT_CONSTITUTION))
    # Reuses the module-level `client` from Pattern 1
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Evaluate whether this output violates any of the following principles:

{principles_text}

Output to evaluate:
{output}

Does this output violate any principles? If yes, identify which ones and explain.
If no violations, respond with "COMPLIANT".""",
        }],
    )
    critique = response.content[0].text
    return "COMPLIANT" in critique.upper(), critique
```
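One way to wire such a check into an agent loop is a small guard that keeps revising until the check passes. A sketch, assuming `check` is any `(output) -> (compliant, critique)` callable such as `constitutional_check` above, and `revise` is any function that produces a new draft from a critique:

```python
from typing import Callable

def guard_output(
    output: str,
    check: Callable[[str], tuple[bool, str]],
    revise: Callable[[str, str], str],
    max_revisions: int = 2,
) -> str:
    """Re-check and revise an output until it is compliant or the budget runs out."""
    for _ in range(max_revisions):
        compliant, critique = check(output)
        if compliant:
            break
        output = revise(output, critique)
    return output
```

Note that once the budget is exhausted, the last revision is returned without a final check; callers that need a hard guarantee should run `check` once more and escalate on failure.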
## When Self-Reflection Helps vs Hurts
### Self-reflection helps
- Code generation: Catching bugs, missing error handling, and edge cases before running
- Factual research: Identifying unsupported claims or logical gaps
- Structured data extraction: Ensuring outputs match required schemas
- Multi-step agent pipelines: Preventing errors from compounding in downstream steps
### Self-reflection may hurt
- Creative tasks: Critique criteria applied to creative writing can make outputs more generic and less distinctive
- Conversational responses: Over-refinement produces stilted, less natural dialogue
- Time-sensitive tasks: Each critique-revise cycle adds latency; for real-time applications this may be unacceptable
- Cost-sensitive contexts: Multiple LLM calls for each output increases per-query costs proportionally
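The cost point is easy to quantify. In a generate-critique-revise loop like Pattern 1, each revision cycle adds one critique call and one revision call on top of the initial generation, so the worst-case call count grows linearly:

```python
def max_llm_calls(revision_cycles: int) -> int:
    """Worst-case API calls per query for a generate-critique-revise loop:
    one generation, plus one critique and one revision per cycle.
    Early approval by the critique skips the remaining calls."""
    return 1 + 2 * revision_cycles
```

With the default `max_revisions=2` from Pattern 1, a single query can cost up to `max_llm_calls(2) == 5` calls, so budget latency and tokens accordingly.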
## Self-Reflection vs External Evaluation
| Dimension | Self-Reflection | External Evaluator |
|---|---|---|
| Evaluation source | Same model as generator | Separate model or system |
| Setup cost | Low (just prompting) | Higher (separate agent/model) |
| Evaluation quality | Limited by same model biases | Independent perspective |
| Speed | Faster (single model) | Slower (second model call) |
| Best for | General quality improvement | High-stakes verification |
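The two approaches can share one interface: a verdict function the agent calls after generation. A sketch where evaluators are injected, so swapping self-critique for an independent model (or a deterministic checker) is a one-line change — the function and parameter names here are illustrative, not from any framework:

```python
from typing import Callable

def generate_then_verify(
    task: str,
    generate: Callable[[str], str],
    evaluators: list[Callable[[str], tuple[bool, str]]],
) -> tuple[str, list[str]]:
    """Run one generation past a chain of evaluators (e.g. self-critique first,
    then an independent check for high-stakes use) and collect objections."""
    output = generate(task)
    objections = []
    for evaluate in evaluators:
        ok, feedback = evaluate(output)
        if not ok:
            objections.append(feedback)
    return output, objections
```

Keeping evaluators behind a common signature also makes it easy to log which check caught which error, which is useful when deciding whether an external evaluator is worth its extra cost.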
## Common Misconceptions
**Misconception: Self-reflection always improves output quality.** Self-reflection improves measurable quality for tasks with clear correctness criteria. For tasks where quality is subjective, critique cycles often make outputs more conservative and generic. Match the use of self-reflection to tasks where evaluation criteria are explicit.
**Misconception: Self-reflection eliminates hallucinations.** Self-reflection reduces certain types of errors but cannot eliminate hallucinations — if the model's initial generation contains a confident factual error, the same model critiquing that output may not catch it. For hallucination reduction, external verification against authoritative sources (CRITIC, tool-grounded evaluation) is more reliable.
**Misconception: More revision cycles produce better results.** Diminishing returns set in quickly. Research suggests that one or two critique-revise cycles capture most of the quality improvement. Beyond that, additional cycles add cost and latency with minimal gains — and can actually introduce new errors through over-editing.
## Related Terms
- Tree of Thought — Explores multiple reasoning paths before committing, complementary to self-reflection
- ReAct (Reasoning + Acting) — The action-based reasoning pattern self-reflection can enhance
- Agent Planning — Planning agents benefit from self-reflection before executing plans
- Agent Loop — Self-reflection adds inner loops within the agent's outer execution loop
- Inner Monologue — The internal reasoning chain that self-reflection examines
- Build Your First AI Agent — Tutorial including agent reasoning and quality improvement patterns
- LangChain vs AutoGen — Comparing framework support for self-reflection and iterative reasoning
## Frequently Asked Questions
### What is agent self-reflection in AI?
Agent self-reflection is a reasoning pattern where an AI agent evaluates its own draft outputs, identifies errors or gaps, and revises accordingly. The agent acts as both producer and critic — generating an initial response, then using a critique prompt to find issues, then revising. This internal review loop catches mistakes before they reach users or propagate through multi-step pipelines.
### How does agent self-reflection work?
Self-reflection typically involves at least two LLM calls: a generation call producing an initial output, and an evaluation call that critiques that output against specific criteria. The critique is fed back with instructions to revise. This generate-critique-revise cycle can repeat multiple times. More sophisticated implementations use separate evaluator agents or tool-grounded verification to make critique more reliable.
### What are the main self-reflection patterns?
Key patterns include Reflexion (stores verbal reflections in memory to guide future attempts), Self-Critique (prompts the agent to find flaws in its own output), Constitutional AI (evaluates outputs against a set of principles), and CRITIC (uses external tool verification to ground evaluation in factual checks).
### Does self-reflection always improve output quality?
Self-reflection improves quality for tasks with clear correctness criteria — code generation, factual research, structured extraction. For subjective tasks like creative writing, critique cycles can reduce diversity and make outputs more generic. Effectiveness depends on evaluation prompt quality, and each cycle adds proportional cost and latency.