What Is Tree of Thought?
Quick Definition#
Tree of Thought (ToT) is an LLM reasoning strategy that explores multiple reasoning branches simultaneously rather than following a single linear chain of thought. At each reasoning step, the model generates several candidate next thoughts, evaluates their promise, and pursues the most productive branches — backtracking from dead ends just as a human expert might consider and abandon multiple problem-solving approaches before finding the right one.
Browse all AI agent terms in the AI Agent Glossary. For the action-based counterpart to ToT, see ReAct (Reasoning + Acting). For planning in AI agents, see Agent Planning.
The Problem Tree of Thought Solves#
Standard Chain of Thought (CoT) prompting generates reasoning steps sequentially:
Step 1 → Step 2 → Step 3 → Answer
This works well for problems with a clear, linear solution path. It fails when:
- Early mistakes propagate: A wrong assumption in step 2 corrupts everything that follows, and the model has no mechanism to backtrack.
- Multiple valid approaches exist: The model commits to the first plausible approach even if a better one exists.
- Exploration is required: Some problems require considering multiple options before evaluating which is best.
Tree of Thought addresses this by treating reasoning as a search problem, not a generation problem.
How Tree of Thought Works#
The Core Structure#
At each step, instead of generating one thought, the model generates multiple candidate thoughts:
```
            Problem
           /   |   \
       T1.1  T1.2  T1.3    ← Branch: generate 3 candidate next steps
        |     |     |
       eval  eval  eval    ← Evaluate: score each candidate
        |           |
       T2.1        T2.2    ← Expand promising branches; prune weak ones
```
The three key operations are:
- Thought generation: Produce multiple candidate next reasoning steps
- State evaluation: Score each candidate (e.g., sure / likely / impossible)
- Search: Use BFS (breadth-first) or DFS (depth-first) to navigate the tree
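The evaluate-and-prune step is the heart of the loop, and it is ordinary code rather than an LLM call. A minimal sketch (the `prune` function and the example candidates are illustrative; the score labels match the sure / likely / impossible scale used in the implementation below):

```python
# Illustrative pruning step: drop "impossible" candidates and rank the
# rest so stronger scores are expanded first.

RANK = {"sure": 0, "likely": 1}  # lower rank = more promising

def prune(candidates: list[tuple[str, str]], breadth: int) -> list[str]:
    """candidates: (thought, score) pairs; returns the thoughts to expand."""
    keep = [(RANK[score], thought) for thought, score in candidates
            if score != "impossible"]
    keep.sort(key=lambda pair: pair[0])
    return [thought for _, thought in keep[:breadth]]

print(prune(
    [("try permutations", "sure"),
     ("guess randomly", "impossible"),
     ("enumerate by first book", "likely")],
    breadth=2,
))  # → ['try permutations', 'enumerate by first book']
```

The search algorithm only ever sees the surviving candidates, which is what keeps the tree from growing exponentially.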
BFS vs DFS Search Strategies#
BFS (Breadth-First Search): Explore all nodes at each depth before going deeper. Best when the optimal path length is unknown and early evaluation is reliable.
DFS (Depth-First Search): Explore one branch to completion before trying others. Best for problems where partial solutions are easy to evaluate and the tree is deep.
Implementing Tree of Thought#
Simple ToT with LLM Prompting#
```python
from anthropic import Anthropic
import json

client = Anthropic()

def generate_thoughts(problem: str, current_state: str, n: int = 3) -> list[str]:
    """Generate n candidate next reasoning steps."""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"""Problem: {problem}
Current reasoning state: {current_state}

Generate {n} different possible next steps in the reasoning process.
Each step should explore a distinct approach or direction.
Format as JSON: {{"thoughts": ["step1", "step2", "step3"]}}"""
        }]
    )
    data = json.loads(response.content[0].text)
    return data["thoughts"]

def evaluate_thoughts(problem: str, thoughts: list[str]) -> list[dict]:
    """Score each candidate thought on promise."""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Problem: {problem}

Evaluate each reasoning step below. Score as 'sure', 'likely', or 'impossible'.
Steps: {json.dumps(thoughts)}
Format as JSON: {{"evaluations": [{{"thought": "...", "score": "sure|likely|impossible", "reasoning": "..."}}]}}"""
        }]
    )
    data = json.loads(response.content[0].text)
    return data["evaluations"]

def tree_of_thought(problem: str, depth: int = 3, breadth: int = 3) -> str:
    """Run BFS Tree of Thought search."""
    frontier = [""]  # Current reasoning paths (start from an empty state)
    for level in range(depth):
        all_thoughts = []
        # Generate candidates for each frontier state
        for state in frontier:
            thoughts = generate_thoughts(problem, state, n=breadth)
            all_thoughts.extend((state, t) for t in thoughts)
        # Evaluate all candidates in one batched call
        thought_texts = [t for _, t in all_thoughts]
        evaluations = evaluate_thoughts(problem, thought_texts)
        # Keep "sure" and "likely" candidates, ranked so "sure" comes first
        rank = {"sure": 0, "likely": 1}
        promising = sorted(
            (rank[e["score"]], state + "\n" + e["thought"])
            for (state, _), e in zip(all_thoughts, evaluations)
            if e["score"] != "impossible"
        )
        frontier = [path for _, path in promising[:breadth]]  # Limit frontier size
        if not frontier:
            break
    # Select the best surviving state as the answer path
    return frontier[0] if frontier else "No valid path found"

# Example: mathematical reasoning
result = tree_of_thought(
    problem="What are all possible ways to arrange 3 books on a shelf?",
    depth=3,
    breadth=3,
)
print(result)
```
Tree of Thought vs Other Reasoning Approaches#
| Approach | Path | Evaluation | Backtracking | Best For |
|---|---|---|---|---|
| Standard prompting | Single, direct | None | None | Simple queries |
| Chain of Thought | Single, linear | None | None | Multi-step reasoning |
| ReAct | Linear + actions | Implicit | None | Tool-using agents |
| Tree of Thought | Multiple branches | Explicit | Yes | Complex planning |
| Self-Consistency | Multiple linear paths | Vote | Implicit | Reasoning with uncertainty |
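Self-Consistency, the last row above, is worth a one-line contrast: instead of a tree, it samples several independent Chain of Thought runs and takes a majority vote over their final answers. A minimal illustration with hard-coded sample answers standing in for sampled model outputs:

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    """Majority vote over final answers from independent CoT samples."""
    return Counter(answers).most_common(1)[0][0]

# Five sampled chains; three agree, so "6" wins the vote.
print(self_consistency(["6", "6", "4", "6", "8"]))  # → 6
```

The vote happens only at the end, which is why the table lists its backtracking as implicit: bad paths are outvoted rather than pruned.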
When to Use Tree of Thought#
Use ToT when:
- The problem requires exploring multiple solution approaches (algorithm design, mathematical proofs)
- Early mistakes are difficult to recover from with linear reasoning
- The best solution path is not obvious upfront
- Solution quality matters more than latency
Avoid ToT when:
- The problem has a clear linear solution path
- Latency is critical (ToT uses 3–10× more LLM calls than CoT)
- The problem is primarily factual retrieval rather than reasoning
- You need streaming responses (ToT's branching structure is hard to stream)
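The latency point is easy to quantify. In the BFS implementation shown earlier, each level makes one generation call per frontier state plus one batched evaluation call, so the worst-case call count (assuming the frontier stays full) is straightforward arithmetic:

```python
def tot_llm_calls(depth: int, breadth: int) -> int:
    """Worst-case LLM calls for the BFS loop: per level, one generation
    call per frontier state plus one batched evaluation call."""
    calls, frontier = 0, 1  # the search starts from a single empty state
    for _ in range(depth):
        calls += frontier + 1
        frontier = breadth  # frontier is capped at `breadth` afterwards
    return calls

print(tot_llm_calls(depth=3, breadth=3))  # → 10 calls vs. 1 for plain CoT
```

A depth-3, breadth-3 search costs around 10 calls where plain CoT costs 1, which is where the 3–10× figure comes from; larger settings grow linearly in depth but multiply wall-clock latency accordingly.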
Common Misconceptions#
**Misconception: Tree of Thought requires specialized model training.** ToT is entirely a prompting strategy — it works with any LLM capable of structured output. No fine-tuning or special model capabilities are required. The "tree" is managed by the orchestrating code, not the model.

**Misconception: More branches always produce better results.** Increasing breadth (candidates per node) and depth rapidly increases cost and latency without proportional quality gains. For most problems, a breadth of 3–5 and depth of 3–4 captures the benefits while remaining practical.

**Misconception: ToT replaces ReAct for agent tasks.** ReAct is designed for tasks requiring tool use and real-world interaction. ToT is designed for internal reasoning over a fixed problem. They are complementary: an agent using ReAct could use ToT internally when planning its next action before executing it.
Related Terms#
- ReAct (Reasoning + Acting) — The action-focused counterpart to ToT's search-based reasoning
- Agent Planning — How agents use structured reasoning to plan multi-step tasks
- Agent Self-Reflection — Evaluating reasoning quality at each step
- Agent Loop — The execution cycle ToT operates within
- Context Window — The limit on how much reasoning state ToT can track
- Build Your First AI Agent — Tutorial covering agent reasoning patterns
- LangChain vs AutoGen — Comparing reasoning framework support across agent libraries
Frequently Asked Questions#
What is Tree of Thought in AI?#
Tree of Thought (ToT) is a reasoning framework for LLMs that explores multiple reasoning paths simultaneously, unlike Chain of Thought which follows a single linear sequence. The model generates several possible next steps, evaluates which are most promising, and explores the best paths further — backtracking from dead ends. This significantly improves performance on tasks requiring planning and strategic problem-solving.
How does Tree of Thought differ from Chain of Thought?#
Chain of Thought follows one linear reasoning path — if it goes wrong early, the error propagates. Tree of Thought explores multiple paths from each decision point, evaluates their promise, and can backtrack from unproductive branches. CoT is fast and good for straightforward reasoning; ToT is slower but far better for complex problems with multiple valid approaches.
When should I use Tree of Thought vs Chain of Thought?#
Use Tree of Thought when the problem benefits from exploring multiple approaches — mathematical proofs, algorithm design, creative writing with structural choices, and complex planning. Use Chain of Thought for simpler reasoning tasks or when latency matters more than solution quality. ToT uses significantly more LLM calls than CoT.
How is Tree of Thought implemented?#
ToT is implemented through repeated LLM calls that generate candidate next steps, followed by evaluation calls that score each candidate. A search algorithm (BFS or DFS) controls which branches to expand. The simplest implementation prompts the model to generate multiple possible next steps and evaluate each one, with orchestrating code managing the search tree.