What Is Tree of Thought?
Quick Definition#
Tree of Thought (ToT) is an LLM reasoning strategy that explores multiple reasoning branches simultaneously rather than following a single linear chain of thought. At each reasoning step, the model generates several candidate next thoughts, evaluates their promise, and pursues the most productive branches — backtracking from dead ends just as a human expert might consider and abandon multiple problem-solving approaches before finding the right one.
Browse all AI agent terms in the AI Agent Glossary. For the action-based counterpart to ToT, see ReAct (Reasoning + Acting). For planning in AI agents, see Agent Planning.
The Problem Tree of Thought Solves#
Standard Chain of Thought (CoT) prompting generates reasoning steps sequentially:
Step 1 → Step 2 → Step 3 → Answer
This works well for problems with a clear, linear solution path. It fails when:
- Early mistakes propagate: A wrong assumption in step 2 corrupts everything that follows, and the model has no mechanism to backtrack.
- Multiple valid approaches exist: The model commits to the first plausible approach even if a better one exists.
- Exploration is required: Some problems require considering multiple options before evaluating which is best.
Tree of Thought addresses this by treating reasoning as a search problem, not a generation problem.
How Tree of Thought Works#
The Core Structure#
At each step, instead of generating one thought, the model generates multiple candidate thoughts:
```
            Problem
           /   |   \
       T1.1  T1.2  T1.3    ← Branch: generate 3 candidate next steps
        |     |     |
       eval  eval  eval    ← Evaluate: score each candidate
        |           |
       T2.1        T2.2    ← Expand promising branches; prune weak ones
```
The three key operations are:
- Thought generation: Produce multiple candidate next reasoning steps
- State evaluation: Score each candidate (e.g., sure / likely / impossible)
- Search: Use BFS (breadth-first) or DFS (depth-first) to navigate the tree
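The evaluate-and-prune step is the heart of the loop, and it is ordinary code rather than an LLM call. A minimal sketch (the `prune` function and the example candidates are illustrative; the score labels match the sure / likely / impossible scale used in the implementation below):

```python
# Illustrative pruning step: drop "impossible" candidates and rank the
# rest so stronger scores are expanded first.

RANK = {"sure": 0, "likely": 1}  # lower rank = more promising

def prune(candidates: list[tuple[str, str]], breadth: int) -> list[str]:
    """candidates: (thought, score) pairs; returns the thoughts to expand."""
    keep = [(RANK[score], thought) for thought, score in candidates
            if score != "impossible"]
    keep.sort(key=lambda pair: pair[0])
    return [thought for _, thought in keep[:breadth]]

print(prune(
    [("try permutations", "sure"),
     ("guess randomly", "impossible"),
     ("enumerate by first book", "likely")],
    breadth=2,
))  # → ['try permutations', 'enumerate by first book']
```

The search algorithm only ever sees the surviving candidates, which is what keeps the tree from growing exponentially.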
BFS vs DFS Search Strategies#
BFS (Breadth-First Search): Explore all nodes at each depth before going deeper. Best when the optimal path length is unknown and early evaluation is reliable.
DFS (Depth-First Search): Explore one branch to completion before trying others. Best for problems where partial solutions are easy to evaluate and the tree is deep.
Implementing Tree of Thought#
Simple ToT with LLM Prompting#
```python
from anthropic import Anthropic
import json

client = Anthropic()

def generate_thoughts(problem: str, current_state: str, n: int = 3) -> list[str]:
    """Generate n candidate next reasoning steps."""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"""Problem: {problem}
Current reasoning state: {current_state}

Generate {n} different possible next steps in the reasoning process.
Each step should explore a distinct approach or direction.
Format as JSON: {{"thoughts": ["step1", "step2", "step3"]}}"""
        }]
    )
    data = json.loads(response.content[0].text)
    return data["thoughts"]

def evaluate_thoughts(problem: str, thoughts: list[str]) -> list[dict]:
    """Score each candidate thought on promise."""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Problem: {problem}

Evaluate each reasoning step below. Score as 'sure', 'likely', or 'impossible'.
Steps: {json.dumps(thoughts)}
Format as JSON: {{"evaluations": [{{"thought": "...", "score": "sure|likely|impossible", "reasoning": "..."}}]}}"""
        }]
    )
    data = json.loads(response.content[0].text)
    return data["evaluations"]

def tree_of_thought(problem: str, depth: int = 3, breadth: int = 3) -> str:
    """Run BFS Tree of Thought search."""
    frontier = [""]  # Current reasoning paths (start from an empty state)
    for level in range(depth):
        all_thoughts = []
        # Generate candidates for each frontier state
        for state in frontier:
            thoughts = generate_thoughts(problem, state, n=breadth)
            all_thoughts.extend((state, t) for t in thoughts)
        # Evaluate all candidates in one batched call
        thought_texts = [t for _, t in all_thoughts]
        evaluations = evaluate_thoughts(problem, thought_texts)
        # Keep "sure" and "likely" candidates, ranked so "sure" comes first
        rank = {"sure": 0, "likely": 1}
        promising = sorted(
            (rank[e["score"]], state + "\n" + e["thought"])
            for (state, _), e in zip(all_thoughts, evaluations)
            if e["score"] != "impossible"
        )
        frontier = [path for _, path in promising[:breadth]]  # Limit frontier size
        if not frontier:
            break
    # Select the best surviving state as the answer path
    return frontier[0] if frontier else "No valid path found"

# Example: mathematical reasoning
result = tree_of_thought(
    problem="What are all possible ways to arrange 3 books on a shelf?",
    depth=3,
    breadth=3,
)
print(result)
```
Tree of Thought vs Other Reasoning Approaches#
| Approach | Path | Evaluation | Backtracking | Best For |
|---|---|---|---|---|
| Standard prompting | Single, direct | None | None | Simple queries |
| Chain of Thought | Single, linear | None | None | Multi-step reasoning |
| ReAct | Linear + actions | Implicit | None | Tool-using agents |
| Tree of Thought | Multiple branches | Explicit | Yes | Complex planning |
| Self-Consistency | Multiple linear paths | Vote | Implicit | Reasoning with uncertainty |
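Self-Consistency, the last row above, is worth a one-line contrast: instead of a tree, it samples several independent Chain of Thought runs and takes a majority vote over their final answers. A minimal illustration with hard-coded sample answers standing in for sampled model outputs:

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    """Majority vote over final answers from independent CoT samples."""
    return Counter(answers).most_common(1)[0][0]

# Five sampled chains; three agree, so "6" wins the vote.
print(self_consistency(["6", "6", "4", "6", "8"]))  # → 6
```

The vote happens only at the end, which is why the table lists its backtracking as implicit: bad paths are outvoted rather than pruned.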
When to Use Tree of Thought#
Use ToT when:
- The problem requires exploring multiple solution approaches (algorithm design, mathematical proofs)
- Early mistakes are difficult to recover from with linear reasoning
- The best solution path is not obvious upfront
- Solution quality matters more than latency
Avoid ToT when:
- The problem has a clear linear solution path
- Latency is critical (ToT uses 3–10× more LLM calls than CoT)
- The problem is primarily factual retrieval rather than reasoning
- You need streaming responses (ToT's branching structure is hard to stream)
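The latency point is easy to quantify. In the BFS implementation shown earlier, each level makes one generation call per frontier state plus one batched evaluation call, so the worst-case call count (assuming the frontier stays full) is straightforward arithmetic:

```python
def tot_llm_calls(depth: int, breadth: int) -> int:
    """Worst-case LLM calls for the BFS loop: per level, one generation
    call per frontier state plus one batched evaluation call."""
    calls, frontier = 0, 1  # the search starts from a single empty state
    for _ in range(depth):
        calls += frontier + 1
        frontier = breadth  # frontier is capped at `breadth` afterwards
    return calls

print(tot_llm_calls(depth=3, breadth=3))  # → 10 calls vs. 1 for plain CoT
```

A depth-3, breadth-3 search costs around 10 calls where plain CoT costs 1, which is where the 3–10× figure comes from; larger settings grow linearly in depth but multiply wall-clock latency accordingly.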
Common Misconceptions#
**Misconception: Tree of Thought requires specialized model training.** ToT is entirely a prompting strategy — it works with any LLM capable of structured output. No fine-tuning or special model capabilities are required. The "tree" is managed by the orchestrating code, not the model.

**Misconception: More branches always produce better results.** Increasing breadth (candidates per node) and depth rapidly increases cost and latency without proportional quality gains. For most problems, a breadth of 3–5 and depth of 3–4 captures the benefits while remaining practical.

**Misconception: ToT replaces ReAct for agent tasks.** ReAct is designed for tasks requiring tool use and real-world interaction. ToT is designed for internal reasoning over a fixed problem. They are complementary: an agent using ReAct could use ToT internally when planning its next action before executing it.
Related Terms#
- ReAct (Reasoning + Acting) — The action-focused counterpart to ToT's search-based reasoning
- Agent Planning — How agents use structured reasoning to plan multi-step tasks
- Agent Self-Reflection — Evaluating reasoning quality at each step
- Agent Loop — The execution cycle ToT operates within
- Context Window — The limit on how much reasoning state ToT can track
- Build Your First AI Agent — Tutorial covering agent reasoning patterns
- LangChain vs AutoGen — Comparing reasoning framework support across agent libraries
Frequently Asked Questions#
What is Tree of Thought in AI?#
Tree of Thought (ToT) is a reasoning framework for LLMs that explores multiple reasoning paths simultaneously, unlike Chain of Thought which follows a single linear sequence. The model generates several possible next steps, evaluates which are most promising, and explores the best paths further — backtracking from dead ends. This significantly improves performance on tasks requiring planning and strategic problem-solving.
How does Tree of Thought differ from Chain of Thought?#
Chain of Thought follows one linear reasoning path — if it goes wrong early, the error propagates. Tree of Thought explores multiple paths from each decision point, evaluates their promise, and can backtrack from unproductive branches. CoT is fast and good for straightforward reasoning; ToT is slower but far better for complex problems with multiple valid approaches.
When should I use Tree of Thought vs Chain of Thought?#
Use Tree of Thought when the problem benefits from exploring multiple approaches — mathematical proofs, algorithm design, creative writing with structural choices, and complex planning. Use Chain of Thought for simpler reasoning tasks or when latency matters more than solution quality. ToT uses significantly more LLM calls than CoT.
How is Tree of Thought implemented?#
ToT is implemented through repeated LLM calls that generate candidate next steps, followed by evaluation calls that score each candidate. A search algorithm (BFS or DFS) controls which branches to expand. The simplest implementation prompts the model to generate multiple possible next steps and evaluate each one, with orchestrating code managing the search tree.