Advanced · 35 min read

Build an Agentic RAG with LangChain

Learn how to build an Agentic RAG system that goes beyond static retrieval with query routing, multi-step retrieval loops, and self-reflection to improve answer quality. Master the patterns that make RAG truly agentic.

By AI Agents Guide Team · February 28, 2026

Table of Contents

  1. What You'll Learn
  2. Prerequisites
  3. Architecture Overview
  4. Step 1: Setup and Dependencies
  5. Step 2: Build the Vector Store and Retriever
  6. Step 3: Query Router
  7. Step 4: Document Grader
  8. Step 5: Generator with Self-Reflection
  9. Step 6: Wire Everything into a LangGraph Workflow
  10. Step 7: Testing and Evaluation
  11. Production Considerations
  12. What's Next

Build an Agentic RAG System with LangChain and Python

Standard RAG pipelines follow a fixed path: embed the question, retrieve top-k chunks, stuff them into a prompt, and return an answer. This works for simple factual questions, but it breaks down fast when questions are ambiguous, require synthesizing information from multiple sources, or need verification. Agentic RAG solves this by treating retrieval as a decision — not a deterministic step.

In this tutorial you will build an Agentic RAG system that dynamically routes queries, iterates retrieval when the first pass falls short, grades its own retrieved documents, and reflects on whether the final answer actually addresses the original question. The result is a system that handles hard real-world queries with significantly higher reliability than naive RAG.

What You'll Learn#

  • How to implement query routing to direct questions to the right knowledge source
  • How to grade retrieved documents for relevance before generating an answer
  • How to build a self-reflection loop that detects hallucination and re-queries
  • How to connect all components into a coherent LangGraph workflow
  • How to evaluate the quality of your agentic RAG pipeline

Prerequisites#

  • Python 3.10+
  • OpenAI API key (or any LangChain-compatible LLM)
  • Basic familiarity with RAG concepts for AI agents
  • Understanding of what AI agents are and how they differ from pipelines

Architecture Overview#

The system has four stages that form a conditional loop:

  1. Router — Classifies the incoming question and directs it to a vectorstore, a web search tool, or a structured database query.
  2. Retriever — Fetches candidate documents from the selected source.
  3. Grader — Evaluates each retrieved document for relevance to the question. Irrelevant documents are filtered; if too few pass, the system re-routes.
  4. Generator with Self-Reflection — Produces an answer, then checks whether the answer is grounded in the retrieved context and actually resolves the question. If not, it loops.

This loop can run up to a configurable maximum number of iterations before falling back to a best-effort answer.
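Before wiring in any LLM calls, the control flow above can be sketched in plain Python. The stub functions below are illustrative placeholders for the LLM-backed components built in the steps that follow; only the loop structure (route, retrieve, grade, generate, retry up to a cap) reflects the architecture:

```python
# Minimal sketch of the agentic RAG control loop. All functions are stubs
# standing in for the LLM-backed components built in the following steps.
MAX_ITERATIONS = 3  # configurable cap before falling back

def route(question: str) -> str:
    return "vectorstore"  # stub: the real router is an LLM classifier

def retrieve(question: str, source: str) -> list[str]:
    return ["Agents combine an LLM with tools and a control loop."]  # stub

def grade(question: str, docs: list[str]) -> list[str]:
    return [d for d in docs if "agent" in d.lower()]  # stub relevance filter

def generate(question: str, docs: list[str]) -> tuple[str, bool]:
    answer = f"Based on {len(docs)} document(s): ..."
    grounded_and_resolved = bool(docs)  # stub self-reflection verdict
    return answer, grounded_and_resolved

def agentic_rag(question: str) -> str:
    answer = ""
    for _ in range(MAX_ITERATIONS):
        source = route(question)
        docs = grade(question, retrieve(question, source))
        answer, ok = generate(question, docs)
        if ok:
            return answer
    return answer  # best-effort fallback after the iteration cap

print(agentic_rag("What is an AI agent?"))
```

The real system replaces each stub with an LLM call, but the shape of the loop stays the same.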

Step 1: Setup and Dependencies#

pip install langchain==0.3.0 langchain-openai==0.2.0 langchain-community==0.3.0 \
    langgraph==0.2.0 chromadb==0.5.0 tiktoken==0.7.0 python-dotenv==1.0.1

Create a .env file:

OPENAI_API_KEY=sk-...
TAVILY_API_KEY=tvly-...  # optional, for web search fallback

Then add a small config module:

# config.py
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MAX_RETRIEVAL_ITERATIONS = 3
TOP_K_DOCUMENTS = 5
RELEVANCE_THRESHOLD = 0.7

Step 2: Build the Vector Store and Retriever#

Start by indexing your documents. In production this would be a persistent Chroma or Pinecone instance. Here we index a small set of documents to demonstrate the flow.

# vectorstore.py
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader

def build_vectorstore(urls: list[str]) -> Chroma:
    """Load web pages, chunk them, and index into Chroma."""
    loader = WebBaseLoader(urls)
    docs = loader.load()

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=50,
    )
    splits = splitter.split_documents(docs)

    embeddings = OpenAIEmbeddings()
    vectorstore = Chroma.from_documents(
        documents=splits,
        embedding=embeddings,
        collection_name="agentic_rag_demo",
    )
    return vectorstore

# Example usage
SAMPLE_URLS = [
    "https://ai-agents-guide.com/glossary/ai-agents/",
    "https://ai-agents-guide.com/tutorials/introduction-to-rag-for-ai-agents/",
]
vectorstore = build_vectorstore(SAMPLE_URLS)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
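The chunk_size and chunk_overlap parameters control how the splitter windows the text: consecutive chunks share chunk_overlap characters, so a sentence cut at one boundary still appears whole in a neighboring chunk. A stdlib sketch of the character-window idea (the real RecursiveCharacterTextSplitter additionally prefers breaking on separators like paragraph and sentence boundaries):

```python
def window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap. The real splitter also tries
    to break on separators such as paragraph breaks and sentence ends."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = window_chunks("x" * 1200, chunk_size=500, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks of 500, 500, 300 characters
```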

Step 3: Query Router#

The router decides where a question should go. It uses an LLM with structured output to classify the query.

# router.py
from typing import Literal
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

class RouteDecision(BaseModel):
    source: Literal["vectorstore", "web_search", "direct_answer"] = Field(
        description="Which knowledge source to route the query to"
    )
    reasoning: str = Field(description="One sentence explaining the routing decision")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(RouteDecision)

ROUTER_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """You are a query router for a RAG system about AI agents.
Route to:
- 'vectorstore': questions about AI agent concepts, frameworks, tutorials in the knowledge base
- 'web_search': questions requiring real-time or recent information
- 'direct_answer': simple factual questions the LLM can answer without retrieval

Return a JSON with 'source' and 'reasoning'."""),
    ("human", "Question: {question}"),
])

router_chain = ROUTER_PROMPT | structured_llm

def route_query(question: str) -> RouteDecision:
    return router_chain.invoke({"question": question})
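Under the hood, with_structured_output has the model emit JSON matching the RouteDecision schema and validates it before your code sees it. A stdlib sketch of that validation step, using a hypothetical raw model reply and a plain dataclass in place of the Pydantic model:

```python
import json
from dataclasses import dataclass

ALLOWED_SOURCES = {"vectorstore", "web_search", "direct_answer"}

@dataclass
class RouteDecision:
    source: str
    reasoning: str

def parse_route(raw: str) -> RouteDecision:
    """Validate a raw JSON routing reply, roughly as structured output would."""
    data = json.loads(raw)
    if data["source"] not in ALLOWED_SOURCES:
        raise ValueError(f"invalid route: {data['source']!r}")
    return RouteDecision(source=data["source"], reasoning=data["reasoning"])

decision = parse_route('{"source": "vectorstore", "reasoning": "Concept question."}')
print(decision.source)  # vectorstore
```

Constraining source to a closed set is what lets the graph branch on it safely later.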

Step 4: Document Grader#

Before generating an answer, grade each retrieved document. This prevents irrelevant context from polluting the generation step.

# grader.py
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document

from config import RELEVANCE_THRESHOLD

class GradeResult(BaseModel):
    relevant: bool = Field(description="True if the document is relevant to the question")
    score: float = Field(ge=0.0, le=1.0, description="Relevance score from 0 to 1")

grader_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_grader = grader_llm.with_structured_output(GradeResult)

GRADER_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """You are grading the relevance of a retrieved document to a question.
Return relevant=True and a high score only if the document contains information
that directly helps answer the question. Be strict."""),
    ("human", "Question: {question}\n\nDocument:\n{document}"),
])

grader_chain = GRADER_PROMPT | structured_grader

def grade_documents(question: str, documents: list[Document]) -> list[Document]:
    """Keep only documents graded relevant at or above the configured threshold."""
    relevant_docs = []
    for doc in documents:
        result = grader_chain.invoke({
            "question": question,
            "document": doc.page_content,
        })
        if result.relevant and result.score >= RELEVANCE_THRESHOLD:
            doc.metadata["relevance_score"] = result.score
            relevant_docs.append(doc)
    return relevant_docs
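The filtering policy itself can be exercised without any LLM calls by stubbing the grader's verdicts. This sketch shows how the threshold drops weak documents (the documents and scores are made up for illustration):

```python
def filter_by_grade(graded: list[tuple[str, bool, float]], threshold: float = 0.7) -> list[str]:
    """Keep documents a (stubbed) grader marked relevant with score >= threshold."""
    return [doc for doc, relevant, score in graded if relevant and score >= threshold]

graded = [
    ("Agents use tools to act.", True, 0.9),
    ("Unrelated marketing copy.", False, 0.1),
    ("Tangentially mentions agents.", True, 0.5),
]
print(filter_by_grade(graded))  # only the first document passes
```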

[Diagram: the agentic RAG loop with routing, grading, and self-reflection stages]

Step 5: Generator with Self-Reflection#

The generator produces an answer and then runs two checks: a hallucination check (is the answer grounded in the retrieved context?) and a resolution check (does the answer actually answer the question?).

# generator.py
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document

# --- Answer Generator ---
GENERATE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """You are an assistant answering questions about AI agents.
Use only the provided context. If the context is insufficient, say so explicitly.

Context:
{context}"""),
    ("human", "{question}"),
])

gen_llm = ChatOpenAI(model="gpt-4o", temperature=0)
generate_chain = GENERATE_PROMPT | gen_llm

# --- Hallucination Checker ---
class HallucinationResult(BaseModel):
    grounded: bool = Field(description="True if the answer is supported by the context")

hallucination_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
hallucination_chain = ChatPromptTemplate.from_messages([
    ("system", "Does this answer rely only on the provided context? Answer grounded=True or False."),
    ("human", "Context:\n{context}\n\nAnswer:\n{answer}"),
]) | hallucination_llm.with_structured_output(HallucinationResult)

# --- Resolution Checker ---
class ResolutionResult(BaseModel):
    resolved: bool = Field(description="True if the answer fully addresses the question")

resolution_chain = ChatPromptTemplate.from_messages([
    ("system", "Does this answer fully and correctly resolve the original question?"),
    ("human", "Question: {question}\n\nAnswer: {answer}"),
]) | hallucination_llm.with_structured_output(ResolutionResult)

def generate_and_reflect(question: str, documents: list[Document]) -> dict:
    context = "\n\n".join(doc.page_content for doc in documents)
    answer = generate_chain.invoke({"question": question, "context": context})
    answer_text = answer.content

    hallucination = hallucination_chain.invoke({
        "context": context,
        "answer": answer_text,
    })
    resolution = resolution_chain.invoke({
        "question": question,
        "answer": answer_text,
    })

    return {
        "answer": answer_text,
        "grounded": hallucination.grounded,
        "resolved": resolution.resolved,
    }
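The two verdicts combine into a single retry decision: any failed check triggers another pass. A small truth-table helper makes the policy explicit (the function name is illustrative; the workflow in the next step applies the same rule inline):

```python
def reflection_verdict(grounded: bool, resolved: bool) -> str:
    """Accept only when the answer is grounded in context AND resolves the question."""
    return "accept" if grounded and resolved else "retry"

for g in (True, False):
    for r in (True, False):
        print(f"grounded={g} resolved={r} -> {reflection_verdict(g, r)}")
```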

Step 6: Wire Everything into a LangGraph Workflow#

# workflow.py
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_core.documents import Document
from config import MAX_RETRIEVAL_ITERATIONS
from router import route_query
from grader import grade_documents
from generator import generate_and_reflect
from vectorstore import retriever

class AgentState(TypedDict):
    question: str
    documents: list[Document]
    answer: str
    iterations: int
    route: str
    grounded: bool
    resolved: bool

def route_node(state: AgentState) -> AgentState:
    decision = route_query(state["question"])
    return {**state, "route": decision.source}

def retrieve_node(state: AgentState) -> AgentState:
    if state["route"] == "vectorstore":
        docs = retriever.invoke(state["question"])
    else:
        # Fallback to vectorstore for this demo; swap with Tavily in production
        docs = retriever.invoke(state["question"])
    return {**state, "documents": docs}

def grade_node(state: AgentState) -> AgentState:
    relevant = grade_documents(state["question"], state["documents"])
    return {**state, "documents": relevant}

def generate_node(state: AgentState) -> AgentState:
    result = generate_and_reflect(state["question"], state["documents"])
    return {
        **state,
        "answer": result["answer"],
        "iterations": state.get("iterations", 0) + 1,
        "grounded": result["grounded"],
        "resolved": result["resolved"],
    }

def should_retry(state: AgentState) -> str:
    # Stop at the configured cap; otherwise retry until grounded and resolved
    if state.get("iterations", 0) >= MAX_RETRIEVAL_ITERATIONS:
        return "end"
    if not state.get("grounded") or not state.get("resolved"):
        return "retry"
    return "end"

# Build graph
graph = StateGraph(AgentState)
graph.add_node("route", route_node)
graph.add_node("retrieve", retrieve_node)
graph.add_node("grade", grade_node)
graph.add_node("generate", generate_node)

graph.set_entry_point("route")
graph.add_edge("route", "retrieve")
graph.add_edge("retrieve", "grade")
graph.add_edge("grade", "generate")
graph.add_conditional_edges("generate", should_retry, {
    "retry": "retrieve",
    "end": END,
})

app = graph.compile()

# Run it
result = app.invoke({
    "question": "What is the difference between a ReAct agent and a RAG pipeline?",
    "documents": [],
    "answer": "",
    "iterations": 0,
    "route": "",
})
print(result["answer"])

Step 7: Testing and Evaluation#

Test your pipeline with queries that stress-test each branch:

# tests/test_agentic_rag.py
import pytest
from workflow import app

TEST_CASES = [
    {
        "question": "What is an AI agent?",
        "expected_keywords": ["agent", "tool", "action"],
    },
    {
        "question": "How does LangGraph differ from LangChain?",
        "expected_keywords": ["graph", "state", "workflow"],
    },
]

@pytest.mark.parametrize("case", TEST_CASES)
def test_pipeline_answers(case):
    result = app.invoke({
        "question": case["question"],
        "documents": [],
        "answer": "",
        "iterations": 0,
        "route": "",
    })
    assert result["answer"], "Pipeline should return a non-empty answer"
    answer_lower = result["answer"].lower()
    matched = sum(1 for kw in case["expected_keywords"] if kw in answer_lower)
    assert matched >= 1, f"Answer missing expected keywords: {case['expected_keywords']}"

Run with: pytest tests/test_agentic_rag.py -v

Production Considerations#

Before shipping to production, review the AI agent testing guide and the security best practices guide.

Key production checklist:

  • Rate limiting: wrap LLM calls in a retry decorator with exponential backoff
  • Observability: add tracing so you can see which nodes execute on each query (see the Langfuse observability tutorial)
  • Persistent vectorstore: replace in-memory Chroma with a persistent or cloud-hosted instance
  • Cost control: use gpt-4o-mini for routing and grading, reserve gpt-4o for final generation
  • Fallback answers: always return the best available answer after MAX_RETRIEVAL_ITERATIONS rather than raising an exception
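For the rate-limiting item, a retry decorator with exponential backoff can be written with the stdlib alone; the delays and jitter below are illustrative defaults, and libraries such as tenacity provide the same pattern off the shelf:

```python
import random
import time
from functools import wraps

def with_backoff(max_attempts: int = 5, base_delay: float = 1.0):
    """Retry the wrapped call on any exception, doubling the delay each attempt."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
                    time.sleep(delay)
        return wrapper
    return decorator

# Demo with a function that fails twice before succeeding (simulated rate limit)
calls = {"n": 0}

@with_backoff(max_attempts=3, base_delay=0.01)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"

print(flaky())  # succeeds on the third attempt
```

In the pipeline, you would wrap the LLM invocations in router.py, grader.py, and generator.py with such a decorator.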

What's Next#

  • Add memory to your system using LangGraph multi-agent patterns
  • Deploy this pipeline as a Docker service with the Docker deployment tutorial
  • Explore the Agentic RAG glossary entry to understand the theoretical background
  • Add structured tracing and evaluation with the Langfuse observability tutorial
  • Review the LangChain foundational tutorial if you need to reinforce core concepts
