Build a Data-Aware AI Agent with LlamaIndex
LlamaIndex is a leading framework for building data-aware AI applications. While most agent frameworks focus on tool calling and orchestration, LlamaIndex's strength is connecting AI models to your actual data (PDFs, databases, APIs, websites, spreadsheets, and more) and making that data queryable through an agent interface.
In this tutorial you will build a research assistant that can query two different knowledge bases (a product documentation set and a financial database), decide which one to consult for any given question, and synthesize answers that combine information from both sources. This pattern — called agentic RAG — is one of the most valuable and production-tested patterns in enterprise AI.
What You'll Learn#
- How to install LlamaIndex and connect it to OpenAI
- How to ingest documents and build vector index query engines
- How to wrap query engines as agent tools using QueryEngineTool
- How to build a ReActAgent that reasons over multiple data sources
- How to implement the sub-question decomposition pattern for complex queries
- How to evaluate retrieval quality with LlamaIndex's evaluation tools
Prerequisites#
- Python 3.10 or higher installed
- An OpenAI API key (for both the language model and embeddings)
- Basic understanding of AI agents and agentic RAG
- Familiarity with vector databases and embeddings at a conceptual level
Step 1: Project Setup#
```bash
mkdir llamaindex-agent-demo && cd llamaindex-agent-demo
python -m venv .venv && source .venv/bin/activate

# Core LlamaIndex packages
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai python-dotenv

# For PDF loading
pip install llama-index-readers-file pypdf

# For web scraping
pip install llama-index-readers-web
```
Create .env:
```
OPENAI_API_KEY=sk-...
```
Then configure the global LlamaIndex settings so all components use the same model and embedding:
```python
# settings.py
from dotenv import load_dotenv

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

load_dotenv()

# Set global defaults for all LlamaIndex components
Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.chunk_size = 512
Settings.chunk_overlap = 50
```
Step 2: Create Document Knowledge Bases#
LlamaIndex can ingest almost any document type. Here we create two in-memory knowledge bases from hand-written Document objects and build vector indices from them; in a real project you would load files from disk with SimpleDirectoryReader instead.
```python
# knowledge_base.py
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    Document,
)

import settings  # Import to apply global settings


def build_product_docs_index() -> VectorStoreIndex:
    """Build a vector index from product documentation text.

    In production, load from a directory of PDF/Markdown files:
        documents = SimpleDirectoryReader("./docs/product/").load_data()
    """
    # Mock documents representing a product knowledge base
    documents = [
        Document(
            text="""
            Product: AgentHub Pro v3.0
            AgentHub Pro is an enterprise AI orchestration platform for deploying multi-agent systems at scale.
            Key features:
            - Deploy up to 10,000 concurrent agent instances
            - Built-in observability with distributed tracing
            - Role-based access control for agent permissions
            - Native integrations with Salesforce, HubSpot, and Zendesk
            - Supports OpenAI, Anthropic, and Google Gemini models
            Pricing: Starting at $2,500/month for up to 500,000 agent runs.
            """,
            metadata={"source": "product_overview.md", "category": "product"},
        ),
        Document(
            text="""
            AgentHub Pro Technical Architecture:
            The platform uses a microservices architecture with Kubernetes orchestration.
            Each agent run is isolated in a separate container with 256MB RAM by default.
            The event bus uses Apache Kafka for high-throughput message passing between agents.
            Storage options: PostgreSQL (relational), Weaviate (vector), Redis (cache).
            API: REST and gRPC interfaces, WebSocket for streaming. OpenAPI spec available.
            SLA: 99.9% uptime guarantee with 24/7 on-call support for Enterprise tier.
            """,
            metadata={"source": "technical_architecture.md", "category": "technical"},
        ),
        Document(
            text="""
            AgentHub Pro Integration Guide:
            Step 1: Generate an API key in the AgentHub dashboard under Settings > API Keys.
            Step 2: Install the Python SDK: pip install agenthub-sdk
            Step 3: Initialize the client: from agenthub import Client; client = Client(api_key="...")
            Step 4: Deploy an agent: client.deploy(agent_config)
            Step 5: Invoke an agent: response = client.run(agent_id="...", input="your query")
            Webhook integration: Configure a webhook URL to receive agent completion events.
            """,
            metadata={"source": "integration_guide.md", "category": "integration"},
        ),
    ]
    return VectorStoreIndex.from_documents(documents)


def build_financial_index() -> VectorStoreIndex:
    """Build a vector index from financial reports."""
    documents = [
        Document(
            text="""
            AgentHub Inc. Q4 2025 Financial Results:
            Revenue: $47.3M (up 78% year-over-year)
            Gross Margin: 72% (up from 68% in Q4 2024)
            Operating Loss: -$8.2M (improving from -$14.1M in Q4 2024)
            Customer Count: 1,847 enterprise customers (up 52% YoY)
            Net Revenue Retention: 138%
            Cash and equivalents: $185M
            ARR run-rate: $189M
            """,
            metadata={"source": "q4_2025_earnings.md", "category": "financial"},
        ),
        Document(
            text="""
            AgentHub Inc. 2026 Guidance:
            Full Year Revenue Guidance: $220M - $235M (55-65% growth)
            Expected Gross Margin: 73-75%
            Planned headcount growth: 40% (primarily in engineering and sales)
            Key investment areas: Multi-agent orchestration, enterprise security, EU data residency
            Geographic expansion: EMEA and APAC markets targeted in H1 2026
            """,
            metadata={"source": "2026_guidance.md", "category": "financial"},
        ),
    ]
    return VectorStoreIndex.from_documents(documents)
```
Step 3: Wrap Indices as Agent Tools#
QueryEngineTool converts any LlamaIndex query engine into a tool that a ReActAgent can call. The description is critical — it tells the agent when to use each tool.
```python
# agent_tools.py
from llama_index.core.tools import QueryEngineTool, ToolMetadata

from knowledge_base import build_product_docs_index, build_financial_index


def create_agent_tools():
    """Create QueryEngineTools from knowledge bases."""
    # Build indices
    product_index = build_product_docs_index()
    financial_index = build_financial_index()

    # Create query engines with retrieval configuration
    product_engine = product_index.as_query_engine(
        similarity_top_k=3,  # Retrieve the top 3 most relevant chunks
        response_mode="compact",
    )
    financial_engine = financial_index.as_query_engine(
        similarity_top_k=3,
        response_mode="compact",
    )

    # Wrap as agent tools with descriptive metadata
    tools = [
        QueryEngineTool(
            query_engine=product_engine,
            metadata=ToolMetadata(
                name="product_documentation",
                description=(
                    "Use this tool to answer questions about AgentHub Pro product features, "
                    "technical architecture, API documentation, integration guides, and pricing. "
                    "Best for: 'How does X feature work?', 'What are the API endpoints?', "
                    "'How do I integrate with Salesforce?'"
                ),
            ),
        ),
        QueryEngineTool(
            query_engine=financial_engine,
            metadata=ToolMetadata(
                name="financial_reports",
                description=(
                    "Use this tool to answer questions about AgentHub Inc. financial performance, "
                    "revenue figures, growth rates, customer metrics, and forward guidance. "
                    "Best for: 'What was revenue in Q4?', 'What is the NRR?', "
                    "'What is the 2026 guidance?'"
                ),
            ),
        ),
    ]
    return tools
```
Step 4: Build and Run the ReActAgent#
LlamaIndex's ReActAgent implements the ReAct reasoning pattern — Reason + Act. It iteratively reasons about which tool to use, executes the tool, observes the result, and continues until it has enough information to answer.
```python
# react_agent.py
import settings  # Apply global settings
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

from agent_tools import create_agent_tools


def main():
    tools = create_agent_tools()

    # Create the ReActAgent
    agent = ReActAgent.from_tools(
        tools=tools,
        llm=OpenAI(model="gpt-4o"),
        verbose=True,  # Show the reasoning steps
        max_iterations=10,
    )

    # Test with various query types
    queries = [
        "What are the main features of AgentHub Pro and how much does it cost?",
        "What was AgentHub's revenue growth in Q4 2025 and what is the 2026 guidance?",
        "I'm evaluating AgentHub for my company. What are the technical architecture details "
        "and how has the company's financial health been trending?",
    ]

    for query in queries:
        print(f"\n{'=' * 60}")
        print(f"Query: {query}")
        print("=" * 60)
        response = agent.chat(query)
        print(f"\nAnswer: {response}")


if __name__ == "__main__":
    main()
```
The verbose output will show the agent's reasoning — which tools it chooses to call and why — as it works through the query.
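To make that loop concrete, here is a toy, framework-free sketch of the ReAct cycle in plain Python. The `plan` function and stub tools are illustrative stand-ins for the LLM and the two query engines; the real agent's prompting and bookkeeping are considerably more involved:

```python
# Toy sketch of the loop a ReAct agent runs internally (illustrative only;
# the real agent uses an LLM to produce each thought and action).
def react_loop(question, tools, plan, max_iterations=10):
    """Iterate thought -> action -> observation until the plan yields an answer."""
    observations = []
    for _ in range(max_iterations):
        # In the real agent, the LLM chooses the next step from the question
        # plus all observations so far. Here `plan` stubs that decision out.
        step = plan(question, observations)
        if step["type"] == "answer":
            return step["content"]
        # "Acting" means calling the chosen tool and recording the result
        result = tools[step["tool"]](step["input"])
        observations.append((step["tool"], result))
    return "Gave up after max_iterations"


# Stub tools standing in for the two query engines
tools = {
    "product_documentation": lambda q: "Pricing starts at $2,500/month.",
    "financial_reports": lambda q: "Q4 2025 revenue was $47.3M.",
}


# Stub "LLM": consult the product docs once, then answer from the observation
def plan(question, observations):
    if not observations:
        return {"type": "action", "tool": "product_documentation", "input": question}
    return {"type": "answer", "content": observations[-1][1]}


print(react_loop("How much does AgentHub Pro cost?", tools, plan))
# → Pricing starts at $2,500/month.
```

The key design point carried over from the real agent is the termination condition: the loop only ends when the planner decides it has enough observations to answer, or when `max_iterations` is exhausted.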
Step 5: Sub-Question Decomposition for Complex Queries#
For complex questions that span multiple data sources, LlamaIndex's SubQuestionQueryEngine automatically decomposes the query into sub-questions, answers each independently, and synthesizes a final answer:
```python
# sub_question_engine.py
import settings  # Apply global settings
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.question_gen.llm_generators import LLMQuestionGenerator
from llama_index.core.response_synthesizers import get_response_synthesizer

from agent_tools import create_agent_tools

tools = create_agent_tools()

# SubQuestionQueryEngine decomposes complex questions automatically
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
    question_gen=LLMQuestionGenerator.from_defaults(),
    response_synthesizer=get_response_synthesizer(response_mode="tree_summarize"),
    verbose=True,
)

# This single complex question will be split into multiple targeted sub-questions
response = sub_question_engine.query(
    "Compare the technical maturity of AgentHub Pro with its business performance. "
    "Does the financial trajectory justify the engineering investment described in the architecture docs?"
)
print(response)
```
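The decompose, answer, synthesize flow can be illustrated with a dependency-free sketch. The hard-coded `decompose` and the string-joining `synthesize` below are stand-ins for steps the real engine delegates to the LLM:

```python
# Toy sketch of the flow SubQuestionQueryEngine automates (names illustrative).
def decompose(question):
    # The real engine asks the LLM to write sub-questions and route each one
    # to a tool based on the tool descriptions; here the split is hard-coded.
    return [
        ("product_documentation", "What is the technical architecture of AgentHub Pro?"),
        ("financial_reports", "How is AgentHub Inc. performing financially?"),
    ]


def answer_sub_questions(question, engines):
    # Each sub-question is answered independently by its routed engine
    return [(sub_q, engines[tool_name](sub_q)) for tool_name, sub_q in decompose(question)]


def synthesize(sub_answers):
    # The real engine feeds sub-answers back to the LLM (tree_summarize);
    # here we simply concatenate them.
    return " ".join(answer for _, answer in sub_answers)


engines = {
    "product_documentation": lambda q: "Microservices on Kubernetes with Kafka.",
    "financial_reports": lambda q: "Revenue grew 78% YoY to $47.3M.",
}

final = synthesize(answer_sub_questions("Compare tech maturity with business performance.", engines))
print(final)
# → Microservices on Kubernetes with Kafka. Revenue grew 78% YoY to $47.3M.
```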
Step 6: Persist the Vector Index#
Building indices from scratch on every run is slow and expensive. Persist them to disk:
```python
# persist_index.py
import os

import settings  # Apply global settings
from llama_index.core import StorageContext, load_index_from_storage

from knowledge_base import build_product_docs_index

PERSIST_DIR = "./storage/product_docs"

if os.path.exists(PERSIST_DIR):
    # Load the previously persisted index from disk
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
    print("Index loaded from disk")
else:
    # Build the index and persist it for future runs
    index = build_product_docs_index()
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    print("Index built and saved to disk")

query_engine = index.as_query_engine()
response = query_engine.query("What is the pricing for AgentHub Pro?")
print(response)
```
What's Next#
You have built a working data-aware agent that can query multiple knowledge bases and reason across them. Recommended next steps:
- Deeper RAG: Read the agentic RAG glossary entry to understand advanced retrieval patterns like hypothetical document embeddings and re-ranking
- LangChain integration: See building an agent with LangChain which also supports RAG but with a different abstraction model
- OpenAI Agents SDK: Learn how OpenAI's SDK handles tool use differently from LlamaIndex
- MCP servers: Explore connecting your agent to MCP servers to extend LlamaIndex agents with external tool ecosystems
- Tool use patterns: Review the tool use glossary entry for a conceptual grounding in how agents select and call tools
Frequently Asked Questions#
What vector databases does LlamaIndex support?
LlamaIndex supports over 20 vector store backends including Chroma, Pinecone, Weaviate, Qdrant, Milvus, PostgreSQL with pgvector, Elasticsearch, and more. Each has a separate integration package (llama-index-vector-stores-pinecone, etc.). The API is consistent across all backends so you can swap without changing agent code.
How does LlamaIndex handle large documents that exceed the context window?
LlamaIndex chunks documents at index build time (controlled by Settings.chunk_size and Settings.chunk_overlap). At query time, only the most relevant chunks are retrieved via vector similarity, so the agent only sees the most pertinent sections regardless of document size.
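LlamaIndex's splitters are token-based and sentence-aware, so real chunk boundaries will differ, but the way chunk_size and chunk_overlap interact can be shown with a simple character-based sliding window (a toy stand-in, not the library's splitter):

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=50):
    """Character-based sliding window: each chunk starts chunk_overlap
    characters before the previous chunk ended, so adjacent chunks share
    context across the boundary."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


doc = "x" * 1200  # stand-in for a long document
chunks = sliding_window_chunks(doc)
print(len(chunks))      # → 3 (windows start at 0, 462, 924)
print(len(chunks[0]))   # → 512 (a full chunk)
print(len(chunks[-1]))  # → 276 (the tail is shorter)
```

The overlap is what prevents a sentence that straddles a boundary from being unrecoverable: it appears in full in at least one of the two adjacent chunks.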
What is the difference between ReActAgent and OpenAIAgent in LlamaIndex?
ReActAgent uses the universal ReAct prompting pattern and works with any LLM that supports instruction following. OpenAIAgent uses OpenAI's native function calling API which is more reliable for complex tool use but requires an OpenAI model. For production with OpenAI models, OpenAIAgent is generally recommended.
Can LlamaIndex agents maintain conversation history?
Yes. All LlamaIndex agents support a chat() method that maintains chat history internally. You can also persist history to a database using ChatMemoryBuffer with a storage backend, allowing sessions to persist across application restarts.
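The idea behind a token-limited memory buffer can be sketched in plain Python. This is not ChatMemoryBuffer's actual implementation; it counts words instead of tokens and skips persistence, purely to show the truncation behavior:

```python
# Minimal sketch of a token-budgeted chat memory: keep only the most
# recent messages that fit the budget (word count stands in for tokens).
class SimpleChatMemory:
    def __init__(self, token_limit=1000):
        self.token_limit = token_limit
        self.messages = []  # list of (role, content) pairs, oldest first

    def put(self, role, content):
        self.messages.append((role, content))

    def get(self):
        # Walk backwards from the newest message, keeping messages
        # until the next one would exceed the budget
        kept, used = [], 0
        for role, content in reversed(self.messages):
            cost = len(content.split())
            if used + cost > self.token_limit:
                break
            kept.append((role, content))
            used += cost
        return list(reversed(kept))


memory = SimpleChatMemory(token_limit=8)
memory.put("user", "What was Q4 revenue?")
memory.put("assistant", "Q4 2025 revenue was $47.3M.")
memory.put("user", "And the 2026 guidance?")
print(memory.get())
# → [('user', 'And the 2026 guidance?')]  (older turns dropped over budget)
```

Dropping whole messages from the oldest end, rather than truncating mid-message, keeps each surviving turn coherent for the model.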
How does LlamaIndex relate to LangChain?
LlamaIndex specializes in data ingestion, indexing, and retrieval — it excels when your agent needs to query large document collections. LangChain has a broader scope (more agent types, more tool integrations, more chains) but a steeper learning curve. Many production applications combine both: LlamaIndex for the RAG layer and LangChain or another framework for agent orchestration.