Tool · framework · open-source · 6 min read

DSPy: AI Agent Platform Overview & Pricing 2026

DSPy is a Stanford-developed Python framework for programming LLMs through declarative signatures and automatic prompt optimization. Rather than writing prompts by hand, DSPy compiles your program logic into optimized prompts using a training-like optimization process, making it the go-to framework for systematic, reproducible LLM pipeline development.

By AI Agents Guide Team•February 28, 2026

Some links on this page are affiliate links. We may earn a commission at no extra cost to you. Learn more.

Visit DSPy →

Table of Contents

  1. Key Features
  2. Pricing
  3. Who It's For
  4. Strengths
  5. Limitations
  6. Related Resources

DSPy is a Python framework for algorithmically optimizing LLM programs, developed by Omar Khattab and colleagues at Stanford University's NLP group. First published as a research paper and framework in 2023 and reaching widespread adoption through 2024-2025, DSPy takes a fundamentally different approach to working with LLMs than most frameworks: rather than asking developers to craft prompts manually, it treats prompts as learnable parameters in a program that can be optimized against a metric.

The programmer defines the program's logic using declarative signatures, which describe what inputs a module takes and what outputs it produces, and DSPy's compilers handle the translation into effective prompts. This paradigm shift has made DSPy essential in research environments and increasingly influential among production teams that need reproducible, systematically improving LLM pipelines.

Key Features

Declarative Signatures

The core abstraction in DSPy is the Signature, a concise declaration of an LLM module's input and output fields with type annotations and docstrings. A signature like question: str -> answer: str tells DSPy what the module needs to do without specifying how the prompt should be worded. DSPy modules built from signatures can be composed into programs just like regular Python functions, with the framework handling prompt generation.

Automatic Prompt Optimization

DSPy's optimizers (historically called Teleprompters) automatically tune the prompts and few-shot examples in your program to maximize performance on a metric you define. Algorithms like BootstrapFewShot, MIPRO, and BootstrapFinetune explore the space of possible prompt formulations and example sets to find configurations that often outperform hand-engineered alternatives. This is DSPy's core value proposition and its clearest differentiator from most other frameworks.

Module Composability

DSPy programs are composed from modular building blocks: Predict for straightforward generation, ChainOfThought for step-by-step reasoning, ReAct for tool-using agents, Retrieve for retrieval-augmented generation, and more. These modules compose naturally, allowing complex multi-step reasoning programs to be built from well-tested components and optimized end-to-end.

Assertions and Constraints

DSPy provides dspy.Assert and dspy.Suggest for adding constraints to LLM outputs. Assertions cause the framework to retry the LLM call if the output violates a condition, while suggestions softly nudge the model toward constraint satisfaction. This makes it easier to enforce output formats, factual constraints, or safety requirements without building custom retry logic.

Multi-Model and Multi-Provider Support

DSPy supports OpenAI, Anthropic, Google, Cohere, Databricks, Ollama, and any OpenAI-compatible API endpoint. Importantly, the optimization phase and the inference phase can use different models — common practice is to optimize with a powerful model like GPT-4o, then run inference with a cheaper model like GPT-4o-mini or a fine-tuned smaller model.

Pricing

DSPy is free and open-source under the MIT license. There are no framework fees. However, optimization runs — especially with algorithms like MIPRO that require many candidate program evaluations — can consume substantial API credits during the compilation phase. A careful cost estimate before running optimization experiments is advisable. Runtime inference costs are standard LLM API fees. Some teams use DSPy's fine-tuning optimizer to produce a custom model that reduces ongoing inference costs after the initial optimization investment.
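The compilation-cost warning above can be made concrete with a back-of-envelope estimate; every number below is an assumed placeholder, not a DSPy default or a quoted price:

```python
# Rough cost estimate for one optimization run.
candidate_programs = 50            # candidate programs the optimizer evaluates
devset_size = 200                  # examples scored per candidate
tokens_per_call = 1500             # average prompt + completion tokens
price_per_million_tokens = 2.50    # assumed USD price for the chosen model

total_tokens = candidate_programs * devset_size * tokens_per_call
cost = total_tokens / 1_000_000 * price_per_million_tokens

print(f"~{total_tokens:,} tokens, about ${cost:.2f}")
```

Even with these modest assumptions the run touches millions of tokens, which is why estimating before compiling is worthwhile.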

Who It's For

DSPy is the right choice for:

  • ML engineers and researchers: Teams with access to labeled datasets and a clear evaluation metric who want to systematically improve LLM program performance beyond what manual prompt engineering achieves.
  • RAG pipeline developers: Organizations building retrieval-augmented generation systems who want to optimize retrieval strategies, reranking, and generation together as a unified program.
  • Teams with reproducibility requirements: Research and production teams who need to document and reproduce LLM program performance — DSPy's compilation artifacts make this tractable.

It is less suitable for developers who need quick results without labeled data, teams without ML engineering expertise, or use cases where the overhead of optimization compilation is not justified by the performance gains.

Strengths

Systematic prompt improvement. The ability to optimize prompts against a metric rather than relying on human intuition is genuinely transformative for teams that have struggled to improve LLM pipeline accuracy through manual iteration.

Research-grade rigor. DSPy's academic origins mean it comes with a solid theoretical foundation, reproducible benchmarks, and an active research community publishing improvements. This is rare among practical ML frameworks.

End-to-end optimization. Unlike frameworks that optimize individual prompts in isolation, DSPy can optimize entire multi-step programs end-to-end, accounting for how each module's output affects downstream module performance.

Limitations

Requires labeled data for optimization. DSPy's optimization algorithms need examples with known correct outputs to evaluate candidate programs. Teams without such datasets cannot fully leverage the framework's core capability.

Steeper learning curve. The declarative signature paradigm and optimization concepts are different enough from conventional LLM prompting that developers accustomed to writing prompts directly may find the mental model shift challenging.

Related Resources

Explore the full AI Agent Tools Directory for a comprehensive look at LLM development frameworks.

  • Understand the ReAct reasoning pattern that DSPy's ReAct module implements
  • Learn about tool use in AI agents and how DSPy extends this concept
  • Read our AI Agents foundational overview for essential context
  • Compare agent frameworks in our LangChain vs CrewAI analysis
  • See the LangChain directory entry for an alternative pipeline framework
  • Explore LangGraph's graph-based approach to multi-step reasoning

