# What Is Few-Shot Prompting?
Few-shot prompting is one of the foundational techniques for working with large language models. By including a small number of input-output examples — "shots" — directly in your prompt, you show the model exactly what kind of output you want before asking it to process the actual input. The model learns the pattern from your examples and applies it to new inputs without any training or parameter updates.
This capability — called in-context learning — is one of the things that makes large language models distinctively powerful. They can adapt to new tasks immediately based on context provided at inference time, without the time, cost, or data requirements of fine-tuning.
For agents specifically, few-shot prompting is used to reliably produce structured tool calls, consistent output formats, and accurate classification in task-specific contexts.
## Quick Definition
A few-shot prompt has three parts:
- Task instruction: A description of what the model should do
- Examples: 2–8 sample inputs with their correct outputs
- Query: The actual input you want the model to process
```
Task: Classify customer support tickets by department.

Examples:
Input: "My password isn't working"
Output: IT

Input: "I was charged twice for my order"
Output: Billing

Input: "I want to change my delivery address"
Output: Fulfillment

Input: "When will my refund arrive?"
Output:
```
The model reads the examples, infers the classification pattern, and applies it to the final query.
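The assembly step described above can be sketched in a few lines of Python. The helper name and the ticket data mirror the example prompt; the actual model call is omitted, since any LLM client could consume the resulting string.

```python
# A minimal sketch of assembling a few-shot classification prompt.
# The examples mirror the support-ticket snippet above; no model is called.

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble task instruction, labeled examples, and the final query."""
    lines = [f"Task: {task}", "", "Examples:"]
    for text, label in examples:
        lines.append(f'Input: "{text}"')
        lines.append(f"Output: {label}")
    lines.append(f'Input: "{query}"')
    lines.append("Output:")  # the model completes the label from here
    return "\n".join(lines)

examples = [
    ("My password isn't working", "IT"),
    ("I was charged twice for my order", "Billing"),
    ("I want to change my delivery address", "Fulfillment"),
]
prompt = build_few_shot_prompt(
    "Classify customer support tickets by department.",
    examples,
    "When will my refund arrive?",
)
print(prompt)
```

Ending the prompt with a bare `Output:` is deliberate: it puts the model at exactly the position where the pattern predicts a label.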
## Why Few-Shot Prompting Matters

### Rapid Adaptation Without Training
The most important benefit is speed. Adding 3–5 examples to a prompt takes minutes. Fine-tuning a model on hundreds of examples takes hours and requires infrastructure. For many tasks, few-shot prompting achieves 80–95% of fine-tuning performance at near-zero cost.
### Format and Style Control
LLMs naturally produce varied outputs. Few-shot examples are the most reliable way to enforce specific output formats without building post-processing logic. Whether you need JSON objects, specific sentence structures, code in a particular style, or classification labels from a fixed set — examples are more reliable than instructions alone.
### Handling Edge Cases
When you add examples that cover known tricky cases, the model learns how to handle them. If your classification task has ambiguous categories, adding examples that demonstrate the correct behavior for ambiguous inputs is far more effective than trying to describe the ambiguity in instructions.
## Zero-Shot vs. One-Shot vs. Few-Shot
| Approach | Examples | Setup Cost | Reliability |
|---|---|---|---|
| Zero-shot | 0 | Lowest | Variable |
| One-shot | 1 | Low | Moderate |
| Few-shot | 2–8+ | Low-Medium | Higher |
| Fine-tuning | Hundreds+ | High | Highest |
Zero-shot prompting asks the model to perform a task based only on instructions. Modern capable models (GPT-4o, Claude 3 Sonnet, Gemini 1.5) handle many common tasks zero-shot. It's appropriate when instructions are sufficient to specify the task unambiguously.
One-shot prompting provides a single example. One example gives the model a template but doesn't demonstrate variation — useful for simple, consistent tasks.
Few-shot prompting provides multiple examples. Multiple examples demonstrate the range of acceptable inputs and outputs, making the model's behavior more robust across real-world variation.
## How Few-Shot Prompting Works in AI Agents

### Tool Call Structure
When building agents, few-shot prompting helps produce correctly formatted tool calls. If your agent calls functions with specific argument structures, showing examples of correct tool call syntax helps the model produce parseable output consistently.
Function call examples:

```
User: "Get the weather in New York"
Call: search_weather({"location": "New York", "units": "fahrenheit"})

User: "What's the temperature in Paris tomorrow?"
Call: search_weather({"location": "Paris", "units": "celsius", "date": "tomorrow"})

User: "Weather forecast for Tokyo this weekend"
Call:
```
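Once the model completes the final `Call:` line in this format, the agent still has to parse it. A sketch of that parsing step, assuming the `name({...json...})` convention shown above (the `search_weather` function and its arguments are illustrative, not a real API):

```python
import json
import re

# Parse a completion of the form 'name({...})' into (name, arguments),
# assuming the few-shot tool-call format demonstrated above.

CALL_RE = re.compile(r"^(\w+)\((\{.*\})\)$")

def parse_tool_call(completion: str) -> tuple[str, dict]:
    """Split 'name({...json...})' into the function name and its argument dict."""
    match = CALL_RE.match(completion.strip())
    if match is None:
        raise ValueError(f"unparseable tool call: {completion!r}")
    name, args_json = match.groups()
    return name, json.loads(args_json)

name, args = parse_tool_call(
    'search_weather({"location": "Tokyo", "units": "celsius", "date": "weekend"})'
)
```

Raising on unparseable output matters in practice: the agent can catch the error and re-prompt rather than silently acting on a malformed call.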
### Classification and Routing
Agent routers — components that decide which specialized agent or tool should handle a request — use few-shot classification to route accurately. Examples demonstrate how to categorize ambiguous inputs that instructions alone would handle inconsistently.
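A minimal router sketch under these assumptions: the few-shot prompt yields a label, and the router validates it against the known agents before dispatching. The agent names and `call_llm` stand-in are hypothetical.

```python
# Hypothetical routing sketch: a few-shot prompt produces an agent label,
# and unknown labels fall back to a default. `call_llm` is a stand-in.

ROUTING_EXAMPLES = [
    ("Reset my 2FA device", "security_agent"),
    ("Why did my invoice go up?", "billing_agent"),
    ("The app crashes on launch", "support_agent"),
]

def build_router_prompt(query: str) -> str:
    lines = ["Route each request to exactly one agent.", ""]
    for text, agent in ROUTING_EXAMPLES:
        lines += [f"Request: {text}", f"Agent: {agent}"]
    lines += [f"Request: {query}", "Agent:"]
    return "\n".join(lines)

def route(query: str, call_llm) -> str:
    label = call_llm(build_router_prompt(query)).strip()
    allowed = {agent for _, agent in ROUTING_EXAMPLES}
    return label if label in allowed else "fallback_agent"  # guard bad labels

# Stubbed model that always answers "billing_agent":
chosen = route("I need a refund", lambda prompt: " billing_agent\n")
```

Constraining the output to the labels that appear in the examples is the key move: the model rarely invents a new label when every example uses one from a fixed set, and the guard catches the cases where it does.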
### Extraction and Parsing
When agents need to extract structured information from unstructured text (customer data from emails, entities from documents, parameters from natural language), few-shot examples of correct extractions dramatically improve accuracy.
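A sketch of the extraction pattern: each example pairs raw text with the exact JSON you want back, and the model's completion is parsed with `json.loads`. The field names and sample texts are illustrative, not from any particular schema.

```python
import json

# Few-shot extraction sketch: examples pair raw text with target JSON,
# and the model completion is parsed directly. `call_llm` is a stand-in.

EXTRACTION_EXAMPLES = [
    ("Hi, I'm Dana Reyes, order #4417 arrived damaged.",
     {"name": "Dana Reyes", "order_id": "4417"}),
    ("This is Sam Okafor about order 9021, wrong size.",
     {"name": "Sam Okafor", "order_id": "9021"}),
]

def build_extraction_prompt(text: str) -> str:
    lines = ["Extract the customer name and order id as JSON.", ""]
    for src, out in EXTRACTION_EXAMPLES:
        lines += [f"Text: {src}", f"JSON: {json.dumps(out)}"]
    lines += [f"Text: {text}", "JSON:"]
    return "\n".join(lines)

def extract(text: str, call_llm) -> dict:
    return json.loads(call_llm(build_extraction_prompt(text)))

record = extract(
    "Jo Lin here, order 77 never shipped.",
    lambda prompt: '{"name": "Jo Lin", "order_id": "77"}',  # stubbed model
)
```

Showing the target JSON verbatim in every example does double duty: it fixes the field names and demonstrates that the output should be bare JSON with no surrounding prose.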
### Reasoning Style
Few-shot prompting can shape how the model reasons, not just what it produces. Chain-of-thought prompting — showing examples that include reasoning steps before the answer — teaches the model to think through problems rather than jump to conclusions.
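For instance, a chain-of-thought few-shot prompt might look like this (the arithmetic problems are made up for illustration):

```
Q: A store had 20 apples, sold 8, then received 15 more. How many does it have now?
Reasoning: 20 - 8 = 12 remaining; 12 + 15 = 27.
A: 27

Q: A bus carries 12 passengers; 5 get off and 9 board. How many are aboard?
Reasoning: 12 - 5 = 7; 7 + 9 = 16.
A: 16

Q: A library had 43 books, lent out 17, and got 6 returned. How many are on the shelves?
Reasoning:
```

Because every example interposes a `Reasoning:` step before the answer, the model completes the final query the same way: working through the arithmetic before committing to a number.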
## Best Practices

### Example Selection
Cover the distribution: Examples should represent the range of inputs the model will actually encounter. If 20% of your queries are a specific type, include examples of that type.
Include edge cases: Add examples for the specific inputs you know are tricky or that past prompts have handled incorrectly.
Keep examples consistent: All examples should follow the same format. Inconsistent formatting confuses the model about what's required.
Quality over quantity: 3 excellent examples outperform 8 mediocre ones. Each example should be unambiguously correct.
### Ordering
Examples near the end of the prompt (closest to the query) have the highest influence. If you have one "anchor" example that best represents the desired behavior, place it last.
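This ordering rule is mechanical enough to encode directly. A small helper, assuming you have already identified which example is the anchor:

```python
# Sketch of the ordering heuristic above: move the designated "anchor"
# example to the end of the list, closest to the query.

def order_examples(examples: list, anchor_index: int) -> list:
    """Place the anchor example last; preserve the order of the rest."""
    rest = [ex for i, ex in enumerate(examples) if i != anchor_index]
    return rest + [examples[anchor_index]]

ordered = order_examples(["ex_a", "ex_anchor", "ex_b"], anchor_index=1)
```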
### Separators
Use consistent delimiters to distinguish examples from each other and from the query:
```
===
Input: [example input 1]
Output: [example output 1]
===
Input: [example input 2]
Output: [example output 2]
===
Input: [your actual query]
Output:
```
Clear structure reduces model confusion about where examples end and the actual task begins.
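The delimiter pattern is easy to generate programmatically. A sketch using the `===` separator shown above:

```python
# Build a delimited few-shot prompt: "===" fences each example and the
# final query, giving the model an unambiguous boundary between blocks.

SEP = "==="

def build_delimited_prompt(examples: list[tuple[str, str]], query: str) -> str:
    parts = []
    for inp, out in examples:
        parts.append(f"{SEP}\nInput: {inp}\nOutput: {out}")
    parts.append(f"{SEP}\nInput: {query}\nOutput:")
    return "\n".join(parts)

prompt = build_delimited_prompt(
    [("My password isn't working", "IT"), ("I was charged twice", "Billing")],
    "When will my refund arrive?",
)
```

Any distinctive delimiter works; what matters is that the same one is used consistently and never appears inside the example content itself.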
## Few-Shot Prompting vs. Fine-Tuning
| Factor | Few-Shot | Fine-Tuning |
|---|---|---|
| Setup time | Minutes | Hours to days |
| Data required | 2–8 examples | 100–10,000+ examples |
| Flexibility | Change examples instantly | Requires re-training |
| Token cost | Higher per request | Lower per request |
| Maximum performance | 80–95% of fine-tuned | Highest |
| Latency | Same | Same (post-training) |
Use few-shot when: Requirements change frequently, you have limited examples, or you're still figuring out the right behavior.
Use fine-tuning when: You have clear, stable requirements, high-volume use cases where token cost matters, or you need maximum accuracy on a specific narrow task.
## Limitations
Context window consumption: Examples take up context budget. Long examples at high volume can push the query and important context out of the model's attention window.
Example leakage: Models sometimes echo example content inappropriately — using example entities or phrasings in outputs about unrelated inputs.
Sensitivity to example quality: Poor-quality examples hurt performance. Ambiguous or inconsistent examples can make performance worse than zero-shot.
Not a substitute for fine-tuning at scale: For very high-volume inference, the token cost of including examples in every request adds up. At scale, fine-tuning becomes economically attractive.
## Related Terms
- Chain-of-Thought Prompting — Adding reasoning steps to few-shot examples
- Grounding — Anchoring model outputs to factual context
- Prompt Engineering — Broader discipline of designing effective prompts
- AI Agents — Systems that use prompting as part of their reasoning loop
## Frequently Asked Questions
### What is few-shot prompting in AI?

Few-shot prompting is the practice of including 2–8 example input-output pairs in your prompt to show the model the pattern it should follow. The model uses these examples to understand what output format, style, or reasoning approach you want, then applies that pattern to new inputs.

### How many examples do I need for few-shot prompting?

Start with 3–5 examples. Most tasks see significant improvement moving from 0 to 3 examples. Beyond 5–8, additional examples provide diminishing returns unless the task has many distinct categories or edge cases to cover.

### Does few-shot prompting work with all LLMs?

Yes, though effectiveness varies by model capability. Larger, more capable models (GPT-4o, Claude 3 Sonnet, Gemini 1.5 Pro) generally apply few-shot examples more reliably. Smaller models may still pattern-match, but with less consistency.

### Is few-shot prompting the same as in-context learning?

Few-shot prompting is the most common form of in-context learning. In-context learning is the broader capability of LLMs to adapt based on information provided at inference time — few-shot examples are one mechanism; system prompt instructions, retrieved documents, and conversation history are others.