# What Is Few-Shot Prompting?
Few-shot prompting is one of the foundational techniques for working with large language models. By including a small number of input-output examples — "shots" — directly in your prompt, you show the model exactly what kind of output you want before asking it to process the actual input. The model learns the pattern from your examples and applies it to new inputs without any training or parameter updates.
This capability — called in-context learning — is one of the things that makes large language models distinctively powerful. They can adapt to new tasks immediately based on context provided at inference time, without the time, cost, or data requirements of fine-tuning.
For agents specifically, few-shot prompting is used to reliably produce structured tool calls, consistent output formats, and accurate classification in task-specific contexts.
## Quick Definition
A few-shot prompt has three parts:
- Task instruction: A description of what the model should do
- Examples: 2–8 sample inputs with their correct outputs
- Query: The actual input you want the model to process
```
Task: Classify customer support tickets by department.

Examples:
Input: "My password isn't working"
Output: IT

Input: "I was charged twice for my order"
Output: Billing

Input: "I want to change my delivery address"
Output: Fulfillment

Input: "When will my refund arrive?"
Output:
```
The model reads the examples, infers the classification pattern, and applies it to the final query.
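The assembly step described above can be sketched in a few lines of Python. The helper name and the ticket data mirror the example prompt; the actual model call is omitted, since any LLM client could consume the resulting string.

```python
# A minimal sketch of assembling a few-shot classification prompt.
# The examples mirror the support-ticket snippet above; no model is called.

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble task instruction, labeled examples, and the final query."""
    lines = [f"Task: {task}", "", "Examples:"]
    for text, label in examples:
        lines.append(f'Input: "{text}"')
        lines.append(f"Output: {label}")
    lines.append(f'Input: "{query}"')
    lines.append("Output:")  # the model completes the label from here
    return "\n".join(lines)

examples = [
    ("My password isn't working", "IT"),
    ("I was charged twice for my order", "Billing"),
    ("I want to change my delivery address", "Fulfillment"),
]
prompt = build_few_shot_prompt(
    "Classify customer support tickets by department.",
    examples,
    "When will my refund arrive?",
)
print(prompt)
```

Ending the prompt with a bare `Output:` is deliberate: it puts the model at exactly the position where the pattern predicts a label.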
## Why Few-Shot Prompting Matters

### Rapid Adaptation Without Training
The most important benefit is speed. Adding 3–5 examples to a prompt takes minutes. Fine-tuning a model on hundreds of examples takes hours and requires infrastructure. For many tasks, few-shot prompting achieves 80–95% of fine-tuning performance at near-zero cost.
### Format and Style Control
LLMs naturally produce varied outputs. Few-shot examples are the most reliable way to enforce specific output formats without building post-processing logic. Whether you need JSON objects, specific sentence structures, code in a particular style, or classification labels from a fixed set — examples are more reliable than instructions alone.
### Handling Edge Cases
When you add examples that cover known tricky cases, the model learns how to handle them. If your classification task has ambiguous categories, adding examples that demonstrate the correct behavior for ambiguous inputs is far more effective than trying to describe the ambiguity in instructions.
## Zero-Shot vs. One-Shot vs. Few-Shot
| Approach | Examples | Setup Cost | Reliability |
|---|---|---|---|
| Zero-shot | 0 | Lowest | Variable |
| One-shot | 1 | Low | Moderate |
| Few-shot | 2–8+ | Low-Medium | Higher |
| Fine-tuning | Hundreds+ | High | Highest |
Zero-shot prompting asks the model to perform a task based only on instructions. Modern capable models (GPT-4o, Claude 3 Sonnet, Gemini 1.5) handle many common tasks zero-shot. It's appropriate when instructions are sufficient to specify the task unambiguously.
One-shot prompting provides a single example. One example gives the model a template but doesn't demonstrate variation — useful for simple, consistent tasks.
Few-shot prompting provides multiple examples. Multiple examples demonstrate the range of acceptable inputs and outputs, making the model's behavior more robust across real-world variation.
## How Few-Shot Prompting Works in AI Agents

### Tool Call Structure
When building agents, few-shot prompting helps produce correctly formatted tool calls. If your agent calls functions with specific argument structures, showing examples of correct tool call syntax helps the model produce parseable output consistently.
Function call examples:

```
User: "Get the weather in New York"
Call: search_weather({"location": "New York", "units": "fahrenheit"})

User: "What's the temperature in Paris tomorrow?"
Call: search_weather({"location": "Paris", "units": "celsius", "date": "tomorrow"})

User: "Weather forecast for Tokyo this weekend"
Call:
```
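Once the model completes the final `Call:` line in this format, the agent still has to parse it. A sketch of that parsing step, assuming the `name({...json...})` convention shown above (the `search_weather` function and its arguments are illustrative, not a real API):

```python
import json
import re

# Parse a completion of the form 'name({...})' into (name, arguments),
# assuming the few-shot tool-call format demonstrated above.

CALL_RE = re.compile(r"^(\w+)\((\{.*\})\)$")

def parse_tool_call(completion: str) -> tuple[str, dict]:
    """Split 'name({...json...})' into the function name and its argument dict."""
    match = CALL_RE.match(completion.strip())
    if match is None:
        raise ValueError(f"unparseable tool call: {completion!r}")
    name, args_json = match.groups()
    return name, json.loads(args_json)

name, args = parse_tool_call(
    'search_weather({"location": "Tokyo", "units": "celsius", "date": "weekend"})'
)
```

Raising on unparseable output matters in practice: the agent can catch the error and re-prompt rather than silently acting on a malformed call.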
### Classification and Routing
Agent routers — components that decide which specialized agent or tool should handle a request — use few-shot classification to route accurately. Examples demonstrate how to categorize ambiguous inputs that instructions alone would handle inconsistently.
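A minimal router sketch under these assumptions: the few-shot prompt yields a label, and the router validates it against the known agents before dispatching. The agent names and `call_llm` stand-in are hypothetical.

```python
# Hypothetical routing sketch: a few-shot prompt produces an agent label,
# and unknown labels fall back to a default. `call_llm` is a stand-in.

ROUTING_EXAMPLES = [
    ("Reset my 2FA device", "security_agent"),
    ("Why did my invoice go up?", "billing_agent"),
    ("The app crashes on launch", "support_agent"),
]

def build_router_prompt(query: str) -> str:
    lines = ["Route each request to exactly one agent.", ""]
    for text, agent in ROUTING_EXAMPLES:
        lines += [f"Request: {text}", f"Agent: {agent}"]
    lines += [f"Request: {query}", "Agent:"]
    return "\n".join(lines)

def route(query: str, call_llm) -> str:
    label = call_llm(build_router_prompt(query)).strip()
    allowed = {agent for _, agent in ROUTING_EXAMPLES}
    return label if label in allowed else "fallback_agent"  # guard bad labels

# Stubbed model that always answers "billing_agent":
chosen = route("I need a refund", lambda prompt: " billing_agent\n")
```

Constraining the output to the labels that appear in the examples is the key move: the model rarely invents a new label when every example uses one from a fixed set, and the guard catches the cases where it does.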
### Extraction and Parsing
When agents need to extract structured information from unstructured text (customer data from emails, entities from documents, parameters from natural language), few-shot examples of correct extractions dramatically improve accuracy.
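A sketch of the extraction pattern: each example pairs raw text with the exact JSON you want back, and the model's completion is parsed with `json.loads`. The field names and sample texts are illustrative, not from any particular schema.

```python
import json

# Few-shot extraction sketch: examples pair raw text with target JSON,
# and the model completion is parsed directly. `call_llm` is a stand-in.

EXTRACTION_EXAMPLES = [
    ("Hi, I'm Dana Reyes, order #4417 arrived damaged.",
     {"name": "Dana Reyes", "order_id": "4417"}),
    ("This is Sam Okafor about order 9021, wrong size.",
     {"name": "Sam Okafor", "order_id": "9021"}),
]

def build_extraction_prompt(text: str) -> str:
    lines = ["Extract the customer name and order id as JSON.", ""]
    for src, out in EXTRACTION_EXAMPLES:
        lines += [f"Text: {src}", f"JSON: {json.dumps(out)}"]
    lines += [f"Text: {text}", "JSON:"]
    return "\n".join(lines)

def extract(text: str, call_llm) -> dict:
    return json.loads(call_llm(build_extraction_prompt(text)))

record = extract(
    "Jo Lin here, order 77 never shipped.",
    lambda prompt: '{"name": "Jo Lin", "order_id": "77"}',  # stubbed model
)
```

Showing the target JSON verbatim in every example does double duty: it fixes the field names and demonstrates that the output should be bare JSON with no surrounding prose.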
### Reasoning Style
Few-shot prompting can shape how the model reasons, not just what it produces. Chain-of-thought prompting — showing examples that include reasoning steps before the answer — teaches the model to think through problems rather than jump to conclusions.
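For instance, a chain-of-thought few-shot prompt might look like this (the arithmetic problems are made up for illustration):

```
Q: A store had 20 apples, sold 8, then received 15 more. How many does it have now?
Reasoning: 20 - 8 = 12 remaining; 12 + 15 = 27.
A: 27

Q: A bus carries 12 passengers; 5 get off and 9 board. How many are aboard?
Reasoning: 12 - 5 = 7; 7 + 9 = 16.
A: 16

Q: A library had 43 books, lent out 17, and got 6 returned. How many are on the shelves?
Reasoning:
```

Because every example interposes a `Reasoning:` step before the answer, the model completes the final query the same way: working through the arithmetic before committing to a number.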
## Best Practices

### Example Selection
Cover the distribution: Examples should represent the range of inputs the model will actually encounter. If 20% of your queries are a specific type, include examples of that type.
Include edge cases: Add examples for the specific inputs you know are tricky or that past prompts have handled incorrectly.
Keep examples consistent: All examples should follow the same format. Inconsistent formatting confuses the model about what's required.
Quality over quantity: 3 excellent examples outperform 8 mediocre ones. Each example should be unambiguously correct.
### Ordering
Examples near the end of the prompt (closest to the query) have the highest influence. If you have one "anchor" example that best represents the desired behavior, place it last.
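This ordering rule is mechanical enough to encode directly. A small helper, assuming you have already identified which example is the anchor:

```python
# Sketch of the ordering heuristic above: move the designated "anchor"
# example to the end of the list, closest to the query.

def order_examples(examples: list, anchor_index: int) -> list:
    """Place the anchor example last; preserve the order of the rest."""
    rest = [ex for i, ex in enumerate(examples) if i != anchor_index]
    return rest + [examples[anchor_index]]

ordered = order_examples(["ex_a", "ex_anchor", "ex_b"], anchor_index=1)
```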
### Separators
Use consistent delimiters to distinguish examples from each other and from the query:
```
===
Input: [example input 1]
Output: [example output 1]
===
Input: [example input 2]
Output: [example output 2]
===
Input: [your actual query]
Output:
```
Clear structure reduces model confusion about where examples end and the actual task begins.
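The delimiter pattern is easy to generate programmatically. A sketch using the `===` separator shown above:

```python
# Build a delimited few-shot prompt: "===" fences each example and the
# final query, giving the model an unambiguous boundary between blocks.

SEP = "==="

def build_delimited_prompt(examples: list[tuple[str, str]], query: str) -> str:
    parts = []
    for inp, out in examples:
        parts.append(f"{SEP}\nInput: {inp}\nOutput: {out}")
    parts.append(f"{SEP}\nInput: {query}\nOutput:")
    return "\n".join(parts)

prompt = build_delimited_prompt(
    [("My password isn't working", "IT"), ("I was charged twice", "Billing")],
    "When will my refund arrive?",
)
```

Any distinctive delimiter works; what matters is that the same one is used consistently and never appears inside the example content itself.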
## Few-Shot Prompting vs. Fine-Tuning
| Factor | Few-Shot | Fine-Tuning |
|---|---|---|
| Setup time | Minutes | Hours to days |
| Data required | 2–8 examples | 100–10,000+ examples |
| Flexibility | Change examples instantly | Requires re-training |
| Token cost | Higher per request | Lower per request |
| Maximum performance | 80–95% of fine-tuned | Highest |
| Latency | Same | Same (post-training) |
Use few-shot when: Requirements change frequently, you have limited examples, or you're still figuring out the right behavior.
Use fine-tuning when: You have clear, stable requirements, high-volume use cases where token cost matters, or you need maximum accuracy on a specific narrow task.
## Limitations
Context window consumption: Examples take up context budget. Long examples at high volume can push the query and important context out of the model's attention window.
Example leakage: Models sometimes echo example content inappropriately — using example entities or phrasings in outputs about unrelated inputs.
Sensitivity to example quality: Poor-quality examples hurt performance. Ambiguous or inconsistent examples can make performance worse than zero-shot.
Not a substitute for fine-tuning at scale: For very high-volume inference, the token cost of including examples in every request adds up. At scale, fine-tuning becomes economically attractive.
## Related Terms
- Chain-of-Thought Prompting — Adding reasoning steps to few-shot examples
- Grounding — Anchoring model outputs to factual context
- Prompt Engineering — Broader discipline of designing effective prompts
- AI Agents — Systems that use prompting as part of their reasoning loop
## Frequently Asked Questions
### What is few-shot prompting in AI?

Few-shot prompting is the practice of including 2–8 example input-output pairs in your prompt to show the model the pattern it should follow. The model uses these examples to understand what output format, style, or reasoning approach you want, then applies that pattern to new inputs.

### How many examples do I need for few-shot prompting?

Start with 3–5 examples. Most tasks see significant improvement moving from 0 to 3 examples. Beyond 5–8, additional examples provide diminishing returns unless the task has many distinct categories or edge cases to cover.

### Does few-shot prompting work with all LLMs?

Yes, though effectiveness varies by model capability. Larger, more capable models (GPT-4o, Claude 3 Sonnet, Gemini 1.5 Pro) generally apply few-shot examples more reliably. Smaller models may still pattern-match, but with less consistency.

### Is few-shot prompting the same as in-context learning?

Few-shot prompting is the most common form of in-context learning. In-context learning is the broader capability of LLMs to adapt based on information provided at inference time — few-shot examples are one mechanism; system prompt instructions, retrieved documents, and conversation history are others.