What Is LLM Cost per Token?#
LLM cost per token is the pricing unit used by AI model providers to charge for usage of large language models. A "token" is roughly equivalent to 4 characters or 0.75 words in English — "the quick brown fox" is approximately 4 tokens. Providers charge separately for input tokens (what you send to the model) and output tokens (what the model generates back), with output tokens consistently priced higher.
Understanding token pricing is foundational to estimating, managing, and optimizing AI agent costs at scale. A single LLM call might cost a fraction of a cent, but production agents making thousands of calls per hour can accumulate significant monthly bills.
How Token Pricing Works#
Input vs. Output Token Pricing#
The fundamental pricing split:
- Input tokens = your prompt, system instructions, conversation history, retrieved context
- Output tokens = the model's generated response
Output tokens are more expensive because text generation (autoregressive decoding) is computationally more intensive than encoding input text. The pricing ratio varies by model:
- GPT-4o: Output is 4x more expensive than input
- Claude 3.5 Sonnet: Output is 5x more expensive than input
- Gemini 1.5 Pro: Output is 4x more expensive than input
This asymmetry means that for agents producing long, detailed outputs — reports, code, elaborate plans — output costs dominate. For classification agents that return short labels, input costs dominate. Design your agent's output format accordingly.
Pricing Units#
Most providers price in "per million tokens" to avoid small decimal numbers. When you see "$2.50/1M input tokens," that means:
- 1 million tokens = $2.50
- 1,000 tokens = $0.0025
- 100 tokens = $0.00025
A typical agent interaction with 2,000 input tokens and 500 output tokens using GPT-4o costs approximately: (2,000/1,000,000 x $2.50) + (500/1,000,000 x $10.00) = $0.005 + $0.005 = $0.01 — one cent per interaction.
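The arithmetic above is easy to wrap in a small helper. A minimal sketch (the function name and structure are illustrative, not any provider SDK):

```python
# Per-million-token rates for GPT-4o as quoted above (USD).
GPT4O_INPUT_PER_M = 2.50
GPT4O_OUTPUT_PER_M = 10.00

def call_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """USD cost of a single LLM call at per-1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# 2,000 input + 500 output tokens on GPT-4o:
print(call_cost(2_000, 500, GPT4O_INPUT_PER_M, GPT4O_OUTPUT_PER_M))  # 0.01
```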
LLM Pricing Comparison (2026)#
Frontier Models#
| Model | Input (per 1M) | Output (per 1M) | Context Window | Notes |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K | Cached input: $1.25 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Cached input: $0.30 |
| Gemini 1.5 Pro | $1.25 | $5.00 | 2M | Rates apply ≤128K |
| Gemini 1.5 Pro (long) | $2.50 | $10.00 | 2M | Rates apply >128K |
Mid-Tier Models (Best Value)#
| Model | Input (per 1M) | Output (per 1M) | Best For |
|---|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 | Classification, routing, simple extraction |
| Claude 3.5 Haiku | $0.80 | $4.00 | Fast complex tasks, coding, structured output |
| Gemini 1.5 Flash | $0.075 | $0.30 | High-volume, cost-sensitive workloads |
Cost Ratio: Frontier vs. Mid-Tier#
Using GPT-4o vs GPT-4o-mini for the same input volume:
- GPT-4o input: $2.50/1M
- GPT-4o-mini input: $0.15/1M
- Cost savings: 94% cheaper for input tokens
The quality gap has narrowed significantly in 2025-2026. For straightforward tasks, mid-tier models perform comparably to frontier models at a fraction of the cost.
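To see what that ratio means for a monthly bill, here is a rough sketch (prices taken from the tables above; the workload numbers are made up for illustration):

```python
# Per-1M-token prices (USD) from the tables above.
PRICES = {
    "gpt-4o":      {"in": 2.50, "out": 10.00},
    "gpt-4o-mini": {"in": 0.15, "out": 0.60},
}

def monthly_cost(model: str, calls_per_day: int, in_tok: int,
                 out_tok: int, days: int = 30) -> float:
    """USD/month for a fixed per-call token profile."""
    p = PRICES[model]
    per_call = (in_tok * p["in"] + out_tok * p["out"]) / 1_000_000
    return per_call * calls_per_day * days

# Hypothetical workload: 5,000 calls/day, 2,000 in / 500 out tokens each.
frontier = monthly_cost("gpt-4o", 5_000, 2_000, 500)       # 1500.0
mid_tier = monthly_cost("gpt-4o-mini", 5_000, 2_000, 500)  # 90.0
print(f"savings: {1 - mid_tier / frontier:.0%}")           # savings: 94%
```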
Prompt Caching: The Highest-Leverage Discount#
Prompt caching delivers the largest cost savings available to most production agents. Both major providers now offer it, with different mechanics:
Anthropic Prompt Caching#
Anthropic's prefix caching stores the beginning of your prompt (system instructions + stable context) and serves cache hits at approximately 10% of the normal input price — a 90% discount.
- Cache write cost: 25% premium over standard input pricing on first write
- Cache hit cost: ~10% of standard input pricing
- Cache lifetime: 5 minutes (refreshed on each hit)
- Minimum cacheable prompt: 1,024 tokens (Sonnet/Opus), 2,048 tokens (Haiku)
Example calculation:
- System prompt: 8,000 tokens
- Normal input cost: 8,000 x $3.00/1M = $0.024 per call
- Cache hit cost: 8,000 x $0.30/1M = $0.0024 per call
- Daily savings at 1,000 calls: $21.60
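A small model of these economics, using the multipliers above (treating every cache miss as a fresh cache write is a simplification; real billing depends on your traffic pattern):

```python
SONNET_INPUT_PER_M = 3.00  # standard Claude 3.5 Sonnet input price (USD/1M)
WRITE_MULT = 1.25          # cache writes: 25% premium
HIT_MULT = 0.10            # cache hits: ~10% of standard price

def daily_prefix_cost(prefix_tokens: int, calls: int, hit_rate: float) -> float:
    """Daily USD cost of the cached prompt prefix at a given cache-hit rate."""
    hits = calls * hit_rate
    misses = calls - hits  # simplification: every miss re-writes the cache
    per_call_std = prefix_tokens * SONNET_INPUT_PER_M / 1_000_000
    return per_call_std * (hits * HIT_MULT + misses * WRITE_MULT)

# 8,000-token system prompt, 1,000 calls/day:
print(round(daily_prefix_cost(8_000, 1_000, 0.0), 2))  # 30.0 (all writes)
print(round(daily_prefix_cost(8_000, 1_000, 1.0), 2))  # 2.4  (all hits)
```

Note that at a 0% hit rate the write premium makes caching slightly more expensive than not caching at all, which is why caching only pays off for prompts that are actually reused within the cache lifetime.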
OpenAI Prompt Caching#
OpenAI applies prompt caching automatically for prompts of 1,024 tokens or longer, with no developer configuration required. Cache hits are charged at 50% of the standard input price.
- Cache hit cost: 50% of standard input price
- Cache eligibility: prompts ≥1,024 tokens, automatically managed
- Cache lifetime: varies; frequently used prompts stay cached longer
OpenAI's discount is smaller than Anthropic's, but it requires zero implementation effort.
Which Provider Offers Better Caching?#
For agents with large, stable system prompts:
- Anthropic wins for maximum savings (90% off cache hits)
- OpenAI wins for ease of implementation (zero configuration)
- At scale with large prompts, Anthropic's deeper discount justifies the explicit cache management
Batch API Pricing#
For workloads that can tolerate latency (typically 1-24 hours), batch APIs offer significant discounts:
OpenAI Batch API#
- Discount: 50% off standard pricing
- Delivery: Within 24 hours
- Use cases: Bulk document processing, overnight analysis, content generation pipelines
- Limitation: No streaming, asynchronous only
Anthropic Message Batches#
- Discount: 50% off standard pricing
- Delivery: Within 24 hours
- Use cases: Same as OpenAI; particularly cost-effective with Claude 3.5 Sonnet
Combined with caching: Batch processing + prompt caching can reduce frontier model costs by 70-80% for suitable workloads.
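A back-of-envelope check on that 70-80% claim, using the Claude 3.5 Sonnet prices from above (this assumes the batch discount applies on top of cached-input rates, which you should verify against your provider's actual billing):

```python
def per_call_cost(in_tok: int, out_tok: int, in_per_m: float, out_per_m: float,
                  hit_rate: float = 0.0, cached_per_m: float = 0.0,
                  batch: bool = False) -> float:
    """USD per call with optional prompt caching and a 50% batch discount."""
    cached_in = in_tok * hit_rate * cached_per_m
    uncached_in = in_tok * (1 - hit_rate) * in_per_m
    cost = (cached_in + uncached_in + out_tok * out_per_m) / 1_000_000
    return cost * (0.5 if batch else 1.0)

# Claude 3.5 Sonnet, 3,000 in / 500 out, 80% cache hits, batched:
full = per_call_cost(3_000, 500, 3.00, 15.00)
both = per_call_cost(3_000, 500, 3.00, 15.00,
                     hit_rate=0.8, cached_per_m=0.30, batch=True)
print(f"reduction: {1 - both / full:.0%}")  # reduction: 70%
```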
Real-World Cost Scenarios#
Scenario 1: Customer Service Agent (Real-time)#
- Volume: 5,000 conversations/day
- Input: 3,000 tokens avg (system prompt + history + user message)
- Output: 500 tokens avg (response)
- Model: Claude 3.5 Sonnet with prompt caching
Uncached cost: (3,000 x $3.00 + 500 x $15.00) / 1,000,000 x 5,000
= ($9,000 + $7,500) / 1,000,000 x 5,000
= $82.50/day = ~$2,475/month
With 80% cache hit rate (2,400 cached + 600 uncached input tokens):
Cached portion: 2,400 x $0.30/1M x 5,000 = $3.60/day
Uncached portion: 600 x $3.00/1M x 5,000 = $9.00/day
Output: 500 x $15.00/1M x 5,000 = $37.50/day
Total: ~$50.10/day = ~$1,503/month
Savings: ~$972/month (39% reduction)
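The blended calculation above can be reproduced in a few lines (all numbers from the scenario; the function is a sketch, not a billing tool):

```python
def blended_daily_cost(convos: int, in_tok: int, out_tok: int,
                       in_per_m: float, out_per_m: float,
                       cached_per_m: float, cached_fraction: float) -> float:
    """Daily USD cost when cached_fraction of input tokens hit the cache."""
    cached = in_tok * cached_fraction * cached_per_m
    uncached = in_tok * (1 - cached_fraction) * in_per_m
    output = out_tok * out_per_m
    return (cached + uncached + output) / 1_000_000 * convos

# 5,000 conversations/day, 3,000 in / 500 out, Sonnet prices, 80% cached:
daily = blended_daily_cost(5_000, 3_000, 500, 3.00, 15.00, 0.30, 0.80)
print(round(daily, 2), round(daily * 30, 2))  # 50.1 1503.0
```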
Scenario 2: Document Analysis Pipeline (Batch)#
- Volume: 10,000 documents/day
- Input: 8,000 tokens avg per document
- Output: 300 tokens avg (structured extract)
- Model: GPT-4o batch API
Standard GPT-4o: (8,000 x $2.50 + 300 x $10.00) / 1,000,000 x 10,000
= ($20,000 + $3,000) / 1,000,000 x 10,000 = $230/day
Batch API (50% off): $115/day = ~$3,450/month
Monitoring Token Costs in Production#
Never deploy a production agent without cost monitoring. Integrate observability from day one:
- LangFuse: Open-source, traces every LLM call with cost attribution. Self-hostable.
- Helicone: Proxy-based cost tracking with zero code changes. Shows cost per user, per model, per endpoint.
- Provider dashboards: OpenAI, Anthropic, and Google all provide usage dashboards with per-model breakdowns. Set billing alerts.
For teams optimizing costs, set up dashboards tracking: cost per task, cost per user session, model distribution, and cache hit rates.
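As a minimal sketch of per-user cost attribution (the usage-dict field names mirror common SDK response shapes but are illustrative, not any provider's exact API):

```python
# Per-1M-token prices (USD) from the tables above.
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

def record_cost(ledger: dict, user_id: str, model: str, usage: dict) -> float:
    """Attribute one call's USD cost to a user and return it."""
    in_price, out_price = PRICES[model]
    cost = (usage["input_tokens"] * in_price
            + usage["output_tokens"] * out_price) / 1_000_000
    ledger[user_id] = ledger.get(user_id, 0.0) + cost
    return cost

ledger = {}
record_cost(ledger, "user-42", "gpt-4o",
            {"input_tokens": 2_000, "output_tokens": 500})
record_cost(ledger, "user-42", "gpt-4o-mini",
            {"input_tokens": 2_000, "output_tokens": 500})
print(round(ledger["user-42"], 4))  # 0.0106
```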
Related Resources#
- Agent Cost Optimization Techniques
- Token Efficiency
- LLM Routing
- How Much Does It Cost to Build an AI Agent?
- Best AI Agent Observability Tools
More Resources#
Browse the complete AI agent glossary for more AI agent terminology.