What Is LLM Cost per Token? (2026)

This glossary entry explains how AI language model pricing works: input vs. output tokens, prompt caching discounts, batch API pricing, and a full cost comparison across GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro.

By AI Agents Guide Team • March 1, 2026

Term Snapshot

Also known as: AI Token Pricing, LLM Inference Cost, Token Cost Calculation

Related terms: What Is Token Efficiency in AI Agents?, What Is Agent Cost Optimization?, What Is LLM Routing?, What Are AI Agents?

Table of Contents

  1. What Is LLM Cost per Token?
  2. How Token Pricing Works
  3. Input vs. Output Token Pricing
  4. Pricing Units
  5. LLM Pricing Comparison (2026)
  6. Frontier Models
  7. Mid-Tier Models (Best Value)
  8. Cost Ratio: Frontier vs. Mid-Tier
  9. Prompt Caching: The Highest-Leverage Discount
  10. Anthropic Prompt Caching
  11. OpenAI Prompt Caching
  12. Which Provider Offers Better Caching?
  13. Batch API Pricing
  14. OpenAI Batch API
  15. Anthropic Message Batches
  16. Real-World Cost Scenarios
  17. Scenario 1: Customer Service Agent (Real-time)
  18. Scenario 2: Document Analysis Pipeline (Batch)
  19. Monitoring Token Costs in Production
  20. Related Resources
  21. More Resources

What Is LLM Cost per Token?

LLM cost per token is the pricing unit used by AI model providers to charge for usage of large language models. A "token" is roughly equivalent to 4 characters or 0.75 words in English — "the quick brown fox" is approximately 4 tokens. Providers charge separately for input tokens (what you send to the model) and output tokens (what the model generates back), with output tokens consistently priced higher.

Understanding token pricing is foundational to estimating, managing, and optimizing AI agent costs at scale. A single LLM call might cost a fraction of a cent, but production agents making thousands of calls per hour can accumulate significant monthly bills.
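
Token counts are model-specific, so when you need exact numbers rather than the 4-characters-per-token rule of thumb, use the provider's tokenizer. A minimal sketch using OpenAI's open-source tiktoken library (counts from other providers' tokenizers will differ):

```python
# pip install tiktoken
import tiktoken

# Tokenizer for GPT-4o; other model families use different vocabularies
enc = tiktoken.encoding_for_model("gpt-4o")

text = "the quick brown fox"
token_ids = enc.encode(text)
print(len(token_ids))  # 4 for this phrase, matching the rough estimate above
```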

How Token Pricing Works

Input vs. Output Token Pricing

The fundamental pricing split:

  • Input tokens = your prompt, system instructions, conversation history, retrieved context
  • Output tokens = the model's generated response

Output tokens are more expensive because text generation (autoregressive decoding) is computationally more intensive than encoding input text. The pricing ratio varies by model:

  • GPT-4o: Output is 4x more expensive than input
  • Claude 3.5 Sonnet: Output is 5x more expensive than input
  • Gemini 1.5 Pro: Output is 4x more expensive than input

This asymmetry means that for agents producing long, detailed outputs — reports, code, elaborate plans — output costs dominate. For classification agents that return short labels, input costs dominate. Design your agent's output format accordingly.

Pricing Units

Most providers price in "per million tokens" to avoid small decimal numbers. When you see "$2.50/1M input tokens," that means:

  • 1 million tokens = $2.50
  • 1,000 tokens = $0.0025
  • 100 tokens = $0.00025

A typical agent interaction with 2,000 input tokens and 500 output tokens using GPT-4o costs approximately: (2,000/1,000,000 x $2.50) + (500/1,000,000 x $10.00) = $0.005 + $0.005 = $0.01 — one cent per interaction.
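
Once you estimate more than one workload, this arithmetic is worth wrapping in a helper. A minimal sketch (the prices are the 2026 list rates from the comparison tables below, and the function name is illustrative):

```python
# Illustrative 2026 list prices in USD per 1M tokens (from the tables below)
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gemini-1.5-pro": {"input": 1.25, "output": 5.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one LLM call at list prices (no caching or batch discount)."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The interaction from the text: 2,000 input + 500 output tokens on GPT-4o
print(call_cost("gpt-4o", 2_000, 500))  # 0.01, i.e. one cent
```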

LLM Pricing Comparison (2026)

Frontier Models

Model                    Input (per 1M)   Output (per 1M)   Context Window   Notes
GPT-4o                   $2.50            $10.00            128K             Cached input: $1.25
Claude 3.5 Sonnet        $3.00            $15.00            200K             Cached input: $0.30
Gemini 1.5 Pro           $1.25            $5.00             2M               Rates apply to prompts ≤128K
Gemini 1.5 Pro (long)    $2.50            $10.00            2M               Rates apply to prompts >128K

Mid-Tier Models (Best Value)

Model              Input (per 1M)   Output (per 1M)   Best For
GPT-4o-mini        $0.15            $0.60             Classification, routing, simple extraction
Claude 3.5 Haiku   $0.80            $4.00             Fast complex tasks, coding, structured output
Gemini 1.5 Flash   $0.075           $0.30             High-volume, cost-sensitive workloads

Cost Ratio: Frontier vs. Mid-Tier

Using GPT-4o vs GPT-4o-mini for the same input volume:

  • GPT-4o input: $2.50/1M
  • GPT-4o-mini input: $0.15/1M
  • Cost savings: 94% cheaper for input tokens

The quality gap has narrowed significantly in 2025-2026. For straightforward tasks, mid-tier models perform comparably to frontier models at a fraction of the cost.
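
One way to capture that saving is to route by task difficulty and reserve the frontier model for requests that need it. A minimal sketch (the tier names and model mapping are illustrative assumptions, not a standard taxonomy):

```python
# Hypothetical tier-to-model mapping; calibrate against your own evals
MODEL_BY_TIER = {
    "simple": "gpt-4o-mini",       # classification, routing, extraction
    "medium": "claude-3.5-haiku",  # structured output, light coding
    "complex": "gpt-4o",           # multi-step reasoning, long-form output
}

def pick_model(tier: str) -> str:
    """Return the cheapest model expected to handle a task of this tier."""
    return MODEL_BY_TIER.get(tier, "gpt-4o")  # unknown tiers fall back to frontier

print(pick_model("simple"))  # gpt-4o-mini (~94% cheaper per input token)
```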

Prompt Caching: The Highest-Leverage Discount

For production agents, prompt caching typically delivers the largest cost savings. Both major providers now offer caching, with different mechanics:

Anthropic Prompt Caching

Anthropic's prefix caching stores the beginning of your prompt (system instructions + stable context) and serves cache hits at approximately 10% of the normal input price — a 90% discount.

  • Cache write cost: 25% premium over standard input pricing on the first write
  • Cache hit cost: ~10% of standard input pricing
  • Cache lifetime: 5 minutes (refreshed on each hit)
  • Minimum cacheable prompt: 1,024 tokens (Sonnet/Opus), 2,048 tokens (Haiku)

Example calculation:

  • System prompt: 8,000 tokens
  • Normal input cost: 8,000 x $3.00/1M = $0.024 per call
  • Cache hit cost: 8,000 x $0.30/1M = $0.0024 per call
  • Daily savings at 1,000 calls: $21.60
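
In code, Anthropic's caching is opt-in: you mark the stable prefix with a cache_control block. A minimal sketch using the Anthropic Python SDK (the model alias, system prompt, and user message are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."  # placeholder for the stable ~8,000-token instructions

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Everything up to this marker is cached for ~5 minutes
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the customer's last ticket."}],
)

# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens (billed at ~10% of input price) on subsequent hits
print(response.usage)
```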

OpenAI Prompt Caching

OpenAI applies automatic prompt caching for prompts over 1,024 tokens, with no developer configuration required. Cache hits are charged at 50% of the standard input price.

  • Cache hit cost: 50% of standard input price
  • Cache eligibility: prompts ≥1,024 tokens, automatically managed
  • Cache lifetime: varies; frequently used prompts stay cached longer

OpenAI's caching is less dramatic than Anthropic's but requires zero implementation effort.
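
Because OpenAI's caching is automatic, the main integration task is verifying that hits are actually happening. A minimal sketch with the openai Python SDK (the system prompt is a placeholder standing in for a stable ≥1,024-token prefix):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "..."  # placeholder for a stable system prompt of ≥1,024 tokens

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Classify this support ticket."},
    ],
)

usage = response.usage
cached = usage.prompt_tokens_details.cached_tokens  # billed at 50% of input price
print(f"{cached}/{usage.prompt_tokens} input tokens served from cache")
```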

Which Provider Offers Better Caching?

For agents with large, stable system prompts:

  • Anthropic wins for maximum savings (90% off cache hits)
  • OpenAI wins for ease of implementation (zero configuration)
  • At scale with large prompts, Anthropic's deeper discount justifies the explicit cache management

Batch API Pricing

For workloads that can tolerate latency (typically 1-24 hours), batch APIs offer significant discounts:

OpenAI Batch API

  • Discount: 50% off standard pricing
  • Delivery: Within 24 hours
  • Use cases: Bulk document processing, overnight analysis, content generation pipelines
  • Limitation: No streaming, asynchronous only
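
In practice, OpenAI's flow has two steps: upload a JSONL file of requests, then create the batch job. A minimal sketch with the openai Python SDK (the file name and request bodies are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# requests.jsonl holds one request per line, e.g.:
# {"custom_id": "doc-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o", "messages": [...], "max_tokens": 512}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll until status == "completed", then download batch.output_file_id
print(batch.id, batch.status)
```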

Anthropic Message Batches

  • Discount: 50% off standard pricing
  • Delivery: Within 24 hours
  • Use cases: Same as OpenAI; particularly cost-effective with Claude 3.5 Sonnet
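
Anthropic's flow is similar but takes the requests inline rather than as an uploaded file. A minimal sketch (the custom_id and message content are placeholders):

```python
import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "doc-1",  # placeholder ID for matching results later
            "params": {
                "model": "claude-3-5-sonnet-latest",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": "Extract key fields from: ..."}],
            },
        },
        # ...one entry per document
    ]
)

# Poll until processing_status == "ended", then fetch results by custom_id
print(batch.id, batch.processing_status)
```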

Combined with caching: Batch processing + prompt caching can reduce frontier model costs by 70-80% for suitable workloads.

Real-World Cost Scenarios

Scenario 1: Customer Service Agent (Real-time)

  • Volume: 5,000 conversations/day
  • Input: 3,000 tokens avg (system prompt + history + user message)
  • Output: 500 tokens avg (response)
  • Model: Claude 3.5 Sonnet with prompt caching

Uncached cost: (3,000 x $3.00 + 500 x $15.00) / 1,000,000 x 5,000 calls
= 16,500 / 1,000,000 x 5,000
= $82.50/day = ~$2,475/month

With 80% cache hit rate (2,400 cached + 600 uncached input tokens):
Cached portion: 2,400 x $0.30/1M x 5,000 = $3.60/day
Uncached portion: 600 x $3.00/1M x 5,000 = $9.00/day
Output: 500 x $15.00/1M x 5,000 = $37.50/day
Total: ~$50.10/day = ~$1,503/month

Savings: ~$972/month (39% reduction)

Scenario 2: Document Analysis Pipeline (Batch)

  • Volume: 10,000 documents/day
  • Input: 8,000 tokens avg per document
  • Output: 300 tokens avg (structured extract)
  • Model: GPT-4o batch API

Standard GPT-4o: (8,000 x $2.50 + 300 x $10.00) / 1,000,000 x 10,000 calls
= 23,000 / 1,000,000 x 10,000 = $230/day

Batch API (50% off): $115/day = ~$3,450/month

Monitoring Token Costs in Production

Never deploy a production agent without cost monitoring. Integrate observability from day one:

  • LangFuse: Open-source, traces every LLM call with cost attribution. Self-hostable.
  • Helicone: Proxy-based cost tracking with zero code changes. Shows cost per user, per model, per endpoint.
  • Provider dashboards: OpenAI, Anthropic, and Google all provide usage dashboards with per-model breakdowns. Set billing alerts.

For teams optimizing costs, set up dashboards tracking: cost per task, cost per user session, model distribution, and cache hit rates.
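
Before adopting a dedicated tool, a first-party baseline is easy: read the usage object returned with every API response and log a cost per call. A minimal sketch with the openai Python SDK (the price constants, function name, and attribution fields are illustrative):

```python
import logging
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-costs")

client = OpenAI()
GPT4O = {"input": 2.50, "output": 10.00}  # USD per 1M tokens, list rates

def tracked_chat(messages: list[dict], user_id: str) -> str:
    """Call GPT-4o and log the call's cost, attributed to a user."""
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    u = response.usage
    cost = (u.prompt_tokens * GPT4O["input"]
            + u.completion_tokens * GPT4O["output"]) / 1_000_000
    log.info("user=%s in=%d out=%d cost=$%.6f",
             user_id, u.prompt_tokens, u.completion_tokens, cost)
    return response.choices[0].message.content
```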

Related Resources

  • Agent Cost Optimization Techniques
  • Token Efficiency
  • LLM Routing
  • How Much Does It Cost to Build an AI Agent?
  • Best AI Agent Observability Tools

More Resources

Browse the complete AI agent glossary for more AI agent terminology.

Tags: pricing, operations, cost

Related Glossary Terms

What Is Agent Cost Optimization?

Agent cost optimization covers techniques to reduce the operational cost of running AI agents — including prompt caching, LLM routing, request batching, smaller model selection, and context window management.

What Are AI Agent Benchmarks?

AI agent benchmarks are standardized evaluation frameworks that measure how well AI agents perform on defined tasks — enabling objective comparison of frameworks, models, and architectures across dimensions like task completion rate, tool use accuracy, multi-step reasoning, and safety.

What Are Agent Deployment Patterns?

Agent deployment patterns are established architectural approaches for shipping AI agents to production — including containerized microservices, serverless functions, persistent daemons, and edge deployments — each offering different trade-offs in latency, cost, scalability, and operational complexity.

What Is Agent Error Recovery?

Agent error recovery refers to the mechanisms AI agents use to detect failures, handle exceptions, retry operations with appropriate backoff, escalate to human review when needed, and resume work after encountering errors — essential for building agents that remain reliable in unpredictable production environments.
