Retell AI entered the voice agent market in 2023 with a clear thesis: the developer-focused voice agent market needed a platform that was as capable as Vapi but with simpler onboarding and cleaner pricing. By bundling telephony, STT, and TTS into a single per-minute rate — while keeping LLM costs transparent and separate — Retell AI found a pricing model that resonated with development teams evaluating the space.
## Product Philosophy
Retell AI's design choices reflect a deliberate balance between control and simplicity. The platform gives developers enough configurability to build sophisticated voice products, but it ships reasonable defaults for each component, so teams do not need to evaluate every possible STT/TTS combination before getting started.
This is distinct from Vapi's approach, which is maximally configurable but requires more decisions upfront. It is also distinct from Bland AI, which trades configurability for a no-code-friendly pathway builder. Retell AI sits closer to Vapi on the technical spectrum but with a more opinionated default configuration.
## Technical Architecture
### Core Pipeline Design
Every Retell AI call runs through the following pipeline:
Incoming audio → VAD → STT (Deepgram) → LLM (your choice) → TTS (ElevenLabs or OpenAI) → Outgoing audio
The pipeline is designed around streaming — each component begins processing before the previous stage has completed. TTS begins synthesizing audio as soon as the first tokens arrive from the LLM, rather than waiting for the full response. This streaming approach is the primary mechanism behind Retell AI's sub-800ms latency claim.
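The handoff between stages can be sketched with async generators — a minimal illustration of why streaming cuts latency, not Retell AI's actual internals. The stage names and token/audio placeholders are invented for the example.

```python
import asyncio

# Each stage consumes its upstream as an async iterator, so the TTS
# stand-in starts producing audio on the first LLM token instead of
# waiting for the full response.

async def llm_tokens(prompt: str):
    """Stand-in for a streaming LLM response."""
    for token in ["Sure, ", "I can ", "help ", "with that."]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield token

async def tts_chunks(tokens):
    """Begin synthesizing per token instead of per full reply."""
    async for token in tokens:
        yield f"<audio:{token.strip()}>"  # placeholder for synthesized audio

async def run_pipeline():
    return [chunk async for chunk in tts_chunks(llm_tokens("hello"))]

print(asyncio.run(run_pipeline()))
```

In a real pipeline each stage would also be overlapped with network I/O; the point is only that no stage blocks on the full output of the previous one.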
### Voice Activity Detection
Retell AI's VAD system uses a combination of energy-based detection and ML-based endpoint detection to identify when a caller has finished speaking. The system is tuned to handle:
- Natural speech pauses within a sentence (do not trigger turn change)
- Brief silence at sentence endings (trigger turn change)
- Background noise and audio artifacts (do not trigger false positives)
- Interruptions (caller speaks while agent is speaking — triggers interruption handling)
VAD quality directly affects conversation naturalness. Overly sensitive VAD causes the agent to respond before the caller finishes; insufficiently sensitive VAD causes unnatural silence after the caller stops. Retell AI has tuned its VAD specifically for phone-quality audio, including the compression artifacts introduced by cellular and VoIP networks.
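The sensitivity tradeoff can be made concrete with a toy energy-based endpoint detector. The threshold and frame counts are invented for illustration; production VAD (including Retell AI's) layers ML-based endpointing on top of this kind of heuristic.

```python
# Toy energy-based endpoint detector. END_SILENCE_FRAMES controls how
# long a pause must last before it counts as end-of-turn: too low and
# the agent barges in mid-sentence, too high and it leaves dead air.

ENERGY_THRESHOLD = 0.1   # below this, a frame counts as silence
END_SILENCE_FRAMES = 3   # pauses shorter than this stay within the turn

def detect_end_of_turn(frame_energies):
    """Return the index of the frame where the turn ends, or None."""
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy < ENERGY_THRESHOLD:
            silent_run += 1
            if silent_run >= END_SILENCE_FRAMES:
                return i
        else:
            silent_run = 0  # speech resumed: the pause was intra-sentence
    return None

# A two-frame mid-sentence pause does not end the turn;
# the three-frame trailing silence does.
print(detect_end_of_turn([0.8, 0.05, 0.05, 0.9, 0.02, 0.02, 0.02]))
```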
### LLM Context Management
Retell AI manages the conversation context passed to the LLM, which includes:
- System prompt (agent personality, goals, guardrails)
- Full conversation history (previous turns)
- Dynamic context injected at call creation time (customer data, custom variables)
- Tool call results from previous function invocations
Context management includes token budgeting — for long calls, Retell AI can be configured to summarize or truncate earlier conversation history to stay within the LLM's context window, while preserving the most relevant recent content.
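A minimal sketch of the truncation strategy, under the assumption that older turns are dropped first while the system prompt and the newest turns are preserved. Token counting is approximated by word count here; real systems use the model's tokenizer.

```python
# Keep the system prompt, then admit history newest-first until the
# token budget is exhausted. Oldest turns fall off first.

def count_tokens(message: str) -> int:
    return len(message.split())  # crude stand-in for a real tokenizer

def fit_context(system_prompt, history, budget):
    """Return the system prompt plus the newest turns that fit the budget."""
    used = count_tokens(system_prompt)
    kept = []
    for turn in reversed(history):  # walk newest-first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system_prompt] + list(reversed(kept))

print(fit_context(
    "You are a helpful agent",
    ["old turn one two three", "recent turn"],
    budget=8,
))
```

Retell AI's summarization option replaces the dropped turns with a compressed summary rather than discarding them outright.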
### Batch Calling System
The batch calling API is one of Retell AI's most-used features for production deployments. The system architecture for batch calls:
- Job submission: API call with list of targets, call config, and optional per-target custom data
- Queue management: Retell AI queues calls and manages concurrent call limits to avoid carrier flagging
- Retry logic: Unanswered or failed calls are retried according to configurable retry schedules
- Progress tracking: Batch job status is retrievable via API with per-call status and outcomes
- Webhook delivery: Call completion events fire webhooks for each individual call in the batch
The batch calling system integrates with the same analytics infrastructure as individual calls, so batch campaign performance is visible in the same dashboard as interactive call data.
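A batch job submission might look roughly like the following. The field names, limits, and payload shape are illustrative assumptions, not Retell AI's documented schema; consult the API reference for the real one.

```python
import json

# Hypothetical batch-job payload mirroring the steps above: a target
# list with per-target custom data, a concurrency cap to avoid carrier
# flagging, and a retry schedule for unanswered calls.

def build_batch_job(agent_id, targets, max_concurrency=10, max_retries=2):
    return {
        "agent_id": agent_id,
        "max_concurrency": max_concurrency,          # carrier-friendly pacing
        "retry": {"max_attempts": max_retries, "backoff_minutes": 30},
        "calls": [
            {"to_number": t["number"], "custom_data": t.get("data", {})}
            for t in targets
        ],
    }

job = build_batch_job(
    agent_id="agent_123",
    targets=[{"number": "+15551230001", "data": {"name": "Ada"}}],
)
print(json.dumps(job, indent=2))
```

The same payload shape extends naturally to 10,000-target campaigns; the queueing, retries, and webhook delivery happen server-side.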
### LLM Support Matrix
| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o mini, GPT-4 Turbo | Best default choice for general use |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Haiku | Strong for complex reasoning tasks |
| Google | Gemini Pro, Gemini Flash | Cost-efficient at high volume |
| Meta | Llama 3.1 70B, Llama 3.1 8B | Via compatible hosting endpoints |
| Custom | Any OpenAI-compatible endpoint | Self-hosted models, fine-tunes |
The ability to use custom endpoints means teams with fine-tuned models — trained on their specific domain vocabulary, product knowledge, or conversation patterns — can plug those models into Retell AI's voice infrastructure. This is relevant for industries with specialized terminology (medical, legal, financial) where general-purpose LLMs may underperform.
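"OpenAI-compatible" concretely means the endpoint accepts the same chat-completions request shape. The sketch below builds such a request against a self-hosted host; the URL, model name, and API key are placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request

# Build a chat-completions request aimed at a self-hosted,
# OpenAI-compatible server (e.g. a fine-tuned model behind vLLM).

def build_chat_request(base_url, model, messages, api_key="sk-local"):
    payload = json.dumps({"model": model, "messages": messages, "stream": True})
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=payload.encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000",            # hypothetical self-hosted endpoint
    "my-medical-llama-3.1-8b",          # illustrative fine-tune name
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
```

Because the protocol is the same, switching the agent's LLM stage from a hosted provider to a fine-tune is a configuration change rather than a code change.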
## Feature Set
### Phone Calls (Inbound and Outbound)
Retell AI provides built-in telephone numbers across US area codes and international numbers for supported regions. The platform handles:
- Inbound: Associate an agent with a number; all calls to that number are answered by the agent
- Outbound: Initiate calls via API specifying the agent, number to call, and from-number
- Warm transfer: Retell AI agents can transfer calls to human agents mid-conversation with context summary
### Web Calls
Retell AI's WebRTC integration enables browser-based voice calls. Developers embed the Retell AI web SDK in their application; users click to call and are connected to the agent via browser audio without a phone number. Useful for product demos, voice-enabled web forms, and customer portal features.
### Function Calling
Agents can invoke external tools mid-conversation. Retell AI handles the function call protocol — when the LLM invokes a tool, Retell AI pauses the conversation, fires a webhook with the function parameters, waits for your server response, and injects the result back into the LLM context. The agent continues the conversation with the updated information.
Common tool integrations: CRM lookups, calendar booking, order status checks, FAQ knowledge base queries, ticket creation.
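The server side of that loop reduces to: parse the webhook, dispatch to the named tool, return the result for injection into the LLM context. A minimal sketch, with an invented payload shape and an invented `order_status` tool:

```python
import json

# Hypothetical tool registry and webhook handler. Retell AI's actual
# payload schema differs; this shows only the dispatch-and-return shape.

TOOLS = {
    "order_status": lambda args: {"status": "shipped", "order_id": args["order_id"]},
}

def handle_tool_webhook(body: bytes) -> dict:
    """Dispatch a function-call webhook to the named tool."""
    event = json.loads(body)
    tool = TOOLS.get(event["name"])
    if tool is None:
        return {"error": f"unknown tool {event['name']}"}
    return tool(event["arguments"])

result = handle_tool_webhook(
    json.dumps({"name": "order_status", "arguments": {"order_id": "A42"}}).encode()
)
print(result)  # this value is injected back into the conversation
```

Latency matters here: the caller hears silence (or a filler phrase) while your server responds, so tool handlers should return in well under a second.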
### Call Recording and Analytics
All calls are recorded (configurable per use case). The analytics suite provides:
- Transcript with speaker diarization and timestamps
- Latency breakdown per pipeline stage (STT time, LLM time, TTS time)
- LLM cost per call (tokens used, provider cost)
- Call outcome classification (configurable custom categories)
- Batch campaign analytics (conversion rate, duration distribution, outcome breakdown)
### Developer SDK
The Python and TypeScript SDKs cover the full API surface. Key capabilities:
- Async call initiation and batch submission
- Webhook handler helpers with signature verification
- Real-time call monitoring via event streams
- Test mode for development without incurring telephony costs
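The signature-verification helpers typically boil down to an HMAC over the raw request body. The header name and signing scheme below are assumptions for illustration, not Retell AI's documented scheme; the SDK helpers handle this for you.

```python
import hashlib
import hmac

# Verify that a webhook body was signed with the shared secret.
# compare_digest avoids timing side channels on the comparison.

def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

body = b'{"event": "call_ended", "call_id": "abc"}'
sig = hmac.new(b"whsec_demo", body, hashlib.sha256).hexdigest()
print(verify_signature("whsec_demo", body, sig))  # True
```

Verifying over the raw bytes matters: re-serializing the parsed JSON can reorder keys and break the signature.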
## Pricing Model Analysis
Retell AI charges a flat $0.07/min covering telephony, STT, TTS, and platform usage; LLM costs are billed separately at provider rates.
Comparison with alternatives:
| Platform | All-in cost estimate (per min) | LLM included |
|---|---|---|
| Retell AI | $0.07 + $0.01-0.05 LLM | No (separate) |
| Vapi | $0.05 + $0.04-0.10 providers | No (separate) |
| Bland AI | $0.09 | Yes (included) |
| ElevenLabs Conv. AI | Plan-based + per-min | Partial |
In practice, Retell AI and Vapi come out to similar all-in costs. Bland AI's higher per-minute rate reflects its included LLM costs and more complete enterprise feature set.
At 50,000 minutes/month with GPT-4o mini:
- Retell AI: $3,500 (platform) + ~$1,000 (LLM) = ~$4,500
- Vapi: $2,500 (platform) + ~$2,000 (all providers) = ~$4,500
- Bland AI: $4,500 (all-inclusive)
The total cost is similar across all three; the tradeoffs are in control, features, and operational complexity.
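The 50,000-minute comparison above, expressed as a small cost model. The platform rates come from the table; the per-minute LLM and provider adders are the rough estimates used in the text, not quoted prices.

```python
# Monthly cost = minutes * (platform rate + separately billed rate).
# Adders: ~$0.02/min for GPT-4o mini on Retell AI, ~$0.04/min for
# Vapi's combined providers; Bland AI is all-inclusive.

def monthly_cost(minutes, platform_rate, extra_rate=0.0):
    return minutes * (platform_rate + extra_rate)

minutes = 50_000
retell = monthly_cost(minutes, 0.07, 0.02)
vapi = monthly_cost(minutes, 0.05, 0.04)
bland = monthly_cost(minutes, 0.09)

print(f"Retell AI: ${retell:,.0f}  Vapi: ${vapi:,.0f}  Bland AI: ${bland:,.0f}")
```

Swap in your own minute volume and LLM rate to see where the platforms diverge; at low volumes the differences are negligible, while heavier LLM usage (e.g. GPT-4o instead of mini) shifts the comparison.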
## Comparison to Vapi
Since Vapi is Retell AI's closest competitor, a direct comparison is warranted. See Vapi vs Retell AI for the full analysis. Key differences:
Retell AI advantages:
- Simpler onboarding (telephony included, no Twilio account needed)
- Native batch calling API with campaign management features
- Cleaner per-minute pricing (easier to estimate all-in costs)
Vapi advantages:
- More granular control over every pipeline component
- Larger community and ecosystem of integrations
- Multi-agent "Squads" feature for complex routing
- Web call support with more configuration options
Neither platform is clearly superior — the choice depends on team preferences for control vs. simplicity and specific feature requirements.
## Production Deployment Patterns
### Pattern 1: Customer Service with Escalation
A Retell AI agent handles inbound calls, resolves common issues via function calling, and escalates to human agents when complexity exceeds the agent's capability. The escalation transfers the call to a human queue with a spoken summary of what was discussed. This human-in-the-loop pattern is common in production deployments where full automation is not yet viable.
### Pattern 2: High-Volume Outbound Campaigns
The batch calling API handles lists of 10,000+ phone numbers for sales or reminder campaigns. Each call uses per-target custom data (customer name, account details, appointment time) injected at runtime. Analytics track campaign performance in near real-time.
### Pattern 3: Embedded Web Voice Agent
A web application embeds a Retell AI voice agent using the WebRTC SDK. Users can click a button to speak to an AI assistant instead of filling out a form. The agent collects information via conversation, uses function calls to process the data, and provides confirmation — all without the user needing to type.
For context on broader agentic workflows these patterns fit into, see our glossary. For more on the customer service and sales use cases in detail, see those guides.