Vapi is purpose-built voice AI infrastructure for developers. Where some platforms offer voice agents as a no-code product, Vapi treats voice as a programmable layer — you compose the agent's behavior using your preferred LLM, TTS provider, and telephony backend, then Vapi handles the real-time orchestration.
The platform launched in late 2023 and quickly became a common reference point for developers building production voice agents. Its WebSocket-based architecture and transparent pay-per-minute pricing made it popular with technical teams that wanted control without building the underlying infrastructure from scratch.
Core Architecture
WebSocket-Based Real-Time Communication
Vapi's voice agents run over WebSocket connections, which enables true bidirectional streaming. When a user speaks, the audio streams to Vapi's servers in real time. The platform handles:
- Speech-to-Text (STT): Transcribes user audio using Deepgram (default), OpenAI Whisper, or other providers
- LLM Inference: Sends the transcript to your configured LLM with full conversation context
- Text-to-Speech (TTS): Converts the LLM response to audio using ElevenLabs, OpenAI TTS, Cartesia, or other providers
- Audio Streaming: Returns synthesized audio back to the caller in real time
This pipeline runs continuously during a conversation, with Vapi managing latency optimization between each stage.
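The four stages above can be sketched as a single turn handler. This is a minimal illustration of the STT, LLM, TTS flow, not Vapi's implementation: the `VoicePipeline` class and the `fake_*` stages are hypothetical stand-ins so the example runs without any provider account, and real stages would stream audio rather than pass whole byte strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class VoicePipeline:
    """One conversational turn: audio in -> STT -> LLM -> TTS -> audio out."""
    stt: Callable[[bytes], str]
    llm: Callable[[List[Turn]], str]
    tts: Callable[[str], bytes]
    history: List[Turn] = field(default_factory=list)

    def handle_utterance(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        self.history.append(Turn("user", transcript))
        # The LLM stage receives the whole history, not just the last utterance.
        reply = self.llm(self.history)
        self.history.append(Turn("assistant", reply))
        return self.tts(reply)


# Stand-in stages (assumptions): text is smuggled through as UTF-8 bytes.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[Turn]) -> str:
    return f"You said: {history[-1].text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.handle_utterance(b"hello")
print(audio_out)  # b'You said: hello'
```

Keeping each stage behind a plain callable is what makes providers swappable: replacing `fake_tts` with a different function changes the voice without touching the rest of the turn logic.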
Telephony Integration
Vapi integrates with Twilio, Vonage, and SIP trunking providers. This means you can:
- Purchase or port phone numbers through the Vapi dashboard
- Assign assistants to inbound phone numbers
- Trigger outbound calls via API with specific assistant configurations
- Route calls based on caller ID, time of day, or custom logic
For teams already using Twilio, Vapi acts as an orchestration layer on top of your existing telephony setup.
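Routing by caller ID or time of day usually comes down to a small decision function that runs when a call arrives. The sketch below shows one way to express that logic in your own code. The assistant identifiers, the VIP number, and the business hours are all hypothetical, and how the chosen assistant is handed back to the platform is not shown.

```python
from datetime import time

# Hypothetical assistant identifiers; real IDs would come from your own account.
VIP_ASSISTANT = "assistant-vip"
DAYTIME_ASSISTANT = "assistant-support"
AFTER_HOURS_ASSISTANT = "assistant-after-hours"

# Hypothetical routing inputs.
VIP_CALLERS = {"+15550100"}
OPEN, CLOSE = time(9, 0), time(17, 0)


def pick_assistant(caller_id: str, local_time: time) -> str:
    """Choose which assistant should answer an inbound call."""
    # Caller ID takes priority over the time-of-day rule.
    if caller_id in VIP_CALLERS:
        return VIP_ASSISTANT
    if OPEN <= local_time < CLOSE:
        return DAYTIME_ASSISTANT
    return AFTER_HOURS_ASSISTANT


print(pick_assistant("+15550199", time(10, 30)))  # assistant-support
```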
Key Features
LLM-Agnostic Architecture: Vapi supports OpenAI, Anthropic Claude, Google Gemini, Meta Llama, Mistral, and any OpenAI-compatible custom endpoint. This is a significant differentiator — you are not locked into a specific model, and you can switch or A/B test different LLMs without changing your integration.
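One common way to A/B test models is to assign each call to a variant deterministically, so the same call ID always maps to the same model and results are reproducible. The sketch below assumes you pick the variant in your own code before creating the call. The variant entries are illustrative labels, not a list of supported model names.

```python
import hashlib

# Hypothetical variants; provider and model strings are illustrative only.
VARIANTS = [
    {"provider": "openai", "model": "gpt-4o-mini"},
    {"provider": "anthropic", "model": "claude-model-placeholder"},
]


def assign_variant(call_id: str, variants=VARIANTS) -> dict:
    """Map a call ID to one variant using a stable hash bucket."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


chosen = assign_variant("call-0001")
print(chosen["provider"], chosen["model"])
```

A cryptographic hash is used here instead of Python's built-in `hash()` because the built-in is randomized per process for strings, which would break reproducibility across workers.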
Call Analytics: Every call generates detailed analytics including transcripts, latency metrics per pipeline stage, cost breakdown by provider, and custom metadata you attach at call creation time.
Webhook Events: Vapi fires webhooks for conversation events including call start, end-of-utterance, tool calls, and call completion. This lets you integrate with CRMs, ticketing systems, and data pipelines in real time.
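On the receiving side, a webhook endpoint typically parses the JSON body and dispatches on an event type. The sketch below shows that dispatch pattern using only the standard library. The event names (`call-started`, `call-ended`) and the `type` and `callId` fields are assumptions made for illustration, not the documented payload schema, and HTTP handling and signature checks are left out.

```python
import json
from typing import Callable, Dict

# Registry mapping an event type to its handler function.
handlers: Dict[str, Callable[[dict], str]] = {}


def on(event_type: str):
    """Decorator that registers a handler for one event type."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register


@on("call-started")  # hypothetical event name
def call_started(event: dict) -> str:
    return f"open CRM record for {event['callId']}"


@on("call-ended")  # hypothetical event name
def call_ended(event: dict) -> str:
    return f"attach transcript to {event['callId']}"


def dispatch(raw_body: str) -> str:
    """Parse a webhook body and route it; unknown events are ignored."""
    event = json.loads(raw_body)
    handler = handlers.get(event.get("type"))
    if handler is None:
        return "ignored"
    return handler(event)


print(dispatch('{"type": "call-started", "callId": "abc"}'))  # open CRM record for abc
```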
Tool Calling / Function Calling: Vapi supports LLM function calling during conversations. You define tool schemas (e.g., "look up customer account," "book appointment"), and the LLM can invoke them mid-conversation. Vapi handles the tool execution request and injects the result back into the conversation context.
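As a rough sketch of the tool-calling loop, the example below defines one tool in the JSON Schema style used by OpenAI-compatible function calling, then executes a call the model might emit. The `book_appointment` function, its parameters, and the `run_tool_call` helper are hypothetical, and the exact request and response format Vapi uses between the model and your server is not shown here.

```python
import json

# Tool schema in the style of OpenAI-compatible function calling.
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment for the caller.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "date": {"type": "string", "description": "ISO date, e.g. 2025-01-31"},
            },
            "required": ["name", "date"],
        },
    },
}


def book_appointment(name: str, date: str) -> dict:
    # Stand-in for a real calendar API call.
    return {"status": "booked", "name": name, "date": date}


TOOL_IMPLEMENTATIONS = {"book_appointment": book_appointment}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a tool call and return a JSON string for the conversation context."""
    fn = TOOL_IMPLEMENTATIONS.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = fn(**json.loads(arguments_json))
    return json.dumps(result)


print(run_tool_call("book_appointment", '{"name": "Ada", "date": "2025-01-31"}'))
```

Returning an error object for unknown tool names, rather than raising, gives the model something it can explain to the caller instead of ending the conversation abruptly.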
Custom Voices: Vapi supports bringing your own ElevenLabs voice, or using any of the supported TTS providers' voice libraries.
Web Calls: In addition to phone calls, Vapi supports browser-based WebRTC audio for web applications. This lets you embed a voice agent directly in a web page without a phone number.
Pricing Model
Vapi charges a platform fee of $0.05 per minute plus pass-through costs for underlying providers:
| Cost Component | Typical Range | Provider |
|---|---|---|
| Vapi platform fee | $0.05/min | Vapi |
| LLM (GPT-4o mini) | ~$0.01-0.02/min | OpenAI |
| TTS (ElevenLabs) | ~$0.02-0.05/min | ElevenLabs |
| STT (Deepgram) | ~$0.005-0.01/min | Deepgram |
| Telephony (Twilio) | ~$0.008-0.015/min | Twilio |
Summing the low and high ends of these ranges gives roughly $0.09 to $0.15 per minute all-in for a typical production call ($0.093 to $0.145 before rounding). By comparison, Bland AI ($0.09/min all-inclusive) and Retell AI ($0.07/min all-inclusive) bundle provider costs but offer less flexibility in provider selection.
Enterprise volume discounts are available by contacting Vapi directly.
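The all-in range can be reproduced from the table with a few lines of arithmetic. The sketch below copies the per-minute ranges from the table above and extends them to a monthly estimate. The 10,000-minute volume is an arbitrary example, and actual bills depend on the providers and models you choose.

```python
# (low, high) per-minute costs in USD, copied from the pricing table.
COSTS = {
    "vapi_platform": (0.05, 0.05),
    "llm_gpt4o_mini": (0.01, 0.02),
    "tts_elevenlabs": (0.02, 0.05),
    "stt_deepgram": (0.005, 0.01),
    "telephony_twilio": (0.008, 0.015),
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())


def monthly_cost(minutes: int, per_minute: float) -> float:
    """Estimated monthly spend in USD, rounded to cents."""
    return round(minutes * per_minute, 2)


print(f"per minute: ${low:.3f} to ${high:.3f}")  # per minute: $0.093 to $0.145
print(f"10,000 min/month: ${monthly_cost(10_000, low):.2f} to ${monthly_cost(10_000, high):.2f}")
```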
Developer Experience
Vapi is built with developers as the primary user. The platform provides:
- REST API for creating and managing assistants, phone numbers, and calls
- Python and TypeScript SDKs with first-class support
- Dashboard for building and testing assistants visually before deploying to production
- Playground for live call testing without writing code
- Detailed documentation covering every API parameter, webhook payload, and integration pattern
The configuration model is declarative — you define an assistant as a JSON object specifying the LLM, TTS, STT, system prompt, tools, and behavior settings. This makes it easy to version-control and deploy assistant configurations alongside your application code.
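To illustrate the declarative approach, the sketch below defines an assistant as plain data and serializes it to stable JSON that can be committed next to application code. All field names and values here (`model`, `voice`, `transcriber`, `systemPrompt`, `voiceId`) are assumptions chosen for readability, so check the API reference for the real schema before using them.

```python
import json

# Hypothetical field names, for illustration only.
assistant_config = {
    "name": "support-line",
    "model": {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "systemPrompt": "You are a concise, friendly support agent.",
    },
    "voice": {"provider": "elevenlabs", "voiceId": "your-voice-id"},
    "transcriber": {"provider": "deepgram"},
    "tools": [],
}


def to_versioned_json(config: dict) -> str:
    """Serialize with sorted keys so diffs in version control stay minimal."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


print(to_versioned_json(assistant_config))
```

Sorting keys and fixing the indentation means two people editing the same assistant produce the same byte-for-byte output, which keeps code review focused on real changes to the prompt or providers.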
Use Cases
Customer Support Automation: Businesses use Vapi to handle inbound support calls, gathering initial information before routing to human agents or resolving common issues fully autonomously. Learn more in our Voice AI Agents for Customer Service guide.
Sales Outreach: Sales teams use Vapi to make outbound prospecting calls, qualify leads, and schedule demos. For compliance details on outbound calling, see Voice AI Agents for Sales.
Appointment Scheduling: Healthcare, dental, and service businesses use Vapi to handle appointment booking over the phone, integrating with calendar APIs via tool calls.
IVR Replacement: Companies replace traditional Interactive Voice Response systems with Vapi-powered agents that can understand natural language instead of requiring callers to press numbered menu options.
Internal Tools: Some teams build internal voice agents for CRM data entry by voice, status update calls, and hands-free workflow automation.
Comparing Vapi to Alternatives
| Feature | Vapi | Bland AI | Retell AI | ElevenLabs |
|---|---|---|---|---|
| LLM choice | Any | Limited | Any | Limited |
| Pricing model | Per-min + providers | $0.09/min all-in | $0.07/min all-in | Per-plan + per-min |
| Telephony | Twilio/Vonage/SIP | Built-in | Built-in | Third-party only |
| Batch calling | Via API | Yes | Yes | No |
| Web calls | Yes | No | Yes | Yes |
| Target user | Developers | Enterprise | Developers | All users |
See the full Voice AI Agent Platforms Compared 2026 guide for a comprehensive breakdown, or our focused Vapi vs Retell AI comparison for a head-to-head look.
When to Choose Vapi
Vapi is the right choice when:
- You need full control over every component of the voice stack (LLM, TTS, STT)
- Your team has engineering resources to build and maintain a custom integration
- You want to A/B test different LLMs or voice providers
- You already use Twilio and want to build on top of existing telephony infrastructure
- You need web-based voice calls in addition to phone calls
If you need a faster path to deployment with less configuration overhead, Retell AI offers similar developer-friendly features with a simpler setup. For enterprise outbound calling with scripted conversational pathways, Bland AI may be a better fit.