Introduction
I recently built a 24/7 AI receptionist for a plumbing business that now handles 200+ calls per month. The owner went from missing 40% of calls to capturing 98%. Total setup time: 3 hours.
This guide teaches you how to build production-ready AI voice agents using Vapi.ai, the platform that makes voice AI development actually feasible for small teams.
By the end, you'll understand:
- The STT → LLM → TTS architecture
- How to set up Vapi with proper function calling
- Production patterns for error handling
- Cost optimization strategies
- Real deployment patterns I use
Let's dive in.
The Voice AI Architecture
Before writing code, understand the pipeline:
[Figure: Voice AI Pipeline Architecture. How voice AI transforms speech into intelligent responses.]
The Flow:
- Speech-to-Text (STT): Vapi uses Deepgram or Whisper to transcribe speech
- LLM Processing: OpenAI, Claude, or your custom model processes intent
- Function Calling: Your backend endpoints handle business logic
- Text-to-Speech (TTS): ElevenLabs or OpenAI converts response to voice
- Response: Caller hears the AI speak
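The flow above can be sketched as a chain of async stages. This is only an illustration of the shape of one conversational turn, not Vapi's internals: the type names, `handleTurn`, and the stub stages are all hypothetical stand-ins for the real STT, LLM, and TTS providers.

```typescript
// Hypothetical stage signatures; real providers would sit behind these.
type Transcribe = (audio: Uint8Array) => Promise<string>;
type Think = (transcript: string) => Promise<{
  reply: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}>;
type RunFunction = (name: string, args: Record<string, unknown>) => Promise<string>;
type Speak = (text: string) => Promise<Uint8Array>;

interface Pipeline {
  stt: Transcribe;
  llm: Think;
  runFunction: RunFunction;
  tts: Speak;
}

// One conversational turn: caller audio in, agent audio out.
async function handleTurn(p: Pipeline, audio: Uint8Array): Promise<Uint8Array> {
  const transcript = await p.stt(audio);
  const decision = await p.llm(transcript);
  // If the model requested a function, speak that function's result
  // instead of the model's draft reply.
  const text = decision.functionCall
    ? await p.runFunction(decision.functionCall.name, decision.functionCall.args)
    : decision.reply;
  return p.tts(text);
}

// Stub stages that treat "audio" as raw character codes, so the sketch
// runs without any provider accounts.
const toBytes = (text: string): Uint8Array =>
  Uint8Array.from(text, (c: string) => c.charCodeAt(0));
const fromBytes = (audio: Uint8Array): string =>
  Array.from(audio, (b) => String.fromCharCode(b)).join("");

const stubPipeline: Pipeline = {
  stt: async (audio) => fromBytes(audio),
  llm: async (t) =>
    t.indexOf("leak") !== -1
      ? { reply: "", functionCall: { name: "captureLead", args: { issue: t } } }
      : { reply: "How can I help?" },
  runFunction: async (name) => `${name} done`,
  tts: async (text) => toBytes(text),
};
```

The point of the sketch is that latency adds up across every stage in the chain, which is the part a hosted platform handles for you.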
Why Vapi?
- Handles all the hard parts (latency, interruptions, speech detection)
- Sub-1-second response times
- Built-in function calling
- Simple API
- ~$0.05-0.10 per minute (vs building yourself: $50k+ dev time)
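To put the per-minute price in context, here is a back-of-the-envelope estimate using figures from this post (200 calls per month, 2m 15s average call). The function name is my own, and the result covers per-minute charges only, assuming the ~$0.05-0.10 range holds for your configuration.

```typescript
// Rough monthly spend: calls x minutes per call x price per minute,
// rounded to whole cents.
function estimateMonthlyCost(
  callsPerMonth: number,
  avgMinutesPerCall: number,
  ratePerMinute: number
): number {
  const cost = callsPerMonth * avgMinutesPerCall * ratePerMinute;
  return Math.round(cost * 100) / 100;
}

const lowEstimate = estimateMonthlyCost(200, 2.25, 0.05); // 22.5
const highEstimate = estimateMonthlyCost(200, 2.25, 0.1); // 45
```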
Prerequisites
Before starting, you'll need:
- Vapi account (free tier available)
- Twilio account (for phone numbers)
- OpenAI API key (or Anthropic for Claude)
- Basic knowledge of TypeScript/JavaScript
- Backend endpoint (can be Vercel, Railway, or your server)
Time commitment: 2-4 hours for first agent.
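Since the code in this guide reads credentials from environment variables, a small startup check can catch a missing key before the first call does. This is a minimal sketch: the three variable names are the ones used later in this post, and `missingEnvVars` is a hypothetical helper.

```typescript
// Environment variables read by the code samples in this guide.
const REQUIRED_ENV_VARS = [
  "VAPI_API_KEY",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Typical use at startup (in Node.js):
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```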
Step 1: Setting Up Your First Assistant
Create the Assistant
// lib/vapi.ts
import { VapiClient } from "@vapi-ai/server-sdk";

const client = new VapiClient({ token: process.env.VAPI_API_KEY });

export async function createAssistant() {
  const assistant = await client.assistants.create({
    name: "Plumbing Receptionist",
    model: {
      provider: "openai",
      model: "gpt-4",
      temperature: 0.7,
      // The prompt text is left unindented so no stray whitespace reaches the model
      systemPrompt: `You are a professional receptionist for a plumbing company.
Your job:
- Greet callers warmly
- Collect their name, address, and issue description
- Determine if it's an emergency
- Schedule appointments for non-emergencies
- Capture lead information
Rules:
- Be concise (under 30 seconds per response)
- Always confirm details back to the caller
- If emergency, collect contact number and say a plumber will call within 15 minutes
- Never say "I don't know"; offer to have a human call back instead`,
    },
    voice: {
      provider: "elevenlabs",
      voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
    },
    firstMessage:
      "Hello! Thanks for calling. I'm here to help with your plumbing needs. What can I assist you with today?",
  });
  return assistant;
}
Key Configuration Explained
Model Selection:
- gpt-4: Best quality, higher latency (~1.5s), $0.03/1K tokens
- gpt-3.5-turbo: Faster (~0.8s), cheaper, good for simple agents
- claude-3-sonnet: Excellent for complex reasoning
Voice Selection:
- ElevenLabs voices sound most natural
- OpenAI voices are cheaper but more robotic
- Test multiple voices; it matters more than you think
System Prompt Tips:
- Be specific about tone and length
- Include examples of what NOT to say
- Add guardrails for edge cases
- Keep it under 2000 tokens
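To enforce the token budget, a rough length check can run wherever you define the prompt. This sketch assumes the common approximation of about 4 characters per token for English text. It is not a real tokenizer, and both function names are hypothetical, so use a proper tokenizer if you need exact counts.

```typescript
// Rough heuristic for English text; an assumption, not a tokenizer.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Reports the estimated size and whether it fits the budget (default 2000 tokens).
function checkPromptBudget(
  prompt: string,
  maxTokens = 2000
): { tokens: number; withinBudget: boolean } {
  const tokens = estimateTokens(prompt);
  return { tokens, withinBudget: tokens <= maxTokens };
}
```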
Step 2: Adding Function Calling
Functions let your AI interact with your systems. Here's a lead capture function:
// lib/vapi-functions.ts
// saveToCRM and sendEmergencyAlert are your own integrations (not shown here).
interface LeadCaptureParams {
  name: string;
  phone: string;
  address?: string; // optional: the function schema does not list it as required
  issue: string;
  isEmergency: boolean;
}

export async function captureLead(params: LeadCaptureParams) {
  // Save to your CRM (Airtable, HubSpot, etc.)
  await saveToCRM({
    ...params,
    source: "voice-agent",
    timestamp: new Date().toISOString(),
  });

  // Send notification
  if (params.isEmergency) {
    await sendEmergencyAlert(params);
  }

  // The message is what the assistant reads back to the caller
  return {
    success: true,
    message: params.isEmergency
      ? "Emergency logged. A plumber will call you within 15 minutes."
      : "Great! I've scheduled you for tomorrow between 2-4 PM. You'll get a confirmation text.",
  };
}
// Vapi function schema
export const captureLeadSchema = {
  name: "captureLead",
  description: "Capture lead information and schedule appointment",
  parameters: {
    type: "object",
    properties: {
      name: { type: "string", description: "Caller's full name" },
      phone: { type: "string", description: "Caller's phone number" },
      address: { type: "string", description: "Service address" },
      issue: { type: "string", description: "Brief description of plumbing issue" },
      isEmergency: {
        type: "boolean",
        description: "True if water damage, no water, or sewage backup",
      },
    },
    required: ["name", "phone", "issue", "isEmergency"],
  },
};
[Figure: Voice Agent Function Calling. How your AI connects to real business systems.]
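Because the LLM fills in these parameters from a live conversation, it helps to validate them before they reach your CRM. Below is a minimal sketch of a runtime check against the four required fields in the schema above. `validateLeadParams` and its result type are hypothetical helpers, not part of Vapi.

```typescript
interface LeadParams {
  name: string;
  phone: string;
  address?: string;
  issue: string;
  isEmergency: boolean;
}

type ValidationResult =
  | { ok: true; value: LeadParams }
  | { ok: false; missing: string[] };

// Checks the fields the schema marks as required and reports which are absent.
function validateLeadParams(input: unknown): ValidationResult {
  const obj = (typeof input === "object" && input !== null ? input : {}) as Record<
    string,
    unknown
  >;
  const missing: string[] = [];

  for (const key of ["name", "phone", "issue"]) {
    const value = obj[key];
    if (typeof value !== "string" || value.trim() === "") missing.push(key);
  }
  if (typeof obj.isEmergency !== "boolean") missing.push("isEmergency");

  if (missing.length > 0) return { ok: false, missing };

  const value: LeadParams = {
    name: obj.name as string,
    phone: obj.phone as string,
    issue: obj.issue as string,
    isEmergency: obj.isEmergency as boolean,
  };
  // address is optional, so copy it only when the model supplied a string
  if (typeof obj.address === "string") value.address = obj.address;
  return { ok: true, value };
}
```

When validation fails, the `missing` list can be turned into a result message asking the assistant to collect those details from the caller.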
Register Functions with Assistant
// Update your assistant creation
const assistant = await client.assistants.create({
  // ... previous config
  functions: [
    {
      name: "captureLead",
      description: captureLeadSchema.description,
      parameters: captureLeadSchema.parameters,
      // Synchronous: the assistant waits for your endpoint's result so it can
      // read the confirmation back. With async: true it would not wait.
      async: false,
    },
  ],
});
Create the Webhook Endpoint
// app/api/vapi/webhook/route.ts
import { NextRequest, NextResponse } from "next/server";
import { captureLead } from "@/lib/vapi-functions";

export async function POST(request: NextRequest) {
  const body = await request.json();
  // Vapi sends different message types
  const { message } = body;

  switch (message.type) {
    case "function-call":
      if (message.functionCall.name === "captureLead") {
        const result = await captureLead(message.functionCall.parameters);
        return NextResponse.json({
          result,
        });
      }
      break;
    case "end-of-call-report":
      // Log call analytics
      console.log("Call ended:", message);
      break;
    case "speech-update":
      // Real-time transcription (optional)
      break;
  }

  return NextResponse.json({ status: "ok" });
}
Step 3: Connecting a Phone Number
Buy a Number on Twilio
// lib/twilio.ts
import twilio from "twilio";

const client = twilio(
  process.env.TWILIO_ACCOUNT_SID,
  process.env.TWILIO_AUTH_TOKEN
);

export async function buyPhoneNumber(areaCode: string) {
  // Find one available local number in the requested area code
  const numbers = await client.availablePhoneNumbers("US")
    .local
    .list({ areaCode, limit: 1 });

  if (numbers.length === 0) {
    throw new Error("No numbers available in this area code");
  }

  const purchased = await client.incomingPhoneNumbers.create({
    phoneNumber: numbers[0].phoneNumber,
    voiceUrl: "https://your-domain.com/api/vapi/inbound", // We'll create this
  });
  return purchased.phoneNumber;
}
Create Inbound Call Handler
// app/api/vapi/inbound/route.ts
import { NextRequest, NextResponse } from "next/server";

// The XML declaration must be the very first characters of the body,
// so the template starts immediately after the backtick.
const TWIML = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://your-vapi-websocket-url">
      <Parameter name="assistantId" value="your-assistant-id" />
    </Stream>
  </Connect>
</Response>`;

export async function POST(request: NextRequest) {
  // You can add call screening logic here
  // E.g., check if business hours, spam detection, etc.
  return new NextResponse(TWIML, {
    headers: {
      "Content-Type": "text/xml",
    },
  });
}
Using Vapi's Phone Number
Alternatively, buy directly through Vapi:
// Easier but less control
const phoneNumber = await client.phoneNumbers.create({
  provider: "twilio",
  areaCode: "555",
  assistantId: assistant.id,
});
[Figure: Phone Number Setup Options. Connect your voice agent to phone lines.]
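The inbound handler's comment mentions call screening by business hours. One way to sketch that check is a pure function over the current time. The 8:00-18:00 Monday-to-Friday window and all names here are assumptions for illustration, and the hour is read in UTC to keep the example deterministic; a real deployment would convert to the business's own time zone first.

```typescript
interface BusinessHours {
  openHour: number; // inclusive, 0-23
  closeHour: number; // exclusive, 0-23
  openDays: number[]; // 0 = Sunday ... 6 = Saturday
}

// Assumed schedule for illustration only.
const ASSUMED_HOURS: BusinessHours = {
  openHour: 8,
  closeHour: 18,
  openDays: [1, 2, 3, 4, 5],
};

// True when `now` falls on an open day between openHour and closeHour (UTC).
function isWithinBusinessHours(
  now: Date,
  hours: BusinessHours = ASSUMED_HOURS
): boolean {
  const day = now.getUTCDay();
  const hour = now.getUTCHours();
  return (
    hours.openDays.indexOf(day) !== -1 &&
    hour >= hours.openHour &&
    hour < hours.closeHour
  );
}
```

Since the goal is a 24/7 receptionist, the result would more likely select a different greeting or an emergency-focused prompt after hours than reject the call.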
Step 4: Production Patterns
Error Handling
Voice agents fail. Plan for it:
// lib/vapi-error-handler.ts
// logError, sendSMS, sendSlackAlert, and createTask are your own helpers (not shown here).
interface ErrorContext {
  callId: string;
  assistantId: string;
  customerPhone: string;
  error: Error;
  transcript: string;
}

export async function handleVoiceError(context: ErrorContext) {
  // 1. Log to monitoring
  await logError(context);

  // 2. Send fallback SMS
  await sendSMS({
    to: context.customerPhone,
    body: "Sorry, we had a technical issue. A human will call you back within 10 minutes.",
  });

  // 3. Alert team
  await sendSlackAlert({
    channel: "#voice-agent-alerts",
    text: `Voice agent failed for call ${context.callId}`,
    error: context.error.message,
  });

  // 4. Create task for human follow-up
  await createTask({
    type: "voice-agent-fallback",
    priority: "high",
    customerPhone: context.customerPhone,
    context: context.transcript,
  });
}
Rate Limiting & Abuse Prevention
// middleware.ts or API route
import { NextRequest, NextResponse } from "next/server";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, "1 m"), // 10 requests per minute per key
});

export async function middleware(request: NextRequest) {
  // The key here is the client IP. Webhooks relayed by a telephony provider
  // arrive from the provider's IP, so key on the caller's number instead
  // if you need per-caller limits.
  const ip = request.ip ?? "127.0.0.1";
  const { success } = await ratelimit.limit(ip);

  if (!success) {
    return new NextResponse("Rate limit exceeded", { status: 429 });
  }
  return NextResponse.next();
}
Cost Optimization
Voice agents can get expensive fast. Here's how I keep costs under $200/month for 200+ calls:
| Strategy | Savings | Implementation |
|---|---|---|
| Use gpt-3.5 for simple queries | 60% | Route based on intent |
| Cache common responses | 30% | Redis for FAQs |
| Shorten prompts | 20% | Keep under 1000 tokens |
| Use OpenAI voices | 50% | vs ElevenLabs |
| Batch function calls | 15% | Combine operations |
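The "route based on intent" row needs some way to decide what counts as a simple query. This post does not show how `detectSimpleQuery` is implemented, so the following is only one possible sketch: a keyword heuristic with assumed phrase lists and an assumed 20-word cutoff. An intent classifier would likely do better in production.

```typescript
// Assumed phrases that signal routine questions; tune these from your own transcripts.
const SIMPLE_PHRASES = ["hours", "open", "price", "how much", "location", "where are you"];
// Assumed phrases that should always go to the stronger model.
const COMPLEX_PHRASES = ["leak", "flood", "sewage", "burst", "emergency"];

function containsAny(text: string, phrases: string[]): boolean {
  return phrases.some((phrase) => text.indexOf(phrase) !== -1);
}

// True only for short utterances that match a routine phrase and no urgent one.
function detectSimpleQuery(transcript: string): boolean {
  const text = transcript.toLowerCase();
  if (containsAny(text, COMPLEX_PHRASES)) return false;
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  return wordCount <= 20 && containsAny(text, SIMPLE_PHRASES);
}

function pickModel(transcript: string): "gpt-3.5-turbo" | "gpt-4" {
  return detectSimpleQuery(transcript) ? "gpt-3.5-turbo" : "gpt-4";
}
```

The heuristic deliberately defaults to the stronger model: anything it does not recognize as routine is treated as complex.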
// Smart routing example
const model = detectSimpleQuery(transcript)
  ? "gpt-3.5-turbo" // $0.0015/1K tokens
  : "gpt-4"; // $0.03/1K tokens
Step 5: Testing & Monitoring
Automated Testing
// tests/vapi.test.ts
// startTestCall and getLastLead are test helpers you implement (not shown here).
import { test, expect } from "@playwright/test";

test("AI receptionist captures leads correctly", async () => {
  // Mock a call
  const call = await startTestCall({
    phoneNumber: "+15551234567",
    scenario: "emergency_leak",
  });

  // Simulate conversation
  await call.speak("Hi, I have a water leak in my basement");
  const response = await call.waitForResponse();
  expect(response).toContain("emergency");
  expect(response).toContain("15 minutes");

  // Verify function was called
  const lead = await getLastLead();
  expect(lead.isEmergency).toBe(true);
  expect(lead.issue).toContain("leak");
});
Monitoring Dashboard
Track these metrics:
// Key metrics to log
interface CallMetrics {
  callId: string;
  duration: number; // seconds
  cost: number; // USD
  transcriptLength: number; // characters
  functionsCalled: number;
  errors: number;
  customerSatisfaction?: number; // Post-call survey
  resolved: boolean; // Did it solve the problem?
}

// Alert on:
// - >5% error rate
// - >$0.50 average cost per call
// - >10s average latency
// - <80% resolution rate
Real-World Example: Plumbing Business Results
The Setup:
- 1 phone number (local area code)
- GPT-4 for complex issues, GPT-3.5 for simple ones
- 3 functions: captureLead, scheduleAppointment, checkAvailability
- Fallback to human after 3 errors
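The "fallback to human after 3 errors" rule can be tracked with a small per-call counter. This is a sketch under assumptions: the class name, the in-memory Map, and the idea of calling `recordError` from your webhook are all hypothetical, and a deployment with more than one server instance would need shared storage instead.

```typescript
class ErrorFallbackTracker {
  private counts = new Map<string, number>();
  private maxErrors: number;

  constructor(maxErrors = 3) {
    this.maxErrors = maxErrors;
  }

  // Records one error for the call. Returns true once the call has reached
  // the limit and should be handed to a human.
  recordError(callId: string): boolean {
    const next = (this.counts.get(callId) ?? 0) + 1;
    this.counts.set(callId, next);
    return next >= this.maxErrors;
  }

  // Drop the counter when the call ends so the Map does not grow forever.
  endCall(callId: string): void {
    this.counts.delete(callId);
  }
}
```

A natural place to call `endCall` is the end-of-call report handler, with `recordError` invoked wherever a function call or transcription step fails.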
Results After 3 Months:
- 247 calls handled
- 94% captured (vs 60% before)
- Average call duration: 2m 15s
- Cost per call: $0.12
- 23 after-hours emergency calls captured
- Owner's time saved: 15 hours/week
What Worked:
- Specific system prompt with examples
- Emergency detection function
- SMS fallback for failed calls
- Daily transcript review for first 2 weeks
What Didn't:
- Initially too verbose (fixed with "be concise" instruction)
- Didn't capture email (added to function)
- Confused addresses (added confirmation step)
Next Steps
Now that you understand the basics:
- Build your first agent: Start simple, add functions gradually
- Test with real calls: Have friends/family call and give feedback
- Monitor and iterate: Review transcripts daily at first
- Scale gradually: Add features as you understand the failure modes
In the next post, I'll cover advanced patterns:
- Multi-language support
- Custom LLM fine-tuning
- Integration with CRM systems
- Building voice agents that sound truly human
FAQ
Q: How much does Vapi cost? A: ~$0.05-0.15 per minute depending on configuration, so a 2-minute call costs roughly $0.10-0.30.
Q: Can I use my own phone number? A: Yes, via Twilio integration. Or buy through Vapi directly.
Q: What if the AI doesn't understand? A: Build in fallback logic. After 2 misunderstandings, offer to transfer or take a message.
Q: Is it HIPAA compliant? A: Vapi is SOC 2 compliant. For HIPAA, you'll need a BAA and additional safeguards.
Q: Can it handle accents? A: Yes, but test with your specific customer base. Deepgram handles most accents well.
Need help implementing this for your business? Book a free 30-minute call and I'll help you design your voice agent.