AI Voice | January 20, 2026 | 18 min read

Building Production AI Voice Agents with Vapi.ai (2026 Complete Guide)

Step-by-step guide to building AI voice agents with Vapi.ai. Learn the architecture, implementation patterns, and production best practices from real-world experience.

Loic Bachellerie

Senior Product Engineer

Introduction

I recently built a 24/7 AI receptionist for a plumbing business. It handled 247 calls in its first three months, and the owner went from missing 40% of calls to capturing 94%. Total setup time: 3 hours.

This guide teaches you how to build production-ready AI voice agents using Vapi.ai, the platform that makes voice AI development actually feasible for small teams.

By the end, you'll understand:

  • The STT → LLM → TTS architecture
  • How to set up Vapi with proper function calling
  • Production patterns for error handling
  • Cost optimization strategies
  • Real deployment patterns I use

Let's dive in.

The Voice AI Architecture

Before writing code, understand the pipeline:

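[Figure: Voice AI Pipeline Architecture. How voice AI transforms speech into intelligent responses. Stages: Voice In → STT → LLM → Function → TTS, all running on the Vapi.ai platform, with Twilio, OpenAI, ElevenLabs, your API, and your CRM as integrations. Target latency: less than 1 second.]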

The Flow:

  1. Speech-to-Text (STT): Vapi uses Deepgram or Whisper to transcribe speech
  2. LLM Processing: OpenAI, Claude, or your custom model processes intent
  3. Function Calling: Your backend endpoints handle business logic
  4. Text-to-Speech (TTS): ElevenLabs or OpenAI converts response to voice
  5. Response: Caller hears the AI speak
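
To make the flow concrete, here is a minimal sketch of one conversational turn as a chain of stages. The stage names (`transcribe`, `decide`, `runFunction`, `synthesize`) and the stub implementations are illustrative assumptions, not Vapi's API. Real stages are asynchronous, streaming network calls that Vapi orchestrates for you.

```typescript
// One turn of the STT → LLM → function → TTS pipeline, with synchronous stubs.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

interface LlmDecision {
  reply: string;               // what to say if no function is needed
  functionCall?: FunctionCall; // set when the model wants backend data
}

interface PipelineStages {
  transcribe: (audio: string) => string;        // STT
  decide: (transcript: string) => LlmDecision;  // LLM
  runFunction: (call: FunctionCall) => string;  // your backend
  synthesize: (text: string) => string;         // TTS
}

function handleTurn(audio: string, stages: PipelineStages): string {
  const transcript = stages.transcribe(audio);
  const decision = stages.decide(transcript);
  // If the model requested a function, its result becomes the spoken reply.
  const reply = decision.functionCall
    ? stages.runFunction(decision.functionCall)
    : decision.reply;
  return stages.synthesize(reply);
}

// Stub stages for illustration only (audio is faked as a prefixed string).
const stubStages: PipelineStages = {
  transcribe: (audio) => audio.replace("audio:", ""),
  decide: (transcript) =>
    transcript.includes("leak")
      ? { reply: "", functionCall: { name: "captureLead", args: { issue: transcript } } }
      : { reply: "How can I help?" },
  runFunction: (call) => `Logged via ${call.name}.`,
  synthesize: (text) => `speech:${text}`,
};
```

Calling `handleTurn("audio:I have a leak", stubStages)` routes through the function stage, while a plain greeting skips it and goes straight to speech.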

Why Vapi?

  • Handles all the hard parts (latency, interruptions, speech detection)
  • Sub-1-second response times
  • Built-in function calling
  • Simple API
  • ~$0.05-0.10 per minute (vs building yourself: $50k+ dev time)
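
The per-minute figure translates into a per-call estimate with simple arithmetic. The sketch below assumes billing is prorated by the second, which you should confirm against your actual invoice. `estimateCallCost` is a hypothetical helper, not part of any SDK.

```typescript
// Estimate the cost of one call from its duration and a per-minute rate.
// Assumes per-second proration; result is rounded to whole cents.
function estimateCallCost(durationSeconds: number, ratePerMinute: number): number {
  if (durationSeconds < 0 || ratePerMinute < 0) {
    throw new RangeError("duration and rate must be non-negative");
  }
  const minutes = durationSeconds / 60;
  return Math.round(minutes * ratePerMinute * 100) / 100;
}

// A 2-minute call at both ends of the quoted $0.05-0.10 range.
const lowEstimate = estimateCallCost(120, 0.05);
const highEstimate = estimateCallCost(120, 0.10);
```

For a 2-minute call, that gives a range of $0.10 to $0.20.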

Prerequisites

Before starting, you'll need:

  • Vapi account (free tier available)
  • Twilio account (for phone numbers)
  • OpenAI API key (or Anthropic for Claude)
  • Basic knowledge of TypeScript/JavaScript
  • Backend endpoint (can be Vercel, Railway, or your server)

Time commitment: 2-4 hours for first agent.

Step 1: Setting Up Your First Assistant

Create the Assistant

// lib/vapi.ts
import { VapiClient } from "@vapi-ai/server-sdk";

// Fail fast if the key is missing instead of sending unauthenticated requests.
const token = process.env.VAPI_API_KEY;
if (!token) throw new Error("VAPI_API_KEY is not set");

const client = new VapiClient({ token });

export async function createAssistant() {
  const assistant = await client.assistants.create({
    name: "Plumbing Receptionist",
    model: {
      provider: "openai",
      model: "gpt-4",
      temperature: 0.7,
      // The system prompt is passed as a system message.
      messages: [
        {
          role: "system",
          content: `You are a professional receptionist for a plumbing company.

Your job:
- Greet callers warmly
- Collect their name, address, and issue description
- Determine if it's an emergency
- Schedule appointments for non-emergencies
- Capture lead information

Rules:
- Be concise (under 30 seconds per response)
- Always confirm details back to the caller
- If emergency, collect contact number and say a plumber will call within 15 minutes
- Never say "I don't know"; offer to have a human call back instead`,
        },
      ],
    },
    voice: {
      provider: "11labs", // Vapi's provider id for ElevenLabs
      voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
    },
    firstMessage: "Hello! Thanks for calling. I'm here to help with your plumbing needs. What can I assist you with today?",
  });

  return assistant;
}

Key Configuration Explained

Model Selection:

  • gpt-4: Best quality, higher latency (~1.5s), $0.03/1K tokens
  • gpt-3.5-turbo: Faster (~0.8s), cheaper, good for simple agents
  • claude-3-sonnet: Excellent for complex reasoning

Voice Selection:

  • ElevenLabs voices sound most natural
  • OpenAI voices are cheaper but more robotic
  • Test multiple voices; it matters more than you think

System Prompt Tips:

  • Be specific about tone and length
  • Include examples of what NOT to say
  • Add guardrails for edge cases
  • Keep it under 2000 tokens
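
One way to keep an eye on the token budget is a quick length check before deploying a prompt. The sketch below uses the common rule of thumb of roughly 4 characters per token for English text. That ratio is an approximation, and a real tokenizer for your model will give exact counts. Both function names are hypothetical.

```typescript
// Rough token estimate: ~4 characters per token for English text (heuristic).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Report whether a prompt fits the budget suggested above (2000 tokens).
function checkPromptBudget(
  prompt: string,
  maxTokens: number = 2000
): { estimated: number; withinBudget: boolean } {
  const estimated = estimateTokens(prompt);
  return { estimated, withinBudget: estimated <= maxTokens };
}
```

Running this in CI against your system prompt catches gradual prompt growth before it shows up as extra latency and cost.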

Step 2: Adding Function Calling

Functions let your AI interact with your systems. Here's a lead capture function:

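// lib/vapi-functions.ts

// Hypothetical integration helpers: implement these against your own CRM and
// alerting stack. The module path is a placeholder.
import { saveToCRM, sendEmergencyAlert } from "./integrations";

interface LeadCaptureParams {
  name: string;
  phone: string;
  address?: string; // optional: not listed under `required` in the schema
  issue: string;
  isEmergency: boolean;
}

export async function captureLead(params: LeadCaptureParams) {
  // Save to your CRM (Airtable, HubSpot, etc.)
  await saveToCRM({
    ...params,
    source: "voice-agent",
    timestamp: new Date().toISOString(),
  });

  // Notify the on-call plumber right away for emergencies
  if (params.isEmergency) {
    await sendEmergencyAlert(params);
  }

  // The returned message is what the assistant reads back to the caller.
  // The appointment window is placeholder copy: replace it with a real slot
  // from your scheduling system before going live.
  return {
    success: true,
    message: params.isEmergency
      ? "Emergency logged. A plumber will call you within 15 minutes."
      : "Great! I've scheduled you for tomorrow between 2-4 PM. You'll get a confirmation text.",
  };
}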
 
// Vapi function schema
export const captureLeadSchema = {
  name: "captureLead",
  description: "Capture lead information and schedule appointment",
  parameters: {
    type: "object",
    properties: {
      name: {
        type: "string",
        description: "Caller's full name",
      },
      phone: {
        type: "string", 
        description: "Caller's phone number",
      },
      address: {
        type: "string",
        description: "Service address",
      },
      issue: {
        type: "string",
        description: "Brief description of plumbing issue",
      },
      isEmergency: {
        type: "boolean",
        description: "True if water damage, no water, or sewage backup",
      },
    },
    required: ["name", "phone", "issue", "isEmergency"],
  },
};
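
The LLM fills in these parameters from speech, so it helps to validate them before touching your CRM. Below is a minimal sketch of a validator that checks required fields and primitive types against a schema of this shape. `validateParams` and `leadSchema` are illustrative names, and the check only covers flat schemas with primitive types. A library such as a JSON Schema validator would be more thorough.

```typescript
interface ParameterSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required: string[];
}

// Returns a list of human-readable problems; an empty list means valid.
function validateParams(
  schema: ParameterSchema,
  params: Record<string, unknown>
): string[] {
  const errors: string[] = [];

  // Required fields must be present and non-empty.
  for (const key of schema.required) {
    const value = params[key];
    if (value === undefined || value === null || value === "") {
      errors.push(`missing required field: ${key}`);
    }
  }

  // Every supplied field must be declared and match its primitive type.
  for (const [key, value] of Object.entries(params)) {
    const prop = schema.properties[key];
    if (!prop) {
      errors.push(`unexpected field: ${key}`);
      continue;
    }
    if (value !== undefined && value !== null && typeof value !== prop.type) {
      errors.push(`field ${key} should be ${prop.type}`);
    }
  }

  return errors;
}

// Same shape as the captureLead schema, repeated here to stay self-contained.
const leadSchema: ParameterSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    phone: { type: "string" },
    address: { type: "string" },
    issue: { type: "string" },
    isEmergency: { type: "boolean" },
  },
  required: ["name", "phone", "issue", "isEmergency"],
};
```

If the list is non-empty, the function can return a message asking the assistant to collect the missing detail rather than saving a partial lead.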

[Figure: Voice Agent Function Calling. How your AI connects to real business systems. The voice agent calls a function, which reaches external systems: CRM / Airtable, Calendar, Email / SendGrid, and Slack. Target latency: under 1 second.]

Register Functions with Assistant

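// Update your assistant creation
const assistant = await client.assistants.create({
  // ... previous config
  functions: [
    {
      name: captureLeadSchema.name,
      description: captureLeadSchema.description,
      parameters: captureLeadSchema.parameters,
      // Keep this synchronous so the assistant waits for your endpoint and can
      // read the confirmation back to the caller. With async: true the
      // assistant moves on without waiting for the response.
      async: false,
    },
  ],
});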

Create the Webhook Endpoint

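// app/api/vapi/webhook/route.ts
import { NextRequest, NextResponse } from "next/server";
import { captureLead } from "@/lib/vapi-functions";

export async function POST(request: NextRequest) {
  const body = await request.json();

  // Vapi sends different message types
  const { message } = body;
  if (!message?.type) {
    return NextResponse.json({ error: "Missing message type" }, { status: 400 });
  }

  switch (message.type) {
    case "function-call":
      if (message.functionCall?.name === "captureLead") {
        try {
          const result = await captureLead(message.functionCall.parameters);
          return NextResponse.json({ result });
        } catch (error) {
          // Give the assistant something to say instead of failing silently
          console.error("captureLead failed:", error);
          return NextResponse.json({
            result: {
              success: false,
              message: "I couldn't save that just now. A team member will call you back.",
            },
          });
        }
      }
      break;

    case "end-of-call-report":
      // Log call analytics
      console.log("Call ended:", message);
      break;

    case "speech-update":
      // Real-time transcription (optional)
      break;
  }

  return NextResponse.json({ status: "ok" });
}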
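
The webhook above accepts any POST, so it is worth rejecting requests that do not carry a shared secret. The sketch below assumes you have configured a server secret in Vapi and that it arrives in an `x-vapi-secret` header. Treat the header name as an assumption and confirm it against Vapi's server URL documentation. `safeEqual` and `isAuthorized` are hypothetical helpers.

```typescript
// Compare two strings without exiting early at the first mismatch.
// In Node you may prefer crypto.timingSafeEqual on equal-length buffers.
function safeEqual(a: string, b: string): boolean {
  let mismatch = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    mismatch |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return mismatch === 0;
}

// Header name is an assumption; adjust it to match your Vapi configuration.
const SECRET_HEADER = "x-vapi-secret";

function isAuthorized(
  headers: Record<string, string | undefined>,
  expectedSecret: string | undefined
): boolean {
  // Fail closed: with no configured secret, nothing is authorized.
  if (!expectedSecret) return false;
  const provided = headers[SECRET_HEADER];
  return typeof provided === "string" && safeEqual(provided, expectedSecret);
}
```

In the route handler, you would build the header map from `request.headers` and return a 401 response before parsing the body when `isAuthorized` is false.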

Step 3: Connecting a Phone Number

Buy a Number on Twilio

// lib/twilio.ts
import twilio from "twilio";
 
const client = twilio(
  process.env.TWILIO_ACCOUNT_SID,
  process.env.TWILIO_AUTH_TOKEN
);
 
export async function buyPhoneNumber(areaCode: string) {
  // Twilio expects the area code as a number, so convert the string input
  const numbers = await client.availablePhoneNumbers("US")
    .local
    .list({ areaCode: Number(areaCode), limit: 1 });
 
  if (numbers.length === 0) {
    throw new Error("No numbers available in this area code");
  }
 
  const purchased = await client.incomingPhoneNumbers.create({
    phoneNumber: numbers[0].phoneNumber,
    voiceUrl: "https://your-domain.com/api/vapi/inbound", // We'll create this
  });
 
  return purchased.phoneNumber;
}

Create Inbound Call Handler

// app/api/vapi/inbound/route.ts
import { NextRequest, NextResponse } from "next/server";

// The XML declaration must be the very first characters of the response,
// so the template starts on the same line as the opening backtick.
const TWIML = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://your-vapi-websocket-url">
      <Parameter name="assistantId" value="your-assistant-id" />
    </Stream>
  </Connect>
</Response>`;
 
export async function POST(request: NextRequest) {
  // You can add call screening logic here
  // E.g., check if business hours, spam detection, etc.
  
  return new NextResponse(TWIML, {
    headers: {
      "Content-Type": "text/xml",
    },
  });
}
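
As an example of the call screening mentioned in the handler, here is a minimal business-hours check that works in the business's own time zone. The `BusinessHours` shape, the 8:00 to 18:00 weekday schedule, and the Chicago time zone are all assumptions for illustration.

```typescript
interface BusinessHours {
  timeZone: string;   // IANA zone, e.g. "America/Chicago"
  openHour: number;   // inclusive, 0-23
  closeHour: number;  // exclusive, 0-24
  openDays: string[]; // short English weekday names, e.g. "Mon"
}

function isBusinessHours(now: Date, hours: BusinessHours): boolean {
  // Format the instant in the business's time zone, then read the parts.
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: hours.timeZone,
    weekday: "short",
    hour: "numeric",
    hour12: false,
  }).formatToParts(now);

  const weekday = parts.find((p) => p.type === "weekday")?.value ?? "";
  // Some runtimes report midnight as "24" with hour12: false; normalize to 0.
  const hour = Number(parts.find((p) => p.type === "hour")?.value) % 24;

  return (
    hours.openDays.includes(weekday) &&
    hour >= hours.openHour &&
    hour < hours.closeHour
  );
}

// Hypothetical schedule: weekdays 8:00-18:00 Central time.
const plumbingHours: BusinessHours = {
  timeZone: "America/Chicago",
  openHour: 8,
  closeHour: 18,
  openDays: ["Mon", "Tue", "Wed", "Thu", "Fri"],
};
```

The inbound handler could use the result to pick a different assistant or first message after hours, for example one that only triages emergencies.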

Using Vapi's Phone Number

Alternatively, buy directly through Vapi:

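// Easier but less control.
// `client` here is the VapiClient from lib/vapi.ts, not the Twilio client.
// "555" is a placeholder, not an assignable area code; use a real one.
const phoneNumber = await client.phoneNumbers.create({
  provider: "twilio",
  areaCode: "555",
  assistantId: assistant.id,
});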

Phone Number Setup Options

Connect your voice agent to phone lines. Call flow: Caller → Phone → Voice Agent → Response.

Bring Your Own Twilio (recommended, ~$1/month):

  1. Buy number on Twilio
  2. Configure voice URL to Vapi
  3. Full control over routing

Vapi-Managed Numbers (quick start, cost built into the per-minute rate):

  1. Buy through Vapi dashboard
  2. Zero config needed
  3. Included in per-minute rate

Step 4: Production Patterns

Error Handling

Voice agents fail. Plan for it:

// lib/vapi-error-handler.ts
 
interface ErrorContext {
  callId: string;
  assistantId: string;
  customerPhone: string;
  error: Error;
  transcript: string;
}
 
export async function handleVoiceError(context: ErrorContext) {
  // 1. Log to monitoring
  await logError(context);
 
  // 2. Send fallback SMS
  await sendSMS({
    to: context.customerPhone,
    body: "Sorry, we had a technical issue. A human will call you back within 10 minutes.",
  });
 
  // 3. Alert team
  await sendSlackAlert({
    channel: "#voice-agent-alerts",
    text: `Voice agent failed for call ${context.callId}`,
    error: context.error.message,
  });
 
  // 4. Create task for human follow-up
  await createTask({
    type: "voice-agent-fallback",
    priority: "high",
    customerPhone: context.customerPhone,
    context: context.transcript,
  });
}
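
The handler needs a trigger. One simple policy is to count errors per call and escalate once a threshold is reached. The sketch below uses a threshold of 3, matching the plumbing deployment described in the results section. `CallErrorTracker` is a hypothetical in-memory helper. Counts are lost on restart and are not shared across server instances.

```typescript
// Counts errors per call and signals when a call should go to a human.
class CallErrorTracker {
  private counts = new Map<string, number>();
  private readonly maxErrors: number;

  constructor(maxErrors: number = 3) {
    this.maxErrors = maxErrors;
  }

  // Records one error; returns true once the call has hit the threshold.
  recordError(callId: string): boolean {
    const next = (this.counts.get(callId) ?? 0) + 1;
    this.counts.set(callId, next);
    return next >= this.maxErrors;
  }

  // Call this when the call ends so the map does not grow without bound.
  clear(callId: string): void {
    this.counts.delete(callId);
  }
}
```

When `recordError` returns true, invoke `handleVoiceError` with the call's context and stop retrying.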

Rate Limiting & Abuse Prevention

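// middleware.ts or API route
import { NextRequest, NextResponse } from "next/server";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, "1 m"), // 10 requests per minute per key
});

export async function middleware(request: NextRequest) {
  // Keyed by client IP. request.ip is not available on every platform or
  // Next.js version, so read the forwarded header instead. To limit per
  // caller, key on the caller's phone number from the webhook payload.
  const ip =
    request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "127.0.0.1";
  const { success } = await ratelimit.limit(ip);

  if (!success) {
    return new NextResponse("Rate limit exceeded", { status: 429 });
  }

  return NextResponse.next();
}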
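
To show what a sliding window does, here is a minimal in-memory limiter keyed by caller number. It is a sliding-window log, a simpler variant than the algorithm Upstash uses, and it only works within a single process. `SlidingWindowLimiter` is a hypothetical class for illustration, not a replacement for a shared Redis-backed limiter.

```typescript
// Sliding-window log: keep timestamps of recent hits per key and allow a
// request only if fewer than `limit` hits fall inside the window.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // nowMs is injectable so the behaviour can be tested deterministically.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that have aged out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Keying on the caller's number means one abusive caller is throttled without affecting everyone else who arrives through the same provider IP.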

Cost Optimization

Voice agents can get expensive fast. Here's how I keep costs under $200/month for 200+ calls:

Strategy                          Savings   Implementation
Use gpt-3.5 for simple queries    60%       Route based on intent
Cache common responses            30%       Redis for FAQs
Shorten prompts                   20%       Keep under 1000 tokens
Use OpenAI voices                 50%       vs ElevenLabs
Batch function calls              15%       Combine operations

// Smart routing example
const model = detectSimpleQuery(transcript) 
  ? "gpt-3.5-turbo" // $0.0015/1K tokens
  : "gpt-4";        // $0.03/1K tokens
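
The routing snippet relies on a `detectSimpleQuery` helper that is not shown. One possible implementation is a keyword heuristic like the sketch below. The patterns and the 15-word cutoff are assumptions chosen for a plumbing receptionist, and you should tune them against your own transcripts.

```typescript
// Topics a cheaper model can usually handle (assumed list).
const SIMPLE_PATTERNS: RegExp[] = [
  /\b(hours|open|closed?)\b/i,
  /\b(price|cost|how much)\b/i,
  /\b(address|location)\b/i,
];

// Anything that sounds urgent always goes to the stronger model.
const COMPLEX_PATTERN = /\b(leak|flood|burst|sewage|emergency|no water)\b/i;

function detectSimpleQuery(transcript: string): boolean {
  const text = transcript.trim();
  if (text.length === 0) return false;
  if (COMPLEX_PATTERN.test(text)) return false;
  const wordCount = text.split(/\s+/).length;
  // Short utterances that match a known FAQ topic count as simple.
  return wordCount <= 15 && SIMPLE_PATTERNS.some((p) => p.test(text));
}

function pickModel(transcript: string): "gpt-3.5-turbo" | "gpt-4" {
  return detectSimpleQuery(transcript) ? "gpt-3.5-turbo" : "gpt-4";
}
```

The heuristic deliberately defaults to the stronger model when unsure, because misrouting an emergency costs more than a few extra tokens.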

Step 5: Testing & Monitoring

Automated Testing

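// tests/vapi.test.ts
import { test, expect } from "@playwright/test";
// Hypothetical helpers: startTestCall drives a simulated call against your
// assistant and getLastLead reads the most recent CRM record. Neither ships
// with Vapi or Playwright; you write them for your own stack.
import { startTestCall, getLastLead } from "./helpers";

test("AI receptionist captures leads correctly", async () => {
  // Mock a call
  const call = await startTestCall({
    phoneNumber: "+15551234567",
    scenario: "emergency_leak",
  });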
 
  // Simulate conversation
  await call.speak("Hi, I have a water leak in my basement");
  
  const response = await call.waitForResponse();
  expect(response).toContain("emergency");
  expect(response).toContain("15 minutes");
 
  // Verify function was called
  const lead = await getLastLead();
  expect(lead.isEmergency).toBe(true);
  expect(lead.issue).toContain("leak");
});

Monitoring Dashboard

Track these metrics:

// Key metrics to log
interface CallMetrics {
  callId: string;
  duration: number;           // seconds
  cost: number;               // USD
  transcriptLength: number;   // characters
  functionsCalled: number;
  errors: number;
  customerSatisfaction?: number; // Post-call survey
  resolved: boolean;          // Did it solve the problem?
}
 
// Alert on:
// - >5% error rate
// - >$0.50 average cost per call
// - >10s average latency
// - <80% resolution rate
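
The alert thresholds can be checked with a small aggregation over recent calls. The sketch below makes two assumptions: error rate means the share of calls with at least one error, and each call record carries a `latencySeconds` field, which the `CallMetrics` interface does not include. `checkAlerts` is a hypothetical helper.

```typescript
interface CallSummary {
  cost: number;           // USD
  errors: number;
  latencySeconds: number; // assumed field: average response latency
  resolved: boolean;
}

interface AlertThresholds {
  maxErrorRate: number;
  maxAvgCost: number;
  maxAvgLatencySeconds: number;
  minResolutionRate: number;
}

// Defaults mirror the thresholds listed above.
const DEFAULT_THRESHOLDS: AlertThresholds = {
  maxErrorRate: 0.05,
  maxAvgCost: 0.5,
  maxAvgLatencySeconds: 10,
  minResolutionRate: 0.8,
};

function average(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Returns one message per breached threshold; empty means healthy.
function checkAlerts(
  calls: CallSummary[],
  thresholds: AlertThresholds = DEFAULT_THRESHOLDS
): string[] {
  if (calls.length === 0) return [];
  const alerts: string[] = [];

  const errorRate = calls.filter((c) => c.errors > 0).length / calls.length;
  const avgCost = average(calls.map((c) => c.cost));
  const avgLatency = average(calls.map((c) => c.latencySeconds));
  const resolutionRate = calls.filter((c) => c.resolved).length / calls.length;

  if (errorRate > thresholds.maxErrorRate) alerts.push("error rate too high");
  if (avgCost > thresholds.maxAvgCost) alerts.push("average cost too high");
  if (avgLatency > thresholds.maxAvgLatencySeconds) alerts.push("average latency too high");
  if (resolutionRate < thresholds.minResolutionRate) alerts.push("resolution rate too low");

  return alerts;
}
```

Running this on a schedule over the last day of calls and posting any non-empty result to Slack covers the basic alerting loop.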

Real-World Example: Plumbing Business Results

The Setup:

  • 1 phone number (local area code)
  • GPT-4 for complex issues, GPT-3.5 for simple
  • 3 functions: captureLead, scheduleAppointment, checkAvailability
  • Fallback to human after 3 errors

Results After 3 Months:

  • 247 calls handled
  • 94% captured (vs 60% before)
  • Average call duration: 2m 15s
  • Cost per call: $0.12
  • 23 after-hours emergency calls captured
  • Owner's time saved: 15 hours/week

What Worked:

  • Specific system prompt with examples
  • Emergency detection function
  • SMS fallback for failed calls
  • Daily transcript review for first 2 weeks

What Didn't:

  • Initially too verbose (fixed with "be concise" instruction)
  • Didn't capture email (added to function)
  • Confused addresses (added confirmation step)

Next Steps

Now that you understand the basics:

  1. Build your first agent: Start simple, add functions gradually
  2. Test with real calls: Have friends/family call and give feedback
  3. Monitor and iterate: Review transcripts daily at first
  4. Scale gradually: Add features as you understand the failure modes

In the next post, I'll cover advanced patterns:

  • Multi-language support
  • Custom LLM fine-tuning
  • Integration with CRM systems
  • Building voice agents that sound truly human

FAQ

Q: How much does Vapi cost? A: ~$0.05-0.15 per minute depending on configuration, so a 2-minute call costs roughly $0.10-0.30.

Q: Can I use my own phone number? A: Yes, via Twilio integration. Or buy through Vapi directly.

Q: What if the AI doesn't understand? A: Build in fallback logic. After 2 misunderstandings, offer to transfer or take a message.

Q: Is it HIPAA compliant? A: Vapi is SOC 2 compliant. For HIPAA, you'll need a BAA and additional safeguards.

Q: Can it handle accents? A: Yes, but test with your specific customer base. Deepgram handles most accents well.


Need help implementing this for your business? Book a free 30-minute call and I'll help you design your voice agent.

Series Navigation: ← Previous: [Voice AI Architecture Overview] | Next: [Advanced Vapi Patterns: Functions & Workflows] →
