The Problem Nobody Warned Me About
When I started building onSpark, I thought the hard part was getting professionals to sign up. I was wrong. The hard part was making their first match feel like it was made by someone who actually read their profile.
Manual curation does not scale. At 200 users you can eyeball compatibility. At 2,000 you are guessing. At 17,000 you are drowning. The founding team tried tags, categories, and free-text search. All three produced the same result: low-quality introductions, low reply rates, and churning users who said the platform "didn't get them."
That was the brief I inherited when I was brought in to rebuild the matching layer. This post is a complete technical walkthrough of what we built: a voice-first onboarding system backed by OpenAI embeddings, Pinecone vector search, and a multi-factor compatibility scorer. The stack runs on an Angular frontend with a Node.js backend deployed on Google Cloud Run.
By the time we shipped v2, match acceptance rates had risen from 14% to 61%. This is how we got there.
The Challenge: Matching Partners at Scale
Partnership matching is harder than job matching or dating matching for one specific reason: the compatibility surface area is enormous. Two professionals might share an industry, an audience size, and a growth goal, but one wants a co-promotion deal while the other wants a revenue share. Those are incompatible partners regardless of every other signal.
The classic approach, faceted filters plus a relevance sort, breaks down because:
- Users self-describe inconsistently. One person says "SaaS founder," another says "B2B software entrepreneur." Both are valid. Neither finds the other through keyword search.
- Goals change. Someone who signed up looking for a podcast guest now wants a joint webinar partner. Static profiles go stale immediately.
- Raw reach is a poor trust signal. A new user with 500 LinkedIn followers and a verified product launch is often a better partner than an older account with 50,000 followers and no verifiable assets.
We needed a system that understood meaning, not keywords, and could rank candidates by a composite score rather than a single similarity metric. That pulled us toward embeddings and vector search from day one.
Voice-First Onboarding with Vapi.ai
The first insight that changed everything: the richest profile data does not come from forms. It comes from conversation.
When we switched from a 12-field signup form to a 7-minute AI voice interview, average profile completeness went from 38% to 91%. More importantly, the data we extracted was qualitatively different. People told the AI things they would never type into a text box.
Why Voice Outperforms Forms
Forms optimize for completion speed. Users abbreviate, skip optional fields, and paste in boilerplate. A conversational interview is different because a good follow-up question surfaces the detail that the user did not think to volunteer.
For example: a user types "I run a newsletter" into a form field. The Vapi.ai assistant hears that and asks: "What topics do you cover, and who is your typical reader?" The answer - "I write about operations for Shopify brands doing between one and five million in revenue" - is far more useful for matching.
The Interview Architecture
We built the onboarding assistant on Vapi.ai because it gave us clean function-calling integration with our own backend. The assistant follows a structured but conversational interview guide covering four domains:
- Core identity - role, industry, business stage
- Partnership assets - audience, reach, existing relationships, content channels
- Partnership goals - what they want to get from collaborations in the next 90 days
- Deal preferences - revenue share appetite, time commitment, exclusivity requirements
The Vapi configuration for the assistant looks like this:
// vapi-onboarding-assistant.config.ts
import type { CreateAssistantDTO } from "@vapi-ai/server-sdk";
// Interview guide prompt, maintained in its own module.
import { ONBOARDING_SYSTEM_PROMPT } from "./onboarding-system-prompt";
export const onboardingAssistantConfig: CreateAssistantDTO = {
name: "onSpark Onboarding",
firstMessage:
"Hi, I'm here to build your partnership profile. " +
"This takes about seven minutes and everything you share helps us find you better matches. " +
"Let's start - what does your business do, and who do you serve?",
model: {
provider: "openai",
model: "gpt-4o",
temperature: 0.3,
systemPrompt: ONBOARDING_SYSTEM_PROMPT,
tools: [
{
type: "function",
function: {
name: "save_profile_section",
description:
"Save a completed section of the user's partnership profile. " +
"Call this after gathering sufficient information for each domain.",
parameters: {
type: "object",
properties: {
section: {
type: "string",
enum: ["identity", "assets", "goals", "deal_preferences"],
},
data: {
type: "object",
description: "Structured data extracted from the conversation",
},
confidence: {
type: "number",
description: "0-1 confidence score for the extracted data",
},
},
required: ["section", "data", "confidence"],
},
},
},
{
type: "function",
function: {
name: "complete_onboarding",
description:
"Mark onboarding as complete and trigger profile embedding generation.",
parameters: {
type: "object",
properties: {
summary: {
type: "string",
description:
"A 2-3 sentence plain-language summary of the user's partnership profile.",
},
},
required: ["summary"],
},
},
},
],
},
voice: {
provider: "11labs",
voiceId: "21m00Tcm4TlvDq8ikWAM",
},
endCallFunctionEnabled: true,
recordingEnabled: true,
transcriptPlan: {
enabled: true,
},
};

The save_profile_section function fires incrementally during the call. By the time the user hangs up, we have structured JSON covering all four domains. No post-processing required.
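On the backend, a webhook receives those tool calls and persists each section as it arrives. Here is a minimal sketch of the dispatcher, assuming a hypothetical ProfileStore persistence interface and a simplified tool-call payload shape - check the exact field names against your Vapi SDK version:

```typescript
// vapi-webhook.handler.ts (sketch - ToolCallMessage and ProfileStore are
// simplified assumptions, not the exact Vapi webhook schema)
interface ToolCallMessage {
  toolCallId: string;
  name: string;
  arguments: Record<string, unknown>;
}

interface ProfileStore {
  saveSection(userId: string, section: string, data: unknown, confidence: number): void;
  markComplete(userId: string, summary: string): void;
}

// Dispatches one tool call from the assistant to the profile store and
// returns the result payload to send back to the assistant.
export function handleToolCall(
  userId: string,
  message: ToolCallMessage,
  store: ProfileStore
): { toolCallId: string; result: string } {
  switch (message.name) {
    case "save_profile_section": {
      const { section, data, confidence } = message.arguments as {
        section: string;
        data: Record<string, unknown>;
        confidence: number;
      };
      store.saveSection(userId, section, data, confidence);
      return { toolCallId: message.toolCallId, result: "saved" };
    }
    case "complete_onboarding": {
      const { summary } = message.arguments as { summary: string };
      store.markComplete(userId, summary);
      return { toolCallId: message.toolCallId, result: "onboarding_complete" };
    }
    default:
      return { toolCallId: message.toolCallId, result: "unknown_tool" };
  }
}
```

Keeping the dispatcher pure (the store is injected) makes it trivial to unit-test without a live call.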
Extracting Structured Data from the Transcript
Even with function calling, the raw data from each section needed normalization before embedding. We ran a second LLM pass to produce a canonical profile object:
// profile-extractor.service.ts
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
interface RawProfileSection {
section: string;
data: Record<string, unknown>;
confidence: number;
}
interface NormalizedProfile {
userId: string;
industry: string;
businessStage: "idea" | "mvp" | "growth" | "scale";
audienceSize: number;
audienceDescription: string;
contentChannels: string[];
partnershipGoals: string[];
dealTypes: string[];
timeCommitmentHoursPerMonth: number;
revenueShareWilling: boolean;
summary: string;
rawSections: RawProfileSection[];
extractedAt: string;
}
export async function normalizeProfileSections(
userId: string,
sections: RawProfileSection[],
callSummary: string
): Promise<NormalizedProfile> {
const completion = await openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0,
response_format: { type: "json_object" },
messages: [
{
role: "system",
content:
"You are a data normalization assistant. Given raw profile sections " +
"from a voice interview, produce a single canonical JSON profile object " +
"matching the specified schema exactly. Infer missing numeric fields " +
"from context where possible.",
},
{
role: "user",
content: JSON.stringify({ sections, callSummary }),
},
],
});
const parsed = JSON.parse(
completion.choices[0].message.content ?? "{}"
) as Omit<NormalizedProfile, "userId" | "rawSections" | "extractedAt">;
return {
...parsed,
userId,
rawSections: sections,
extractedAt: new Date().toISOString(),
};
}

Building the Matching Engine
With normalized profiles stored in Firestore, the next layer was the matching engine itself. The design goal was simple to state and hard to execute: given a user, return the top N compatible partners ranked by a score that captures more than text similarity.
Embedding Profiles with OpenAI
The embedding input is not the raw profile JSON. That would encode field names and structure into the vector, which is noise. Instead, we render each profile into a rich natural-language passage before embedding:
// profile-passage.builder.ts
export function buildProfilePassage(profile: NormalizedProfile): string {
const goalsList = profile.partnershipGoals.join(", ");
const channelsList = profile.contentChannels.join(", ");
const dealsList = profile.dealTypes.join(", ");
return [
`${profile.businessStage} stage business in the ${profile.industry} industry.`,
`Audience: ${profile.audienceDescription} (approximately ${profile.audienceSize.toLocaleString()} people).`,
`Primary content channels: ${channelsList}.`,
`Partnership goals for the next 90 days: ${goalsList}.`,
`Open to the following deal types: ${dealsList}.`,
`Available approximately ${profile.timeCommitmentHoursPerMonth} hours per month for partnership work.`,
profile.revenueShareWilling
? "Open to revenue share arrangements."
: "Prefers non-revenue-share arrangements.",
`Summary: ${profile.summary}`,
]
.filter(Boolean)
.join(" ");
}

We then embed this passage with text-embedding-3-large, which natively produces 3,072-dimensional vectors. To halve Pinecone storage costs, we request 1,536 dimensions through the API's dimensions parameter; in our benchmarks this shortened vector retains ~96% of retrieval quality.
// embedding.service.ts
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const EMBEDDING_DIMENSIONS = 1536;
export async function generateProfileEmbedding(
passage: string
): Promise<number[]> {
const response = await openai.embeddings.create({
model: "text-embedding-3-large",
input: passage,
dimensions: EMBEDDING_DIMENSIONS,
});
return response.data[0].embedding;
}

Storing Vectors in Pinecone
Each Pinecone record stores the embedding alongside a metadata payload that enables pre-filtering before similarity search. Pre-filtering is critical at 17K+ records - without it, every query scans the entire index.
// pinecone.service.ts
import { Pinecone } from "@pinecone-database/pinecone";
import type { NormalizedProfile } from "./profile-extractor.service";
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index(process.env.PINECONE_INDEX_NAME!);
interface ProfileMetadata {
userId: string;
industry: string;
businessStage: string;
audienceSize: number;
revenueShareWilling: boolean;
dealTypes: string[];
updatedAt: number;
}
export async function upsertProfileVector(
profile: NormalizedProfile,
embedding: number[]
): Promise<void> {
const metadata: ProfileMetadata = {
userId: profile.userId,
industry: profile.industry,
businessStage: profile.businessStage,
audienceSize: profile.audienceSize,
revenueShareWilling: profile.revenueShareWilling,
dealTypes: profile.dealTypes,
updatedAt: Date.now(),
};
await index.upsert([
{
id: profile.userId,
values: embedding,
metadata,
},
]);
}
export interface MatchCandidate {
userId: string;
cosineSimilarity: number;
metadata: ProfileMetadata;
}
export async function findSimilarProfiles(
queryEmbedding: number[],
requestingUserId: string,
filters: Partial<ProfileMetadata>,
topK = 50
): Promise<MatchCandidate[]> {
const queryResponse = await index.query({
vector: queryEmbedding,
topK,
includeMetadata: true,
filter: buildPineconeFilter(filters),
});
return (queryResponse.matches ?? [])
.filter((m) => m.id !== requestingUserId)
.map((m) => ({
userId: m.id,
cosineSimilarity: m.score ?? 0,
metadata: m.metadata as ProfileMetadata,
}));
}
function buildPineconeFilter(
filters: Partial<ProfileMetadata>
): Record<string, unknown> {
const conditions: Record<string, unknown>[] = [];
if (filters.industry) {
conditions.push({ industry: { $eq: filters.industry } });
}
if (filters.audienceSize) {
conditions.push({
audienceSize: {
$gte: filters.audienceSize * 0.1,
$lte: filters.audienceSize * 10,
},
});
}
if (!conditions.length) return {};
return conditions.length === 1 ? conditions[0] : { $and: conditions };
}

Scoring and Ranking
Raw cosine similarity from Pinecone is a strong signal but not the whole story. We layer three additional dimensions into a final compatibility score: goal alignment, deal compatibility, and trust index.
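Before breaking down each component, here is the glue between Pinecone recall and the final ranked list. This is a sketch with the recall and scoring steps injected as dependencies; in production those wrap the findSimilarProfiles and computeCompatibilityScore functions shown in this post, plus Firestore reads for candidate profiles and trust scores:

```typescript
// match-ranker.ts (sketch - `recall` and `score` are injected stand-ins for
// the Pinecone query and compatibility scorer described in this post)
interface ScoredMatch {
  userId: string;
  finalScore: number;
}

// Pure ranking step: sort scored candidates descending and keep the top N.
export function rankTopN(scored: ScoredMatch[], topN: number): ScoredMatch[] {
  return [...scored].sort((a, b) => b.finalScore - a.finalScore).slice(0, topN);
}

// Orchestration: recall candidates from the vector index, score each one,
// and return the ranked head of the list.
export async function getRankedMatches(
  requesterId: string,
  deps: {
    recall: (requesterId: string) => Promise<{ userId: string }[]>;
    score: (requesterId: string, candidateId: string) => Promise<ScoredMatch>;
  },
  topN = 20
): Promise<ScoredMatch[]> {
  const candidates = await deps.recall(requesterId);
  const scored = await Promise.all(
    candidates.map((c) => deps.score(requesterId, c.userId))
  );
  return rankTopN(scored, topN);
}
```

Scoring all ~50 recalled candidates concurrently with Promise.all keeps the Firestore hydration off the critical path.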
The System Architecture
[Architecture diagram: onSpark Matching Engine - from voice call to ranked match results]
Compatibility Score Computation
The final score is a weighted sum of four components:
// compatibility-scorer.ts
import type { NormalizedProfile } from "./profile-extractor.service";
import type { MatchCandidate } from "./pinecone.service";
interface ScoringWeights {
semanticSimilarity: number;
goalAlignment: number;
dealCompatibility: number;
trustIndex: number;
}
const DEFAULT_WEIGHTS: ScoringWeights = {
semanticSimilarity: 0.4,
goalAlignment: 0.3,
dealCompatibility: 0.2,
trustIndex: 0.1,
};
interface CompatibilityResult {
userId: string;
finalScore: number;
breakdown: {
semantic: number;
goals: number;
deals: number;
trust: number;
};
explanation: string;
}
export function computeCompatibilityScore(
requester: NormalizedProfile,
candidate: MatchCandidate,
candidateProfile: NormalizedProfile,
trustScore: number,
weights: ScoringWeights = DEFAULT_WEIGHTS
): CompatibilityResult {
const semantic = candidate.cosineSimilarity;
const goals = computeGoalAlignment(requester, candidateProfile);
const deals = computeDealCompatibility(requester, candidateProfile);
const trust = trustScore;
const finalScore =
semantic * weights.semanticSimilarity +
goals * weights.goalAlignment +
deals * weights.dealCompatibility +
trust * weights.trustIndex;
return {
userId: candidate.userId,
finalScore,
breakdown: { semantic, goals, deals, trust },
explanation: buildExplanation(requester, candidateProfile, {
semantic,
goals,
deals,
trust,
}),
};
}
function computeGoalAlignment(
a: NormalizedProfile,
b: NormalizedProfile
): number {
// Jaccard similarity on goal token sets
const setA = new Set(a.partnershipGoals.map(normalizeGoalToken));
const setB = new Set(b.partnershipGoals.map(normalizeGoalToken));
const intersection = new Set([...setA].filter((g) => setB.has(g)));
const union = new Set([...setA, ...setB]);
return union.size === 0 ? 0 : intersection.size / union.size;
}
function computeDealCompatibility(
a: NormalizedProfile,
b: NormalizedProfile
): number {
const sharedDealTypes = a.dealTypes.filter((d) => b.dealTypes.includes(d));
const revenueShareConflict =
a.revenueShareWilling !== b.revenueShareWilling ? 0.3 : 0;
const timeCompatibility =
1 -
Math.abs(
a.timeCommitmentHoursPerMonth - b.timeCommitmentHoursPerMonth
) /
Math.max(a.timeCommitmentHoursPerMonth, b.timeCommitmentHoursPerMonth, 1);
const dealTypeScore =
sharedDealTypes.length /
Math.max(a.dealTypes.length, b.dealTypes.length, 1);
return Math.max(
0,
dealTypeScore * 0.5 + timeCompatibility * 0.5 - revenueShareConflict
);
}
function normalizeGoalToken(goal: string): string {
return goal.toLowerCase().trim().replace(/\s+/g, "_");
}
// buildExplanation(...) renders a one-sentence, human-readable reason from the
// component scores; implementation omitted for brevity.

The Trust Index
The trust index is not a reputation score - it is a signal completeness and verification score. A profile earns trust points for:
- Completing the voice onboarding (40 points)
- Verifying a LinkedIn profile via OAuth (20 points)
- Having at least one completed partnership with a positive rating (20 points)
- Providing verifiable audience metrics (newsletter subscriber count, podcast downloads) (20 points)
We normalize to 0–1 and store it in Firestore alongside the profile, refreshed on each verification event.
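As a sketch, the point schedule above (plus the 90-day freshness penalty described later in this post) reduces to a small pure function. The field names here are illustrative, not our production schema:

```typescript
// trust-index.ts (sketch - signal field names are illustrative)
interface TrustSignals {
  completedVoiceOnboarding: boolean; // 40 points
  linkedInVerified: boolean;         // 20 points
  hasRatedPartnership: boolean;      // 20 points
  verifiedAudienceMetrics: boolean;  // 20 points
  daysSinceUpdate: number;           // >90 days costs a 15-point freshness penalty
}

export function computeTrustIndex(s: TrustSignals): number {
  let points = 0;
  if (s.completedVoiceOnboarding) points += 40;
  if (s.linkedInVerified) points += 20;
  if (s.hasRatedPartnership) points += 20;
  if (s.verifiedAudienceMetrics) points += 20;
  if (s.daysSinceUpdate > 90) points -= 15;
  // Normalize to 0-1 and clamp so the penalty can never push the score negative.
  return Math.min(1, Math.max(0, points / 100));
}
```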
The Deal Flow Pipeline
Matching produces a ranked list. Converting that list into live partnerships required a deal flow pipeline that kept momentum without manual intervention.
The pipeline has four stages: Suggested (match presented to user), Interested (user signals intent), Introduced (mutual interest, introduction sent), Active (deal in progress).
We built a Cloud Scheduler job that runs every 6 hours and advances or expires deals based on response time thresholds. If a user does not act on a Suggested match within 72 hours, the match drops out of their feed and is replaced. This keeps the feed fresh and creates a mild urgency signal without dark patterns.
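The expiry decision the scheduler applies to each open deal can be sketched like this. The stage names come from the pipeline above; the decision type is illustrative:

```typescript
// deal-expiry.ts (sketch - called by the 6-hourly scheduler job per open deal;
// the "keep"/"expire_and_replace" decision type is illustrative)
type DealStage = "suggested" | "interested" | "introduced" | "active";

const SUGGESTED_TTL_MS = 72 * 60 * 60 * 1000; // 72-hour response window

export function evaluateDeal(
  stage: DealStage,
  lastActivityAt: number,
  now: number
): "keep" | "expire_and_replace" {
  // Only un-acted-on suggestions expire; later stages already carry
  // mutual intent, so the scheduler leaves them alone.
  if (stage === "suggested" && now - lastActivityAt > SUGGESTED_TTL_MS) {
    return "expire_and_replace";
  }
  return "keep";
}
```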
The introduction message is generated by GPT-4o and references specific shared assets between both parties. Early versions used a generic template. Switching to personalized generation reduced introduction ignore rates from 44% to 18%.
// introduction-generator.service.ts
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export async function generateIntroductionMessage(
sender: NormalizedProfile,
recipient: NormalizedProfile,
compatibilityResult: CompatibilityResult
): Promise<string> {
const prompt = [
"Write a warm, professional partnership introduction message.",
"The message is sent from the platform on behalf of the sender to the recipient.",
"Reference one specific shared goal and one complementary asset.",
"Keep it under 120 words. Do not use salesy language.",
"",
`Sender summary: ${sender.summary}`,
`Recipient summary: ${recipient.summary}`,
`Top compatibility reason: ${compatibilityResult.explanation}`,
].join("\n");
const completion = await openai.chat.completions.create({
model: "gpt-4o",
temperature: 0.6,
max_tokens: 200,
messages: [{ role: "user", content: prompt }],
});
return completion.choices[0].message.content?.trim() ?? "";
}

Scaling Challenges
Getting from prototype to 17,000 active profiles was not a straight line. These were the three problems that cost us the most engineering time.
Embedding Latency on Onboarding Completion
When a voice call ends, the user expects to see their first matches within seconds. The embedding pipeline - normalize, build passage, embed, upsert to Pinecone - took 3–4 seconds synchronously, which felt slow in the Angular UI.
We solved this with a two-phase approach. On call end, we immediately return a loading state and show the user a "building your profile" screen. A Cloud Tasks job runs the full pipeline asynchronously. The Angular frontend polls a /onboarding-status endpoint every 2 seconds and transitions to the matches view when the job completes. Perceived latency dropped from 4 seconds to under 1 second.
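The Cloud Tasks side is standard queue plumbing; the part worth sketching is the poll loop. A framework-agnostic version, with the sleep function injected so the loop is testable - fetchStatus would wrap a GET to the /onboarding-status endpoint:

```typescript
// onboarding-status.poller.ts (sketch - status values are assumptions about
// the job's state machine; fetchStatus and sleep are injected)
type OnboardingStatus = "queued" | "embedding" | "complete" | "failed";

export async function pollUntilComplete(
  fetchStatus: () => Promise<OnboardingStatus>,
  opts: { intervalMs: number; maxAttempts: number },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms))
): Promise<OnboardingStatus> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const status = await fetchStatus();
    // Terminal states end the loop; anything else means the Cloud Tasks
    // job is still running the normalize -> embed -> upsert pipeline.
    if (status === "complete" || status === "failed") return status;
    await sleep(opts.intervalMs);
  }
  return "failed"; // treat timeout as failure; the UI can offer a retry
}
```

In our Angular UI this runs with intervalMs set to 2000, matching the 2-second poll described above.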
Vector Index Staleness
Profiles change. A user who completed onboarding six months ago may have entirely different goals. We added a profile freshness signal to the trust index: profiles older than 90 days without an update receive a 15-point freshness penalty. This surfaces recently active users higher in results and creates a natural re-engagement prompt.
We also added a lightweight "quick update" flow - a 90-second voice check-in using the same Vapi assistant - that refreshes just the goals and deal preferences sections and triggers a re-embedding without a full onboarding repeat.
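The quick update never touches identity or assets: it patches just the refreshed sections and resets the freshness clock before the profile is re-embedded through the same passage-builder pipeline shown earlier. A sketch, with field names matching the NormalizedProfile shape from the extractor:

```typescript
// quick-update.ts (sketch - covers only the fields the 90-second voice
// check-in refreshes; re-embedding reuses the pipeline shown earlier)
interface QuickUpdatableProfile {
  partnershipGoals: string[];
  dealTypes: string[];
  timeCommitmentHoursPerMonth: number;
  revenueShareWilling: boolean;
  extractedAt: string;
}

export function applyQuickUpdate(
  profile: QuickUpdatableProfile,
  update: Partial<
    Pick<
      QuickUpdatableProfile,
      | "partnershipGoals"
      | "dealTypes"
      | "timeCommitmentHoursPerMonth"
      | "revenueShareWilling"
    >
  >
): QuickUpdatableProfile {
  // Patch only goals and deal-preference fields; identity and assets keep
  // their original provenance from the full onboarding interview.
  return {
    ...profile,
    ...update,
    extractedAt: new Date().toISOString(), // resets the 90-day freshness clock
  };
}
```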
Cold Start for New Users
A brand-new user with no history and a fresh profile has no behavioral signal. Their first matches rely entirely on the semantic embedding. We found that the quality of the first 3 matches is decisive for retention - users who engaged with at least one match in the first 48 hours were 4.7x more likely to still be active at 30 days.
To improve cold-start quality, we increased the Pinecone topK to 100 for users under 7 days old, then re-ranked aggressively using goal alignment and deal compatibility. New users also receive a manual "quality gate" flag that prevents incomplete profiles (confidence below 0.7 on any section) from appearing in other users' feeds until they complete the quick-update check-in.
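Both cold-start adjustments reduce to small pure functions. The reweighting values below are illustrative rather than our production numbers; the 7-day age threshold and the 0.7 confidence gate are the ones described above:

```typescript
// cold-start.ts (sketch - the cold-start weight values are illustrative;
// the age and confidence thresholds match the ones described in this post)
interface ScoringWeights {
  semanticSimilarity: number;
  goalAlignment: number;
  dealCompatibility: number;
  trustIndex: number;
}

// Under 7 days old: lean harder on stated goals and deal structure, since
// there is no behavioral signal yet. Weights always sum to 1.
export function selectWeights(accountAgeDays: number): ScoringWeights {
  if (accountAgeDays < 7) {
    return {
      semanticSimilarity: 0.3,
      goalAlignment: 0.35,
      dealCompatibility: 0.25,
      trustIndex: 0.1,
    };
  }
  return {
    semanticSimilarity: 0.4,
    goalAlignment: 0.3,
    dealCompatibility: 0.2,
    trustIndex: 0.1,
  };
}

// Quality gate: any section extracted below 0.7 confidence keeps the profile
// out of other users' feeds until the quick-update check-in fixes it.
export function passesQualityGate(sectionConfidences: number[]): boolean {
  return (
    sectionConfidences.length > 0 &&
    sectionConfidences.every((c) => c >= 0.7)
  );
}
```

The widened recall (topK of 100 instead of 50) happens upstream in the Pinecone query; these functions only handle the re-ranking and gating.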
Results and Metrics
After six months of running v2 in production, the numbers look like this:
- 17,400 active profiles with complete embeddings in Pinecone
- Match acceptance rate: 61% (up from 14% with the filter-based system)
- Introduction reply rate: 49% (up from 22%)
- Active partnerships created: 3,200+ across the user base
- Voice onboarding completion rate: 84% (versus 61% for the form)
- Median matching latency: 340ms for a ranked list of 20 candidates (p99: 820ms)
- Embedding pipeline cost: $0.0023 per profile at current OpenAI pricing
The 340ms median latency surprised us. Pinecone's HNSW index on a pod-based deployment handles the vector search in under 80ms. The remaining time is split between fetching candidate profiles from Firestore, running the scoring functions, and generating the introduction text.
The single biggest driver of the acceptance rate improvement was not the embedding model. It was the voice onboarding data quality. When we ran an ablation where we replaced voice-derived profiles with equivalent form-filled profiles for a subset of users, acceptance rate dropped from 61% to 39%. The conversational depth of the voice data is the actual moat.
Lessons Learned
Garbage In, Garbage Out Applies to Embeddings Too
An embedding is only as good as its input. The passage builder is not a cosmetic layer - it is one of the most important components in the system. We spent two weeks iterating on passage structure before embedding quality became stable. If matches feel off, check the passage before touching the model.
Pre-filter Aggressively in Pinecone
Without metadata filters, a query against 17K records with topK=50 returns results from across the entire population. Users building a B2B SaaS newsletter audience do not benefit from being matched to e-commerce influencers. Pre-filtering on industry and audience size band before semantic similarity reduces irrelevant results and improves perceived accuracy. The cost of a pre-filtered query is nearly identical to an unfiltered one.
The Score Weights Are a Product Decision, Not an Engineering One
We spent too long treating the compatibility score weights as an engineering optimization problem. They are not. They reflect product values: how much should semantic fit matter versus deal structure fit? The answer changes based on which kind of partnership the platform is optimizing for. Talk to users, pick weights, measure outcomes, adjust. Ship faster.
Voice Is Underrated as a Data Collection Channel
AI voice onboarding is about to change how platforms collect profile data. Form abandonment is a known problem; voice abandonment, at an 84% completion rate, turned out not to be. When I first proposed switching the onboarding to voice, the pushback was "users won't do it." They did. The modal length of our voice onboarding calls is 6 minutes and 42 seconds. Nobody fills out a 7-minute form.
What We Are Building Next
The current system is reactive: it matches users based on stated goals. The next version will be proactive. We are experimenting with a signal layer that ingests public activity - LinkedIn posts, podcast appearances, newsletter content - and infers unstated goals from behavioral patterns. A user who starts writing a lot about YouTube growth is probably thinking about video partnerships even if their profile still says "newsletter."
We are also exploring multi-modal embeddings that combine the voice transcript with the audio prosody features - pace, energy, pauses - as an additional signal for communication style compatibility. Early tests suggest it adds ~4 points to acceptance rate on its own.
Conclusion
Building an AI matching engine at scale is a systems problem before it is a machine learning problem. The quality of your data collection, the care in your passage construction, and the honesty of your scoring weights matter more than which embedding model you pick.
The combination of Vapi.ai for voice onboarding, OpenAI for embeddings and normalization, and Pinecone for vector storage gave us a stack that is both affordable and performant at 17K users. The architecture scales to 10x that number without structural changes.
If you are building a marketplace, a professional network, or any platform where the quality of introductions determines retention, I am convinced this approach outperforms filter-based matching at every scale above a few hundred users.
If you are working on something similar and want to compare notes on the architecture, reach out directly. I am particularly interested in conversations about trust scoring, multi-modal onboarding, and scaling Pinecone past the 100K record threshold.