Voice Conversation Interface with ElevenLabs Conversational AI

Updated 1/13/2026
v1.0
Tags: blueprint, vibe-coding, nextjs, elevenlabs, voice, websocket

Overview

This blueprint walks you through building a voice conversation web application where users speak to an AI agent powered by ElevenLabs Conversational AI. The app handles microphone input, real-time speech recognition, AI responses, and displays transcripts and closed captions.

It is designed for:

  • developers building voice-enabled web applications
  • teams integrating conversational AI into their products
  • anyone experimenting with AI coding agents who wants repeatable results

The focus is on real-time communication and user experience with proper error handling and state management.

What You'll Build

By the end of this blueprint, you'll have:

  • A Next.js web application with voice conversation capabilities
  • Microphone access and WebSocket connection to ElevenLabs Conversational AI
  • Real-time user transcript display
  • Real-time AI agent response display (closed captions)
  • Connection state management and error recovery
  • A polished UI with clear visual separation between user and agent content

How to Use This Blueprint

This blueprint is meant to be followed step by step using an AI coding agent such as Cursor.

Important rules:

  • Follow the sequence as written
  • Do not skip verification steps
  • Ignore agent suggestions that expand scope

The goal is to finish with something working, not perfect.

Prerequisites

Accounts & Tools

  • An ElevenLabs account with Conversational AI access
  • Node.js 22+ (LTS recommended)
  • Chrome or Edge browser (for microphone access)
  • An AI-enabled code editor (Cursor recommended)

Add Documentation as Agent Context (Strongly Recommended)

Keep the relevant documentation open or add it as context in your editor: the ElevenLabs Conversational AI docs, the @elevenlabs/react SDK reference, and the Next.js App Router docs.

Setup Steps

  1. Create an ElevenLabs Conversational AI agent in the dashboard
  2. Copy the Agent ID from agent settings
  3. Get your ElevenLabs API key from account settings

Prompt 0 — Agent Working Agreement

You are my senior full-stack engineer working inside this repo.

Rules:
- Keep scope minimal and shippable.
- Prefer boring defaults.
- Do not add dependencies unless necessary; ask first.
- Make small, reviewable changes.
- Explain which files you touched.
- Stop after completing each task.
- Follow the blueprint sequence. Do not jump ahead.
- Use Next.js App Router with TypeScript.
- Use Tailwind CSS for styling and Shadcn UI for components.
- Use native fetch() instead of external HTTP libraries.
- Handle errors gracefully with user-friendly messages.
- Add comprehensive console logging for debugging (use emojis: 📨, ✅, ❌, ⚠️, 🔄).

Verification:

  • Agent acknowledges the rules
  • No scope expansion is proposed

Prompt 1 — Initialize Next.js Project (Agent)

Goal: Create a new Next.js 16 project with TypeScript and required dependencies.

Create a new Next.js 16 project with TypeScript using the App Router.

Requirements:
- Next.js 16+ with App Router
- React 19+ with TypeScript
- Node.js 22+

Install dependencies:
- @elevenlabs/react (use ^0.12.3 or check npm for available versions)
- tailwindcss
- postcss
- autoprefixer
- class-variance-authority
- clsx
- tailwind-merge
- tailwindcss-animate

Install dev dependencies:
- @types/node
- @types/react
- @types/react-dom
- eslint: ^9.0.0 (CRITICAL: Next.js 16.1.1+ requires ESLint 9.x)
- @eslint/eslintrc: ^3.1.0 (required for ESLint 9 flat config compatibility)
- eslint-config-next: ^16.0.0

Create .env.local.example with:
NEXT_PUBLIC_ELEVENLABS_API_KEY=<API_KEY>
NEXT_PUBLIC_ELEVENLABS_AGENT_ID=<AGENT_ID>

Update tsconfig.json with paths alias "@/*" pointing to "./*"

Initialize Tailwind CSS and Shadcn UI:
- Run: npx tailwindcss init -p
- Configure tailwind.config.ts to scan app and components directories
- Add Tailwind directives to global CSS file
- Run: npx shadcn@latest init (choose default settings)

Create ESLint 9 flat config (eslint.config.mjs):
- Import FlatCompat from @eslint/eslintrc
- Extend "next/core-web-vitals" and "next/typescript"
- Do NOT create .eslintrc.json (ESLint 9 uses flat config only)
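
For reference, the flat config the agent should end up with might look like the sketch below; the FlatCompat pattern and the "next/core-web-vitals" / "next/typescript" presets are the ones eslint-config-next documents, but verify against the versions you actually install:

```js
// eslint.config.mjs: ESLint 9 flat config loading the Next.js presets through FlatCompat
import { dirname } from "path";
import { fileURLToPath } from "url";
import { FlatCompat } from "@eslint/eslintrc";

const compat = new FlatCompat({
  // Resolve the legacy "extends" entries relative to the project root
  baseDirectory: dirname(fileURLToPath(import.meta.url)),
});

export default [...compat.extends("next/core-web-vitals", "next/typescript")];
```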

Deliverables:

  • package.json, tsconfig.json, .env.local.example, Tailwind config, Shadcn setup, ESLint 9 config, basic Next.js structure

Verification:

  • npm install succeeds without dependency conflicts
  • ESLint 9.x is installed (check: npm list eslint)
  • eslint.config.mjs exists (NOT .eslintrc.json)
  • Tailwind CSS and Shadcn UI are configured
  • npm run dev starts without errors
  • http://localhost:3000 shows Next.js default page
  • npm run build compiles
  • npm run lint works without errors

Prompt 2 — Create API Route for Agent Configuration (Agent)

Goal: Create a server-side API route that provides the ElevenLabs agent ID to the client.

Create an API route that:

1. Exports an async GET function
2. Reads ELEVENLABS_AGENT_ID or NEXT_PUBLIC_ELEVENLABS_AGENT_ID from environment variables
3. Returns JSON with agentId if found
4. Returns 500 error with message if agent ID is missing

Use NextResponse from next/server. Add error handling.
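
A minimal sketch of what this route should look like, assuming it lives at app/api/agent/route.ts (the path later prompts fetch as /api/agent):

```ts
// app/api/agent/route.ts: returns the configured agent ID, or a 500 if it is missing
import { NextResponse } from "next/server";

export async function GET() {
  const agentId =
    process.env.ELEVENLABS_AGENT_ID ?? process.env.NEXT_PUBLIC_ELEVENLABS_AGENT_ID;

  if (!agentId) {
    return NextResponse.json(
      { error: "Agent ID is not configured. Set NEXT_PUBLIC_ELEVENLABS_AGENT_ID in .env.local." },
      { status: 500 }
    );
  }

  return NextResponse.json({ agentId });
}
```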

Deliverables:

  • API route file at correct path

Verification:

  • File exists at correct path
  • Set NEXT_PUBLIC_ELEVENLABS_AGENT_ID in .env.local and restart server
  • API endpoint returns JSON with agentId
  • Removing env var returns error response

Prompt 3 — Create Main Conversation Component Structure (Agent)

Goal: Create the core React component that manages the voice conversation.

Create a new component that:

1. Uses "use client" directive
2. Imports React hooks: useState, useCallback, useEffect, useRef
3. Imports useConversation from @elevenlabs/react
4. Sets up state: hasInit (boolean), closedCaption (string), transcript (string), isInitializing (boolean), error (string | null), agentId (string | null)
5. Creates refs: hasInitRef (boolean), isIntentionallyEndingRef (boolean), isInitializingRef (boolean), initializationCompleteTimeRef (number) - use refs to avoid stale closures in callbacks and track initialization timing
6. Initializes useConversation hook with textOnly: false and empty callbacks (will be filled in later)
7. Returns a basic div with placeholder text and Tailwind classes

Add TypeScript types. Use Tailwind CSS classes for styling.
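
The skeleton the agent produces should look roughly like this sketch (the file path components/VoiceConversation.tsx is an assumption; the textOnly option and callback slots follow this blueprint's prompts, and the actual behavior arrives in Prompts 4 and 5):

```tsx
// components/VoiceConversation.tsx: state, refs, and hook wiring only; behavior is added later
"use client";

import { useCallback, useEffect, useRef, useState } from "react";
import { useConversation } from "@elevenlabs/react";

export default function VoiceConversation() {
  const [hasInit, setHasInit] = useState(false);
  const [closedCaption, setClosedCaption] = useState("");
  const [transcript, setTranscript] = useState("");
  const [isInitializing, setIsInitializing] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [agentId, setAgentId] = useState<string | null>(null);

  // Refs mirror values that callbacks need (avoiding stale closures)
  // and record when initialization finished, for cleanup grace periods.
  const hasInitRef = useRef(false);
  const isIntentionallyEndingRef = useRef(false);
  const isInitializingRef = useRef(false);
  const initializationCompleteTimeRef = useRef(0);

  const conversation = useConversation({
    textOnly: false,
    // onConnect, onDisconnect, onError, onStatusChange, onModeChange,
    // and onMessage are configured in Prompts 4 and 5.
  });

  return (
    <div className="w-full max-w-2xl rounded-lg border p-6 text-center text-gray-500">
      Voice conversation UI coming soon.
    </div>
  );
}
```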

Deliverables:

  • Component file with Tailwind classes

Verification:

  • Component compiles without errors
  • File structure matches Next.js conventions
  • Tailwind classes are applied correctly
  • All refs are properly typed

Prompt 4 — Implement Microphone Access and Conversation Initialization (Agent)

Goal: Add functionality to request microphone access and initialize the ElevenLabs conversation.

Update the component to add:

1. Configure useConversation hook callbacks:
   - onConnect: clears error state
   - onDisconnect: receives DisconnectionDetails object with reason ("user" | "agent" | "error"). Handle error disconnections, ignore user-initiated disconnections.
   - onError: receives Error or object, extract error message and set error state
   - onStatusChange: receives { status: string } object (NOT a string). Extract status value. The status type is `"disconnected" | "connecting" | "connected" | "disconnecting"`. Only check for "connected" status. Clear error on connected status.
   - onModeChange: receives { mode: string } object (NOT a string). Extract mode value.
   - onMessage: will be added in next prompt
   - onInterruption: optional handler
   - onAudio: optional handler

2. init function that:
   - Sets isInitializingRef.current = true
   - Checks for microphone support
   - Requests microphone access using navigator.mediaDevices.getUserMedia({ audio: true })
   - Fetches agent ID from API endpoint using fetch("/api/agent")
   - Validates agent ID exists
   - Calls conversation.startSession() with:
     * agentId: fetchedAgentId (use directly, not state variable - state updates are async)
     * connectionType: "websocket"
     * NOTE: Do NOT pass apiKey for public agents. API key is only used server-side for private agents that require signed URLs or conversation tokens.
   - Sets hasInitRef.current = true, setHasInit(true), isInitializingRef.current = false, initializationCompleteTimeRef.current = Date.now() on success
   - Handles errors: NotAllowedError -> "Microphone permission denied", NotFoundError -> "No microphone found", CloseEvent -> extract close code and reason
   - Uses useCallback with proper dependencies

3. endConversation function that:
   - Sets isIntentionallyEndingRef.current = true
   - Calls conversation.endSession()
   - Resets all state and refs
   - Resets isIntentionallyEndingRef after delay

4. useEffect cleanup that:
   - In development: only cleanup if user explicitly ended OR more than 5 minutes have passed (prevents Fast Refresh from disconnecting)
   - In production: cleanup after 10 seconds grace period
   - Checks isInitializingRef.current to prevent cleanup during initialization
   - Checks isIntentionallyEndingRef.current to avoid double cleanup

Use useCallback for functions. Add proper error handling. The conversation status is derived from conversation?.status property.
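
Condensed, the init flow described above might look like this sketch (callback wiring, the CloseEvent branch, endConversation, and the cleanup effect are omitted; the startSession options follow the names used in this blueprint):

```tsx
// Sketch of init; assumes the state and refs from the Prompt 3 skeleton.
const init = useCallback(async () => {
  if (isInitializingRef.current || hasInitRef.current) return;
  isInitializingRef.current = true;
  setIsInitializing(true);
  setError(null);

  try {
    if (!navigator.mediaDevices?.getUserMedia) {
      throw new Error("Microphone access is not supported in this browser.");
    }
    // Ask for the microphone before opening the session.
    await navigator.mediaDevices.getUserMedia({ audio: true });

    // Fetch the agent ID from the server-side route (Prompt 2).
    const res = await fetch("/api/agent");
    const data = await res.json();
    if (!res.ok || !data.agentId) {
      throw new Error(data.error ?? "Agent ID is missing.");
    }
    setAgentId(data.agentId);

    // Use the fetched value directly; the state update above is asynchronous.
    await conversation.startSession({
      agentId: data.agentId,
      connectionType: "websocket",
    });

    hasInitRef.current = true;
    setHasInit(true);
    initializationCompleteTimeRef.current = Date.now();
  } catch (err) {
    console.error("❌ Failed to start conversation:", err);
    if (err instanceof DOMException && err.name === "NotAllowedError") {
      setError("Microphone permission denied.");
    } else if (err instanceof DOMException && err.name === "NotFoundError") {
      setError("No microphone found.");
    } else {
      setError(err instanceof Error ? err.message : "Failed to start the conversation.");
    }
  } finally {
    isInitializingRef.current = false;
    setIsInitializing(false);
  }
}, [conversation]);
```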

Deliverables:

  • Updated component file

Verification:

  • Component compiles
  • Microphone permission prompt appears
  • Console shows "✅ Conversation connected" after granting permission
  • Connection status shows "Connected" immediately after session starts
  • Error messages display correctly
  • Status changes are logged in console

Prompt 5 — Handle Message Events and Display Transcripts (Agent)

Goal: Process incoming messages from ElevenLabs and display user transcripts and agent responses.

Update useConversation hook to add comprehensive onMessage callback:

1. onMessage callback with robust message parsing:
   - Log full message: console.log("📨 RAW Message received:", JSON.stringify(message, null, 2))
   - Extract role from multiple possible fields: message.role || message.type || message.sender || message.message_type || message.source || message.from || ""
   - Check nested structures: message.message?.role || message.data?.role || message.payload?.role
   - Extract content from multiple fields: message.content || message.text || message.message || message.transcript || message.data || ""
   - Check boolean flags: message.is_user, message.is_agent, message.from_user, message.from_agent
   
   - Determine if user message (check role lowercase: "user", "user_message", "user_transcript", "transcript", "human", OR is_user === true, OR from_user === true, OR source === "user")
   - Determine if agent message (check role lowercase: "agent", "assistant", "agent_message", "agent_response", "ai", "bot", OR is_agent === true, OR from_agent === true, OR source === "agent"/"assistant")
   
   - If user message: update transcript state (handle interim vs final transcripts)
   - If agent message: update closedCaption state (handle interim vs final responses)
   - Log classification: console.log("🔍 Classification - isUser:", isUserMessage, "isAgent:", isAgentMessage)
   - NEVER default unknown messages to transcript - this causes all messages to appear as user input
   - If message type unknown but has content, log it but don't update state (or try to infer from context)

2. Update component JSX to display:
   - closedCaption and transcript in clearly separated sections
   - Connection status with visual indicators
   - Always show transcript area (even when empty) with placeholder text
   - Use distinct colors: blue for user input, green for agent response
   - Add visual dividers between sections
   - Use Tailwind classes for styling (text sizes, colors, spacing, gradients, borders)

Add comprehensive logging throughout for debugging.
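
As an illustration, the classification core of that callback might be sketched like this (passed to useConversation alongside the other callbacks; the defensive field checks exist because the SDK message shape varies, and the fallback chains should be extended as described above):

```tsx
// Sketch of onMessage; never route unclassified messages to the user transcript.
onMessage: (message: any) => {
  console.log("📨 RAW Message received:", JSON.stringify(message, null, 2));

  const role = String(
    message.role ?? message.type ?? message.source ?? message.sender ?? ""
  ).toLowerCase();
  const content =
    message.content ?? message.text ?? message.message ?? message.transcript ?? "";

  const isUserMessage =
    ["user", "user_message", "user_transcript", "transcript", "human"].includes(role) ||
    message.is_user === true ||
    message.from_user === true;
  const isAgentMessage =
    ["agent", "assistant", "agent_message", "agent_response", "ai", "bot"].includes(role) ||
    message.is_agent === true ||
    message.from_agent === true;

  console.log("🔍 Classification - isUser:", isUserMessage, "isAgent:", isAgentMessage);

  if (typeof content !== "string" || content.length === 0) return;
  if (isUserMessage) {
    setTranscript(content);
  } else if (isAgentMessage) {
    setClosedCaption(content);
  } else {
    // Unknown message type: log it, but do not display it as user input.
    console.warn("⚠️ Unclassified message ignored, role:", role);
  }
},
```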

Deliverables:

  • Updated component with robust message parsing and Tailwind styling

Verification:

  • Speaking shows transcript in real-time in user input section
  • Agent responses appear in agent response section (NOT in user input)
  • Console logs show detailed message parsing
  • Messages are correctly classified as user vs agent
  • Styling clearly separates user and agent sections
  • Transcript area is always visible

Prompt 6 — Add Start Button and Initialization Flow (Agent)

Goal: Create a UI that allows users to start the conversation with a button click.

Update component to:

1. Show "Start Conversation" button when hasInit is false
2. Button calls init() when clicked
3. Show loading state while initializing (use isInitializing state)
4. Hide button and show conversation UI when hasInit is true
5. Display connection status with visual indicators (pulsing dot when connected)
6. Show transcript and agent response sections even when empty (with placeholders)

Use Shadcn Button component for the start button. Add Tailwind classes for loading indicator, status indicator, and conversation layout. Install Shadcn components as needed: `npx shadcn@latest add button`
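
Inside the component's return, the branch might look roughly like this sketch (assumes `import { Button } from "@/components/ui/button"` after installing the Shadcn button):

```tsx
{/* Sketch of the start / connected branch inside the component's JSX */}
{!hasInit ? (
  <Button onClick={init} disabled={isInitializing} size="lg">
    {isInitializing ? "Connecting..." : "Start Conversation"}
  </Button>
) : (
  <div className="flex items-center gap-2 text-sm">
    <span
      className={`h-2.5 w-2.5 rounded-full ${
        conversation.status === "connected" ? "animate-pulse bg-green-500" : "bg-gray-400"
      }`}
    />
    {conversation.status === "connected" ? "Connected" : "Connecting..."}
  </div>
)}
```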

Deliverables:

  • Updated component with Shadcn Button and Tailwind styling

Verification:

  • Button appears on initial load
  • Clicking button requests microphone permission
  • Button shows loading state during initialization
  • Button disappears after successful connection
  • Connection status displays correctly
  • Shadcn Button component is installed and working

Prompt 7 — Update Home Page to Use Component (Agent)

Goal: Integrate the VoiceConversation component into the main page.

Update the home page to:

1. Import VoiceConversation component
2. Render it in the main element
3. Remove default Next.js content

Use Tailwind classes on the main element to center content, add spacing, and make it full viewport height (flex, items-center, justify-center, min-h-screen).
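
The resulting page can be as small as this sketch (assuming the component file from Prompt 3):

```tsx
// app/page.tsx: renders the voice conversation component centered on the page
import VoiceConversation from "@/components/VoiceConversation";

export default function Home() {
  return (
    <main className="flex min-h-screen items-center justify-center p-6">
      <VoiceConversation />
    </main>
  );
}
```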

Deliverables:

  • Updated page with Tailwind classes

Verification:

  • Home page shows component
  • Layout is centered and responsive using Tailwind
  • No console errors

Prompt 8 — Add Environment Variable Validation (Agent)

Goal: Ensure required environment variables are set before runtime.

Add runtime validation:

1. In API route, check for agent ID and return error if missing
2. In component, validate agent ID exists before starting
3. Show clear error message with instructions about setting env vars

Create .env.local from .env.local.example with actual values. Document setup in README.md.

Deliverables:

  • Updated API route and component with validation
  • .env.local file and updated README.md

Verification:

  • Missing env vars show clear error messages
  • Valid env vars allow conversation to start
  • README explains setup process

Prompt 9 — Add Error Recovery and Reconnection (Agent)

Goal: Handle connection errors gracefully and allow users to retry.

Add error recovery:

1. Track connection errors in state (already have error state)
2. Show "Reconnect" button if connection fails
3. Allow users to restart conversation after errors
4. Clear previous state when restarting (use resetState function)
5. Handle WebSocket disconnections gracefully (use hasInitRef to check if was connected)
6. Use refs in callbacks to avoid stale closures

Add UI for error messages, retry button, and connection status indicator. Use Shadcn Alert for error messages and Button for retry. Install components as needed: `npx shadcn@latest add alert button`
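
The error branch could be sketched as below (assumes the Shadcn Alert and Button imports from "@/components/ui/alert" and "@/components/ui/button", plus the resetState helper mentioned above):

```tsx
{/* Sketch of the error / retry UI */}
{error && (
  <Alert variant="destructive" className="mb-4">
    <AlertTitle>Connection problem</AlertTitle>
    <AlertDescription>{error}</AlertDescription>
  </Alert>
)}
{error && !isInitializing && (
  <Button
    variant="outline"
    onClick={() => {
      resetState(); // clear transcript, caption, error, and refs
      void init();  // then start a fresh session
    }}
  >
    Reconnect
  </Button>
)}
```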

Deliverables:

  • Updated component with Shadcn Alert and Tailwind styling

Verification:

  • Errors display user-friendly messages
  • Retry button appears on errors
  • Clicking retry reinitializes conversation
  • Disconnections are handled gracefully
  • Shadcn Alert component displays errors correctly

Prompt 10 — Polish UI and Add Final Touches (Agent)

Goal: Improve visual design and user experience.

Enhance the UI:

1. Add smooth transitions using Tailwind transition classes
2. Style closed captions for readability (text-xl or text-2xl, good contrast colors)
3. Style transcripts differently from agent responses:
   - User input: Blue gradient background, blue borders, microphone icon
   - Agent response: Green/emerald gradient background, green borders, info icon
   - Clear visual divider between sections
4. Add visual feedback for listening state:
   - Pulsing indicators when listening
   - Connection status badges
   - Icons for each section
5. Use Shadcn Button variants for hover/disabled states
6. Make layout responsive using Tailwind responsive classes (sm:, md:, lg:)
7. Always show both transcript and agent response sections (even when empty) with placeholders
8. Add loading state during initialization (use Button loading state or Shadcn Spinner if needed)

Use Tailwind utility classes for colors and spacing. Ensure accessibility with proper ARIA labels and semantic HTML.
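
For instance, the connection badge with an accessible live region might be sketched as:

```tsx
{/* Status badge: pulsing dot plus a polite live region for screen readers */}
<div
  role="status"
  aria-live="polite"
  className="inline-flex items-center gap-2 rounded-full border px-3 py-1 text-sm transition-colors"
>
  <span
    aria-hidden="true"
    className={`h-2 w-2 rounded-full ${
      conversation.status === "connected" ? "animate-pulse bg-emerald-500" : "bg-gray-400"
    }`}
  />
  {conversation.status === "connected" ? "Listening" : "Not connected"}
</div>
```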

Deliverables:

  • Updated component with polished Tailwind styling and Shadcn components

Verification:

  • UI looks polished with Tailwind classes
  • User and agent sections are clearly visually separated
  • Responsive on mobile using Tailwind breakpoints
  • Transitions are smooth with Tailwind transitions
  • Accessibility basics met (ARIA labels, semantic HTML)
  • Shadcn components are properly styled
  • Both sections always visible with appropriate placeholders

Known Pitfalls

  • ESLint Configuration: Next.js 16.1.1+ requires ESLint 9.x, but ESLint 8.x causes dependency conflicts. Install eslint@^9.0.0 and @eslint/eslintrc@^3.1.0. Use ESLint 9 flat config (eslint.config.mjs) instead of .eslintrc.json.
  • ElevenLabs React Package: Use @elevenlabs/react@^0.12.3 (not @elevenlabs/client). The React package provides a useConversation hook that's easier to use in React components. Check npm registry for latest version.
  • Microphone Permissions: Browsers require HTTPS for microphone access (except localhost). Use localhost for development, HTTPS for production.
  • Environment Variables: Next.js only exposes variables prefixed with NEXT_PUBLIC_ to client-side code, so the agent ID must use NEXT_PUBLIC_ELEVENLABS_AGENT_ID if the browser reads it. For public agents the API key is never sent from the client (see the next pitfall), so only the agent ID actually needs to reach the browser.
  • WebSocket Connection: For public agents, API key is NOT needed in startSession(). Only pass agentId and connectionType: "websocket". API key is only used server-side for private agents that require signed URLs or conversation tokens. Connection may fail silently if agent ID is invalid - check browser console for WebSocket errors.
  • Connection Status Detection: onStatusChange receives an object { status: string }, NOT a string directly. Extract the status value: const statusValue = prop?.status || ''. The actual status type from @elevenlabs/react is "disconnected" | "connecting" | "connected" | "disconnecting" - it does NOT include "listening", "active", "ready", or "open". Only check for "connected" status. The conversation status is available via conversation?.status property. Similarly, onModeChange receives { mode: string } object.
  • React State Updates in Callbacks: Callbacks may capture stale state values due to closures. Use refs for values needed in callbacks (like hasInitRef, isIntentionallyEndingRef, isInitializingRef). Update both ref and state when values change. When calling startSession(), use the fetched agent ID directly instead of state variable, as state updates are asynchronous.
  • Message Parsing for User vs Agent: ElevenLabs SDK message format may vary. All messages might be routed to transcript if role detection fails. Implement comprehensive message parsing checking multiple fields and nested structures. Check boolean flags (is_user, is_agent, from_user, from_agent). CRITICAL: Never default unknown messages to transcript - this causes all messages to appear as user input. Log all messages for debugging: JSON.stringify(message, null, 2).
  • TypeScript Types: ElevenLabs SDK types may be incomplete. Use any type for message callbacks if needed, add type guards.
  • UI/UX Best Practices: Always show transcript and agent response sections (even when empty) with placeholder text. Use distinct colors and visual separation (blue for user, green for agent). Add visual feedback (pulsing indicators, status badges). Update UI optimistically, then refine based on callbacks.
  • React Strict Mode / Fast Refresh: In development, React Strict Mode and Fast Refresh can cause components to unmount/remount, triggering cleanup effects. Be very conservative about cleanup in development mode - only cleanup if user explicitly ended conversation OR more than 5 minutes have passed. In production, use a 10-second grace period. Track initialization time with initializationCompleteTimeRef to implement grace periods.
  • Disconnect Handling: onDisconnect receives DisconnectionDetails object with reason: "user" | "agent" | "error". Only show error messages for "error" reason or "agent" reason with error close codes. User-initiated disconnections are normal and shouldn't show errors.
  • Production Code: Remove excessive console logging in production. Keep only essential error logging (console.error). The blueprint includes comprehensive logging for debugging, but clean it up before production.

Definition of Done

  • Users can start conversations with a button click
  • Microphone permission works correctly
  • WebSocket connects and shows "Connected" status immediately
  • User transcripts appear in real-time in the user input section
  • Agent responses appear in the agent response section (NOT mixed with user input)
  • Errors are handled gracefully with retry options
  • UI is responsive and clearly separates user vs agent content
  • Environment variables are configured and validated
  • App builds and runs successfully
  • ESLint passes without errors
  • Test microphone permission denial and no microphone scenarios
  • Test conversation start/stop multiple times and error recovery
  • Verify user messages appear in user section, agent messages in agent section
  • Test on mobile device and different browsers
  • Verify environment variable validation
  • Verify ESLint 9 configuration works
  • Check console logs show proper message classification

Changelog

  • v1.0 — Initial release
    • Core voice conversation functionality
    • ElevenLabs SDK integration
    • UI with transcripts and captions
    • Error handling and reconnection