
How I Built a Talking, Knowledgeable AI Sidekick (and How You Can Build Your Own Voice AI RAG Agent)

Cover: Cogniwerk-image.png
Slug: voice-ai-rag-agent
Published: Jun 30, 2025
Category: AI, Gen AI, Hugging Face, llama-index, OpenAI, Python, voice-ai, RAG, agents

A Story of Code and a Chatty Voice AI Agent That Actually Knows Stuff from Your Docs

Chapter 1: The Dream

It all started on a rainy afternoon. I was talking to my computer (as one does when working remotely), and realized:
Wouldn’t it be cool if my computer could actually listen, understand, and answer me with real knowledge from my own files?
Not just “Hey Siri, what’s the weather?” but “Hey AI, what’s in my project docs?” or “Remind me what the HR policy says about bringing cats to work?”
And so, the quest began:
I would build a Voice AI RAG agent!
(That’s Retrieval-Augmented Generation, but let’s just call it “RAG” because it sounds like a pirate.)

Chapter 2: The Ingredients

Before you can summon your own digital sidekick, you’ll need a few magical artifacts:
  • Python 3.11+ (the spellbook)
  • Cartesia (for making your AI talk like a human, not a fax machine)
  • AssemblyAI (so your AI can understand your voice, even if you mumble)
  • Anthropic Claude (the brain—OpenAI is cool, but Claude is the new wizard in town)
  • LiveKit (for real-time voice rooms, so your AI can join you in a virtual “room”)
  • A pile of your own documents (so your AI knows your world)
  • API keys (the secret runes—don’t lose them!)
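
Before we start casting spells, here is one way to lay out the project. The docs/ folder and .env file names match what the code in Chapter 3 expects; chat-engine-storage/ is created automatically on the first run, and the script name is simply whatever you save the code as:

voice-agent/
├── docs/                     # your documents (the AI's knowledge base)
├── chat-engine-storage/      # persisted index, created on first run
├── .env                      # API keys (see Chapter 5)
├── requirements.txt          # dependencies (see Chapter 5)
└── voice_agent_anthropic.py  # the spell from Chapter 3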

Chapter 3: The Spell (a.k.a. The Code)

Here’s the full incantation. Don’t worry, I’ll explain every part after you read it.
(Copy, paste, and prepare to be amazed!)
import logging
import os

from dotenv import load_dotenv
from livekit.agents import JobContext, JobProcess, WorkerOptions, cli
from livekit.agents.job import AutoSubscribe
from livekit.agents.llm import ChatContext
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, silero, llama_index, assemblyai

from llama_index.llms.anthropic import Anthropic
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
    Settings,
)
from llama_index.core.chat_engine.types import ChatMode
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

load_dotenv()
logger = logging.getLogger("voice-assistant")

# Set up the embedding model and LLM
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
llm = Anthropic(model="claude-3-haiku-20240307", max_tokens=512)
Settings.llm = llm
Settings.embed_model = embed_model

# Check if storage already exists
PERSIST_DIR = "./chat-engine-storage"
if not os.path.exists(PERSIST_DIR):
    # Load the documents and create the index
    documents = SimpleDirectoryReader("docs").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # Store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)


def prewarm(proc: JobProcess):
    # Load the Silero voice-activity-detection model once per worker process
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    chat_context = ChatContext().append(
        role="system",
        text=(
            "You are a funny, witty assistant. "
            "Respond with short and concise answers. "
            "Avoid using unpronounceable punctuation or emojis."
        ),
    )
    chat_engine = index.as_chat_engine(chat_mode=ChatMode.CONTEXT)

    logger.info(f"Connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()
    logger.info(f"Starting voice assistant for participant {participant.identity}")

    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=assemblyai.STT(),
        llm=llama_index.LLM(chat_engine=chat_engine),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="bf0a246a-8642-498a-9950-80c35e9276b5",
        ),
        chat_ctx=chat_context,
    )
    agent.start(ctx.room, participant)

    await agent.say(
        "Hey there! How can I help you today?",
        allow_interruptions=True,
    )


if __name__ == "__main__":
    print("Starting voice agent with Anthropic...")
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )

Chapter 4: The Magic Explained

Let’s break down this spellbook, piece by piece:
1. Imports and Setup
We import all the libraries:
  • livekit for voice rooms
  • cartesia for text-to-speech
  • assemblyai for speech-to-text
  • llama_index for RAG (so your AI can actually know things from your docs)
  • Anthropic for the LLM (the brain)
We also load environment variables with dotenv—because hardcoding API keys is a rookie mistake.
2. Embeddings and LLM
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
llm = Anthropic(model="claude-3-haiku-20240307", max_tokens=512)
Settings.llm = llm
Settings.embed_model = embed_model
  • The embedding model turns your docs into “AI food” (vectors).
  • The LLM (Claude) is the brain that answers questions using those vectors.
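Curious what “AI food” looks like? Here is a quick sanity check you can run on its own (the sentence is made up; bge-small-en-v1.5 produces 384-dimensional vectors):
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Turn one sentence into a vector and inspect its size
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
vector = embed_model.get_text_embedding("Can I bring my cat to work?")
print(len(vector))  # 384 for bge-small-en-v1.5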
3. Document Indexing
PERSIST_DIR = "./chat-engine-storage"
if not os.path.exists(PERSIST_DIR):
    documents = SimpleDirectoryReader("docs").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
  • If you haven’t indexed your docs before, it reads everything in docs/ and builds a knowledge base.
  • If you have, it loads the existing index (so it doesn’t have to re-read your 500-page PDF every time).
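Before adding voice on top, you can sanity-check the index in plain text. A minimal sketch (the question is hypothetical; ask anything your docs actually cover):
# Text-only sanity check of the index, no voice pipeline involved
query_engine = index.as_query_engine()
response = query_engine.query("What does the HR policy say about cats?")
print(response)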
4. Voice Activity Detection (VAD)
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
  • This makes sure your AI only listens when you’re actually talking, not when you’re yelling at your cat.
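Silero’s defaults are sensible, but the loader takes tuning knobs if your AI keeps interrupting you. A small sketch; the parameter names below are assumptions based on livekit-plugins-silero, so check the signature of VAD.load in your installed version:
from livekit.plugins import silero

# Assumed parameters; verify against your livekit-plugins-silero version
vad = silero.VAD.load(
    min_speech_duration=0.05,   # ignore blips shorter than ~50 ms
    min_silence_duration=0.55,  # wait ~550 ms of silence before ending a turn
)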
5. The Entrypoint: Where the Magic Happens
async def entrypoint(ctx: JobContext):
    chat_context = ChatContext().append(
        role="system",
        text=(
            "You are a funny, witty assistant. "
            "Respond with short and concise answers. "
            "Avoid using unpronounceable punctuation or emojis."
        ),
    )
    chat_engine = index.as_chat_engine(chat_mode=ChatMode.CONTEXT)
    ...
  • Sets the “personality” of your AI (witty, concise, no weird punctuation).
  • Prepares the chat engine with your indexed docs.
logger.info(f"Connecting to room {ctx.room.name}")
await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

participant = await ctx.wait_for_participant()
logger.info(f"Starting voice assistant for participant {participant.identity}")
  • Connects to a LiveKit room (more on this soon).
  • Waits for a participant (that’s you!) to join.
agent = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=assemblyai.STT(),
    llm=llama_index.LLM(chat_engine=chat_engine),
    tts=cartesia.TTS(
        model="sonic-2",
        voice="bf0a246a-8642-498a-9950-80c35e9276b5",
    ),
    chat_ctx=chat_context,
)
agent.start(ctx.room, participant)

await agent.say(
    "Hey there! How can I help you today?",
    allow_interruptions=True,
)
  • Sets up the full voice pipeline: listens, understands, thinks, and talks back.
  • Greets you with a friendly message.
6. The Main Event
if __name__ == "__main__":
    print("Starting voice agent with Anthropic...")
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )
  • Registers the prewarm and entrypoint functions with the LiveKit CLI and starts the worker, which then waits for you to join a room.

Chapter 5: Summoning Your AI (a.k.a. Running the Code)

  1. Install your dependencies (see requirements.txt; a sketch of one follows these steps).
  2. Put your API keys in a .env file:
ANTHROPIC_API_KEY=your_anthropic_key
ASSEMBLYAI_API_KEY=your_assemblyai_key
CARTESIA_API_KEY=your_cartesia_api_key
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
  3. Add your documents to the docs/ folder.
  4. Run:
python voice_agent_anthropic.py start
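
For reference, here is a sketch of what requirements.txt might contain. The package names below follow the usual LiveKit plugin and LlamaIndex integration naming, but treat them as assumptions and pin the versions that work for you:
python-dotenv
livekit-agents
livekit-plugins-silero
livekit-plugins-cartesia
livekit-plugins-assemblyai
livekit-plugins-llama-index
llama-index
llama-index-llms-anthropic
llama-index-embeddings-huggingface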

Chapter 6: Entering the LiveKit Room

  • What’s a LiveKit room?
Think of it as a virtual meeting room where your AI is always waiting for you.
  • How do you join?
Use the LiveKit Playground or your own LiveKit client, enter your room name, and your AI will greet you like an old friend (who actually remembers your last conversation).
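
If you’d rather join from your own client than the Playground, you need an access token for the room. Here is a minimal sketch using the livekit-api server SDK (the room and identity names are made up; adapt them to your setup):
import os

from livekit import api  # from the livekit-api package

# Mint a join token signed with your LiveKit API key/secret
token = (
    api.AccessToken(os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"])
    .with_identity("human-friend")
    .with_grants(api.VideoGrants(room_join=True, room="my-ai-room"))
    .to_jwt()
)
print(token)  # hand this to your LiveKit client to join the room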

Chapter 7: The Result

Now, you can:
  • Talk to your AI: Ask questions, get answers from your own docs.
  • Get witty, concise responses: No more boring bots!
  • Impress your friends: “Yeah, my Voice AI actually knows what’s in my files.”
If you have any questions, feel free to contact me 🙂

Happy coding! 🚀
