Audio-RAG: Voice-Enabled RAG System with Groq, Whisper, and Qdrant

A high-performance Retrieval-Augmented Generation (RAG) system built with Streamlit, leveraging Groq for ultra-low latency LLM processing, Whisper for accurate transcription, and Qdrant for efficient vector storage. This project uses the recommended src layout, which keeps the application's core Python modules separate from the project root.

Demo video: Audio_RAG_Demo.mp4

Architecture

[Architecture diagram]

Key Features

  • Voice-Enabled Interaction: Speak your questions using the Streamlit UI and get a text and audio response.
  • Groq Integration: Utilizes Groq's low-latency Llama models for near-instantaneous RAG answers.
  • Advanced Audio Preprocessing: pydub (requires FFmpeg) normalizes volume and trims long silence gaps, drastically improving Whisper's transcription accuracy and speed (a sketch follows this list).
  • LLM-Based Transcript Correction: Uses a dedicated LLM chain to correct technical terms or misheard words in the transcribed prompt based on the RAG context documents.
  • PDF Ingestion & Qdrant: Uploads and chunks PDF documents, storing vector embeddings in Qdrant (supports both local and cloud instances).
  • Evaluation Metrics: Provides Relevance, Faithfulness, Completeness, and Retrieval Similarity scores for RAG quality assurance.
  • Text-to-Speech (TTS): Generates high-quality audio responses using Groq's Play.ai TTS models.
  • Modular Codebase: Core logic is split into src/app.py (UI/Main Loop) and src/utils.py (Backend/Helper Functions).
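
As a concrete illustration of the preprocessing feature above, here is a minimal pydub sketch that normalizes volume and strips long silences. The function name and thresholds are illustrative, not the exact code in src/utils.py.

import io

from pydub import AudioSegment, effects
from pydub.silence import split_on_silence

def preprocess_audio(raw_bytes: bytes) -> bytes:
    """Normalize loudness and trim long silence gaps before transcription."""
    # Decoding most formats requires FFmpeg to be on the PATH.
    audio = AudioSegment.from_file(io.BytesIO(raw_bytes))
    audio = effects.normalize(audio)  # bring peak volume to a consistent level

    # Drop silences longer than 700 ms, keeping 100 ms of padding around speech.
    chunks = split_on_silence(audio, min_silence_len=700,
                              silence_thresh=audio.dBFS - 16, keep_silence=100)
    trimmed = sum(chunks, AudioSegment.empty()) if chunks else audio

    buf = io.BytesIO()
    trimmed.export(buf, format="wav")  # WAV export needs no re-encoding
    return buf.getvalue()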

Flow Diagram

sequenceDiagram
    actor User
    participant App as Streamlit (app.py)
    participant Recorder as AudioRecorder
    participant Utils as Utils (utils.py)
    participant Whisper as Groq Whisper
    participant Qdrant as Qdrant DB
    participant LLM as Groq LLM
    participant TTS as Groq TTS

    %% 1. Input and Recording
    User ->> App: Starts recording
    App ->> Recorder: Captures audio bytes
    Recorder -->> App: Returns raw audio bytes

    %% 2. Audio Preprocessing & Transcription
    alt Audio Preprocessing Enabled
        App ->> Utils: preprocess_audio(raw_audio)
        Utils ->> Utils: Normalize + Silence Trimming (pydub/FFmpeg)
        Utils -->> App: Processed audio bytes
    else Audio Preprocessing Disabled
        App ->> App: Uses raw audio bytes
    end

    App ->> Utils: transcribe_audio(processed_audio)
    Utils ->> Whisper: Send audio for ASR
    Whisper -->> Utils: Returns raw_transcript
    Utils -->> App: raw_transcript

    %% 3. RAG and Conditional LLM Correction
    App ->> Utils: setup_retriever()

    alt LLM Correction Enabled
        App ->> Qdrant: Retrieve context for correction
        Qdrant -->> App: Context Docs
        App ->> Utils: correct_transcript(raw_transcript, context, LLM)
        Utils ->> LLM: Send context & raw_transcript
        LLM -->> Utils: Returns corrected_prompt
        Utils -->> App: corrected_prompt (Final Query)
    else LLM Correction Disabled
        App ->> App: Uses raw_transcript as Final Query
    end

    %% 4. RAG Chain Execution
    App ->> Utils: run_rag_chain(Final Query)
    Utils ->> Qdrant: Retrieve final source documents (k)
    Qdrant -->> Utils: Context Chunks
    Utils ->> LLM: Prompt (Final Query + Context)
    LLM -->> Utils: Returns Answer (Text)
    Utils -->> App: Answer, Source Docs, Latency

    %% 5. Output Generation (TTS)
    App ->> TTS: generate_speech(Answer)
    TTS -->> App: Returns audio bytes
    App -->> User: Display Text Answer
    App -->> User: Play Audio Answer
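
The two Groq calls in the middle of this flow map naturally onto the Groq Python SDK. Below is a hedged sketch of transcribe_audio and correct_transcript; the model names and the correction prompt are assumptions, not necessarily what src/utils.py ships.

import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def transcribe_audio(audio_bytes: bytes) -> str:
    # Groq serves Whisper through an OpenAI-compatible transcription endpoint.
    result = client.audio.transcriptions.create(
        file=("query.wav", audio_bytes),  # (filename, bytes) tuple
        model="whisper-large-v3",         # assumed model id
    )
    return result.text

def correct_transcript(raw_transcript: str, context: str) -> str:
    # Ask a low-latency Llama model to fix misheard technical terms
    # using the retrieved RAG context.
    chat = client.chat.completions.create(
        model="llama-3.1-8b-instant",     # assumed model id
        messages=[
            {"role": "system",
             "content": ("Correct transcription errors in the user's question "
                         "using the provided context. Return only the "
                         "corrected question.")},
            {"role": "user",
             "content": f"Context:\n{context}\n\nTranscript:\n{raw_transcript}"},
        ],
    )
    return chat.choices[0].message.content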

Prerequisites

You must have Python 3.8+ installed. The project has two kinds of dependencies: Python packages and one system dependency (FFmpeg).

  1. Python Dependencies: Install the required Python packages using pip:
pip install -r requirements.txt
  2. System Dependency (FFmpeg): The audio preprocessing feature built on pydub requires FFmpeg to be installed on your operating system and accessible from the command line (you can verify this with the check below).
     • macOS (Homebrew): brew install ffmpeg
     • Debian/Ubuntu: sudo apt update && sudo apt install ffmpeg
     • Windows: Download a build from ffmpeg.org and add its bin directory to your system's PATH.
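
Before launching the app, you can confirm that Python can see FFmpeg with a stdlib-only check (this snippet is a convenience, not part of the project):

import shutil

ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    raise SystemExit("FFmpeg not found on PATH; install it before enabling audio preprocessing.")
print("FFmpeg found at:", ffmpeg_path)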

🚀 Installation & Setup

  1. Project Structure: Your project structure should look like this:
AUDIO-RAG/
├── src/
│   ├── app.py        # Streamlit UI and Main Logic
│   └── utils.py      # All Backend/Helper Functions
├── venv/
├── logs/
├── .env              # Environment variables (API keys)
├── .gitignore
├── README.md
└── requirements.txt
  2. Configure Environment Variables: Create a file named .env in the project root (AUDIO-RAG/) and populate it with your keys:
GROQ_API_KEY="your_groq_api_key_here"
QDRANT_URL="your_qdrant_cloud_url"
QDRANT_API_KEY="your_qdrant_cloud_key"
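
These keys are typically read at startup with python-dotenv (assumed to be listed in requirements.txt); a minimal loading sketch:

import os

from dotenv import load_dotenv

# load_dotenv() searches the current directory and its parents for a .env file,
# so it still finds AUDIO-RAG/.env after you cd into src/.
load_dotenv()

groq_key = os.getenv("GROQ_API_KEY")
if not groq_key:
    raise RuntimeError("GROQ_API_KEY is missing; check your .env file.")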
  3. Run the Application: Because the application files live inside the src folder, the easiest way to launch the Streamlit app is to change the working directory first:
# 1. Change to the directory containing app.py
cd src

# 2. Run the Streamlit application
streamlit run app.py

The application will open in your browser, usually at http://localhost:8501.

Usage Instructions

Step 1: Configure & Embed Documents

  • In the Streamlit sidebar, enter your Groq API Key.
  • Navigate to 📂 Document Ingestion.
  • Upload one or more PDF documents.
  • Click the 🚀 Embed & Store button. Wait for the success message.
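
Behind the Embed & Store button, ingestion amounts to loading, chunking, embedding, and upserting into Qdrant. The sketch below uses LangChain's PDF loader and the langchain-qdrant integration; the embedding model and collection name are assumptions, and the repository may wire this differently.

import os

from langchain_community.document_loaders import PyPDFLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load one PDF and split it into overlapping chunks for retrieval.
docs = PyPDFLoader("example.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200).split_documents(docs)

# Embed the chunks and upsert them into a Qdrant Cloud collection.
store = QdrantVectorStore.from_documents(
    chunks,
    embedding=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"),  # assumed model
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"],
    collection_name="audio_rag_docs",  # assumed collection name
)
retriever = store.as_retriever(search_kwargs={"k": 4})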

Step 2: Voice Interaction

  • Ensure 🎙️ Voice Mode is active (uncheck "Switch to Text Chat Mode" in the sidebar).
  • Click the "Click to record" microphone button under the chat window.
  • Speak your question, referencing the content of your uploaded PDFs.
  • Click the microphone again to stop recording. The system will automatically transcribe, run the RAG process, and speak the answer.
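
The spoken answer at the end of this flow can be produced with Groq's text-to-speech endpoint. A minimal sketch, assuming the playai-tts model and a stock voice name:

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def generate_speech(answer_text: str) -> bytes:
    response = client.audio.speech.create(
        model="playai-tts",    # assumed Groq Play.ai TTS model id
        voice="Fritz-PlayAI",  # assumed voice name
        input=answer_text,
        response_format="wav",
    )
    return response.read()     # raw WAV bytes, e.g. for st.audio()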

Step 3: View Quality Scores

  • If Enable Evaluation is checked in the sidebar, you will see a detailed breakdown of the RAG response quality, including Relevance, Faithfulness, Completeness, and Retrieval Similarity scores.
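
As one plausible formulation (not necessarily what this repository computes), a Retrieval Similarity score can be taken as the mean cosine similarity between the query embedding and the embeddings of the retrieved chunks:

from typing import List

import numpy as np

def retrieval_similarity(query_vec: np.ndarray, chunk_vecs: List[np.ndarray]) -> float:
    """Mean cosine similarity between the query and each retrieved chunk."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))
    return float(np.mean([cosine(query_vec, c) for c in chunk_vecs]))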
