A high-performance Retrieval-Augmented Generation (RAG) system built with Streamlit, leveraging Groq for ultra-low latency LLM processing, Whisper for accurate transcription, and Qdrant for efficient vector storage. This project uses the recommended src layout, which keeps the application's core Python modules separate from the project root.
Demo video: Audio_RAG_Demo.mp4
- Voice-Enabled Interaction: Speak your questions using the Streamlit UI and get a text and audio response.
- Groq Integration: Utilizes Groq's low-latency Llama models for near-instantaneous RAG answers.
- Advanced Audio Preprocessing: pydub (requires FFmpeg) is used to normalize volume and trim long silence gaps, drastically improving Whisper's transcription accuracy and speed.
- LLM-Based Transcript Correction: Uses a dedicated LLM chain to correct technical terms or misheard words in the transcribed prompt based on the RAG context documents.
- PDF Ingestion & Qdrant: Uploads and chunks PDF documents, storing vector embeddings in Qdrant (supports both local and cloud instances).
- Evaluation Metrics: Provides Relevance, Faithfulness, Completeness, and Retrieval Similarity scores for RAG quality assurance.
- Text-to-Speech (TTS): Generates high-quality audio responses using Groq's PlayAI TTS models.
- Modular Codebase: Core logic is split into src/app.py (UI/Main Loop) and src/utils.py (Backend/Helper Functions).
```mermaid
sequenceDiagram
    actor User
    participant App as Streamlit (app.py)
    participant Recorder as AudioRecorder
    participant Utils as Utils (utils.py)
    participant Whisper as Groq Whisper
    participant Qdrant as Qdrant DB
    participant LLM as Groq LLM
    participant TTS as Groq TTS

    %% 1. Input and Recording
    User ->> App: Starts recording
    App ->> Recorder: Captures audio bytes
    Recorder -->> App: Returns raw audio bytes

    %% 2. Audio Preprocessing & Transcription
    alt Audio Preprocessing Enabled
        App ->> Utils: preprocess_audio(raw_audio)
        Utils ->> Utils: Normalize + silence trimming (pydub/FFmpeg)
        Utils -->> App: Processed audio bytes
    else Audio Preprocessing Disabled
        App ->> App: Uses raw audio bytes
    end
    App ->> Utils: transcribe_audio(processed_audio)
    Utils ->> Whisper: Send audio for ASR
    Whisper -->> Utils: Returns raw_transcript
    Utils -->> App: raw_transcript

    %% 3. RAG and Conditional LLM Correction
    App ->> Utils: setup_retriever()
    alt LLM Correction Enabled
        App ->> Qdrant: Retrieve context for correction
        Qdrant -->> App: Context docs
        App ->> Utils: correct_transcript(raw_transcript, context, LLM)
        Utils ->> LLM: Send context & raw_transcript
        LLM -->> Utils: Returns corrected_prompt
        Utils -->> App: corrected_prompt (final query)
    else LLM Correction Disabled
        App ->> App: Uses raw_transcript as final query
    end

    %% 4. RAG Chain Execution
    App ->> Utils: run_rag_chain(final query)
    Utils ->> Qdrant: Retrieve final source documents (top k)
    Qdrant -->> Utils: Context chunks
    Utils ->> LLM: Prompt (final query + context)
    LLM -->> Utils: Returns answer (text)
    Utils -->> App: Answer, source docs, latency

    %% 5. Output Generation (TTS)
    App ->> TTS: generate_speech(answer)
    TTS -->> App: Returns audio bytes
    App -->> User: Display text answer
    App -->> User: Play audio answer
```
You must have Python 3.8+ installed. This project requires two main types of dependencies: Python packages and a system dependency.
- Python Dependencies: install the required packages with pip:

```shell
pip install -r requirements.txt
```

- System Dependency (FFmpeg): the audio preprocessing feature (via pydub) requires FFmpeg to be installed on your operating system and accessible from the command line.
  - macOS (Homebrew): brew install ffmpeg
  - Debian/Ubuntu: sudo apt update && sudo apt install ffmpeg
  - Windows: download the correct build from ffmpeg.org and add the bin directory to your system's PATH.
🚀 Installation & Setup
- Project Structure: your project structure should look like this:

```
AUDIO-RAG/
├── src/
│   ├── app.py       # Streamlit UI and Main Logic
│   └── utils.py     # All Backend/Helper Functions
├── venv/
├── logs/
├── .env             # Environment variables (API keys)
├── .gitignore
├── README.md
└── requirements.txt
```
- Configure Environment Variables: create a file named .env in the project root (AUDIO-RAG/) and populate it with your keys:

```
GROQ_API_KEY="your_groq_api_key_here"
QDRANT_URL="your_qdrant_cloud_url"
QDRANT_API_KEY="your_qdrant_cloud_key"
```
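If you prefer not to add python-dotenv as a dependency, a minimal stdlib loader along these lines is enough. This is a sketch: load_env is a hypothetical helper, and the app itself may load its keys differently.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Minimal sketch: blank lines and # comments are skipped, surrounding
    double quotes are stripped, and already-set environment variables win.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))
```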
- Run the Application: because the application files are inside the src folder, the easiest way to run the Streamlit app is to change the working directory first:

```shell
# 1. Change to the directory containing app.py
cd src

# 2. Run the Streamlit application
streamlit run app.py
```
The application will open in your browser, usually at http://localhost:8501.
Step 1: Configure & Embed Documents
- In the Streamlit sidebar, enter your Groq API Key.
- Navigate to 📂 Document Ingestion.
- Upload one or more PDF documents.
- Click the 🚀 Embed & Store button. Wait for the success message.
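Under the hood, the Embed & Store step boils down to splitting each PDF's extracted text into overlapping chunks before computing embeddings. A rough sketch of that chunking step (chunk_text and its defaults are illustrative assumptions; the app may use a LangChain text splitter with different settings):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Sketch only: each chunk repeats the last `overlap` characters of the
    previous one so sentences cut at a boundary still appear intact somewhere.
    """
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks
```

Each chunk is then embedded and upserted into the Qdrant collection, so retrieval later operates on these windows rather than whole documents.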
Step 2: Voice Interaction
- Ensure you are in 🎙️ Voice Mode Active (uncheck "Switch to Text Chat Mode" in the sidebar).
- Click the "Click to record" microphone button under the chat window.
- Speak your question, referencing the content of your uploaded PDFs.
- Click the microphone again to stop recording. The system will automatically transcribe, run the RAG process, and speak the answer.
Step 3: View Quality Scores
- If Enable Evaluation is checked in the sidebar, you will see a detailed breakdown of the RAG response quality, including Relevance, Faithfulness, Completeness, and Retrieval Similarity.
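As a concrete example of one of these scores: Retrieval Similarity is presumably a cosine similarity between the query embedding and each retrieved chunk's embedding (an assumption about how this app computes it; the other three metrics are typically judged by an LLM). A minimal sketch:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Scores near 1.0 mean a retrieved chunk points in almost the same direction as the query in embedding space, i.e. the retriever found closely related content.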