Gaia PDF RAG is a Retrieval-Augmented Generation (RAG) application that lets users ask questions about PDF documents using a local Gaia node and a Qdrant vector database. It pairs a locally hosted LLM with vector search so that answers are grounded in passages retrieved from the uploaded document.
- 📑 PDF document processing and chunking
- 🔍 Semantic search using Qdrant vector database
- 🤖 Local LLM integration through Gaia node
- ↗️ Cross-encoder reranking for improved relevance
- 💨 Streaming responses for better UX
- 🎯 Smart source citation
- ⚡ Relevance filtering to prevent hallucinations
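To make the chunking and relevance-filtering features above more concrete, here is a minimal sketch in plain Python. It is not the code from app.py: the function names, the character-based chunk size, the overlap, and the score threshold are all illustrative assumptions.

```python
from typing import List, Tuple


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    The sizes are hypothetical defaults, not values taken from app.py.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def filter_by_relevance(
    scored_chunks: List[Tuple[str, float]], min_score: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep chunks scoring at least min_score, best first.

    Dropping weak matches before prompting the LLM is one way to reduce
    answers that are not supported by the document.
    """
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

The overlap keeps a sentence that straddles a chunk boundary available in both neighbouring chunks, at the cost of storing some text twice.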
Before running Gaia PDF RAG, ensure you have:
- A local Gaia node running (see the quick-start guide for running your own local LLM: https://docs.gaianet.ai/node-guide/quick-start)
- Qdrant server running
- Python 3.8+
- Required system libraries for PDF processing
- Clone the repository:
git clone https://github.com/harishkotra/gaia-pdf-rag.git
cd gaia-pdf-rag
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Start your local Gaia node:
gaianet init
gaianet start
Start Qdrant using Docker:
docker run -d -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant
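Before launching the app, it can help to confirm that both services accept connections. The sketch below only checks that a TCP port is open. Port 6333 comes from the Docker command above; port 8080 for the Gaia node is an assumption about the default node configuration, so adjust it if your node listens elsewhere.

```python
import socket
from typing import Dict, Tuple


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical service map: 8080 is an assumed Gaia node port,
# 6333 is the Qdrant port published by the Docker command.
SERVICES: Dict[str, Tuple[str, int]] = {
    "Gaia node": ("localhost", 8080),
    "Qdrant": ("localhost", 6333),
}


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

An open port does not guarantee the service is healthy, but a closed one is a quick signal that a start command failed.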
- Make sure both Gaia node and Qdrant are running
- Start the Streamlit app:
streamlit run app.py
- Open your browser at http://localhost:8501
- Upload a PDF document using the sidebar
- Click "Process Document" to index it
- Ask questions in the main input field
- View answers and relevant source documents
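As a rough illustration of how answers can be tied back to source passages, the following sketch assembles a prompt from retrieved chunks and asks the model to cite them by number. The function name, the prompt wording, and the (page, text) source format are assumptions made for this example, not the app's actual implementation.

```python
from typing import List, Tuple


def build_prompt(question: str, sources: List[Tuple[int, str]]) -> str:
    """Build a prompt that numbers each retrieved passage for citation.

    sources is a list of (page_number, passage_text) pairs; this shape is
    hypothetical and chosen only for the example.
    """
    context_lines = [
        f"[{i}] (page {page}) {text.strip()}"
        for i, (page, text) in enumerate(sources, start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no relevant passages found)"
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the passages lets the interface show the matching source documents next to each cited claim.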
You can modify the following parameters in app.py:
- GAIA_NODE_URL: URL of your local Gaia node
- QDRANT_HOST: Qdrant server host
- QDRANT_PORT: Qdrant server port
- VECTOR_SIZE: Embedding dimension size
- COLLECTION_NAME: Name for vector database collection
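If you would rather not edit app.py for each environment, one option is to read these parameters from environment variables with fallbacks. This is a sketch of that alternative, not how app.py is written. Apart from the Qdrant port 6333 used in the Docker command, every default below (the node URL, vector size, and collection name) is a placeholder assumption that you should replace with your own values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gaia_node_url: str
    qdrant_host: str
    qdrant_port: int
    vector_size: int
    collection_name: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (defaults to os.environ), with fallbacks.

    All fallback values except the Qdrant port are hypothetical.
    """
    env = os.environ if env is None else env
    return Settings(
        gaia_node_url=env.get("GAIA_NODE_URL", "http://localhost:8080/v1"),
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        # Must match the output dimension of your embedding model.
        vector_size=int(env.get("VECTOR_SIZE", "768")),
        collection_name=env.get("COLLECTION_NAME", "pdf_documents"),
    )
```

Whichever approach you use, VECTOR_SIZE has to equal the dimension of the embeddings your Gaia node returns, otherwise Qdrant will reject the vectors on insert.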
gaia-pdf-rag/
├── app.py # Main Streamlit application
├── requirements.txt # Python dependencies
├── .gitignore # Gitignore file
└── README.md # This file
Contributions are welcome! Please feel free to submit a Pull Request.
Inspired by this example.