A powerful Streamlit-based web application that lets users upload PDF documents and chat with an AI assistant powered by multi-hop retrieval-augmented generation (RAG). Designed for researchers, students, and knowledge enthusiasts, the chatbot handles complex queries by breaking them into sub-questions, retrieving relevant document chunks, and generating detailed, conversational answers.
- PDF Document Upload: Upload multiple PDF files for processing and indexing.
- Multi-Hop RAG: Breaks down complex queries into sub-questions for precise information retrieval.
- Conversational AI: Delivers detailed, conversational answers based on document content.
- Interactive Interface: Built with Streamlit for a seamless and intuitive user experience.
- Document Management: Easily clear uploaded documents to start fresh.
- Retrieval Details: Displays sub-questions and retrieved document snippets for transparency.
| Technology | Description |
|---|---|
| Python 3.8+ | Core programming language for the application. |
| Streamlit | Creates the interactive web-based user interface. |
| LangChain | Framework for building applications with language models. |
| LangChain-Ollama | Integrates Ollama for text generation and embeddings. |
| Chroma | Vector database for storing and retrieving document chunks. |
| PyPDFLoader | Loads and parses PDF documents. |
| RecursiveCharacterTextSplitter | Splits documents into manageable chunks for indexing. |
- Python 3.8 or later: Download from python.org.
- Ollama: Local AI server for running language models. Install from ollama.com.
- Internet Access: Required for downloading Ollama models and Python dependencies; the app itself runs locally.
Install Python:
- Ensure Python 3.8 or later is installed. Verify with:

  ```bash
  python --version
  ```

Install Ollama:
- Follow the installation instructions at ollama.com.
- Pull the required models:

  ```bash
  ollama pull qwen2.5:latest
  ollama pull nomic-embed-text:latest
  ```

- Start the Ollama server:

  ```bash
  ollama serve
  ```
Clone the Repository:

```bash
git clone https://github.com/armanjscript/Multi-Hop-RAG-Chatbot.git
cd Multi-Hop-RAG-Chatbot
```

Install Python Libraries:

```bash
pip install streamlit langchain langchain-ollama langchain-chroma langchain-community
```
Run the Application:
- Ensure the Ollama server is running:

  ```bash
  ollama serve
  ```

- Start the Streamlit app:

  ```bash
  streamlit run multi_hop_rag.py
  ```

- Open your browser and navigate to http://localhost:8501.
Upload Documents:
- In the sidebar, use the file uploader to select PDF files.
- Click "Process Documents" to load and index the documents into the vector store.
Chat with the AI:
- Enter a question in the chat input box (e.g., "What is the capital of France and its population?").
- The chatbot breaks the question into sub-questions, retrieves relevant document chunks, and generates a detailed answer.
- View retrieval details (sub-questions and document snippets) in the expandable section.
Clear Documents:
- Click "Clear Documents" in the sidebar to remove all uploaded files and reset the vector store.
The multi-hop RAG technique enhances the chatbot’s ability to handle complex queries by breaking them into smaller, manageable sub-questions. Here’s how it works:
Question Decomposition:
- The user’s query is analyzed by the `qwen2.5:latest` model to generate 2-3 sub-questions.
- Example: For "What is the capital of France and its population?", the sub-questions might be:
  - "What is the capital of France?"
  - "What is the population of Paris?"
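A minimal sketch of what this decomposition step could look like with `ChatOllama` (the prompt wording, temperature, and the `generate_subquestions` helper are assumptions for illustration, not the app's actual code):

```python
# Illustrative decomposition step; the prompt wording is an assumption.
from typing import List

from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen2.5:latest", temperature=0)

def generate_subquestions(question: str) -> List[str]:
    """Ask the model to split a complex question into 2-3 sub-questions."""
    prompt = (
        "Break the following question into 2-3 simpler sub-questions, "
        "one per line, with no numbering:\n\n" + question
    )
    response = llm.invoke(prompt)
    return [line.strip() for line in response.content.splitlines() if line.strip()]
```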
Document Retrieval:
- For each sub-question, the Chroma vector store retrieves the top 3 relevant document chunks using the `nomic-embed-text:latest` embeddings.
- Duplicates are removed to ensure efficiency.
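A sketch of retrieval with de-duplication; matching duplicates by chunk text is an assumption about how the app does it:

```python
# Illustrative retrieval with de-duplication; duplicates are matched on
# chunk text here, which is an assumption about the app's approach.
from typing import List

from langchain_chroma import Chroma
from langchain_core.documents import Document

def retrieve_unique_docs(vector_store: Chroma, sub_questions: List[str]) -> List[Document]:
    """Fetch the top 3 chunks per sub-question and drop duplicate chunks."""
    seen, unique_docs = set(), []
    for sub_q in sub_questions:
        for doc in vector_store.similarity_search(sub_q, k=3):
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                unique_docs.append(doc)
    return unique_docs
```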
Answer Generation:
- The retrieved documents are combined and passed to the `qwen2.5:latest` model along with the original question.
- The model generates a detailed, conversational answer based on the context.
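A sketch of the generation step; the prompt template below is an assumption for illustration:

```python
# Illustrative generation step; the prompt template is an assumption.
from typing import List

from langchain_core.documents import Document
from langchain_ollama import ChatOllama

def generate_answer(llm: ChatOllama, question: str, docs: List[Document]) -> str:
    """Combine the retrieved chunks into a context block and ask for an answer."""
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question in a detailed, conversational way using only "
        f"the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content
```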
```
[User Question] --> [Generate Sub-Questions] --> [Retrieve Docs for Each Sub-Question] --> [Combine Unique Docs] --> [Generate Answer]
```
Example Workflow:
- Input: "What is the capital of France and its population?"
- Sub-Questions:
  - "What is the capital of France?"
  - "What is the population of Paris?"
- Retrieval: Fetch relevant document chunks for each sub-question.
- Combination: Merge unique document chunks.
- Output: Generate a response like: "The capital of France is Paris, with a population of approximately 2.2 million."
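Putting the hypothetical helpers from the sketches above together, the end-to-end flow might be wired up like this:

```python
# End-to-end flow using the hypothetical helpers sketched above.
vector_store = index_pdfs(["example.pdf"])  # "example.pdf" is a placeholder path

question = "What is the capital of France and its population?"
sub_questions = generate_subquestions(question)
docs = retrieve_unique_docs(vector_store, sub_questions)
print(generate_answer(llm, question, docs))
```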
- Document Quality: Answer accuracy depends on the quality and relevance of uploaded PDFs.
- Model Performance: The effectiveness of sub-question generation and answer quality depends on the `qwen2.5:latest` model.
- File Handling: Uploaded PDFs are saved locally and deleted when cleared, which can overwrite or remove other local files that share the same name.
- Processing Time: Large PDFs or complex queries may take longer to process.
Contributions are welcome! To contribute:
- Fork the repository.
- Make changes, ensuring they align with the project’s coding style.
- Submit a pull request with a clear description of your changes.
- Include tests to maintain quality.
This project is licensed under the MIT License. See the LICENSE file for details.
This tool is for informational purposes only. Answers depend on the quality of uploaded documents and model performance. Always verify critical information with reliable sources.
For questions or feedback, contact Arman Daneshdoost at armannew73@gmail.com.