Welcome to the KIIT LLM Chatbot, an AI-powered system designed to retrieve and generate relevant information from PDFs, CSVs, and DOCs using Retrieval-Augmented Generation (RAG). This project leverages the power of Streamlit for the UI, providing an interactive experience for users to query documents with ease.
- Multi-Source Document Support: Integrates PDFs, CSVs, and DOC files, allowing users to query across multiple file types.
- Advanced RAG Model: Combines retrieval and generation techniques for highly relevant and accurate answers.
- Streamlit Interface: Simple, user-friendly UI for easy interaction with the system.
- Scalable Design: Built to handle a wide variety of document types and queries.
- Customizable LLM Options: Users can choose between Gemini 1.5 Flash, Hugging Face models (via serverless inference endpoint), or Ollama on-device models (currently using Deepseek-R1-1.5b).
- Ensemble Retriever: Combines PDF, CSV, and TXT retrievers using LangChain's EnsembleRetriever with equal weights (0.33 each).
- History-Aware Retrieval: Implements LangChain's create_history_aware_retriever for context-aware querying.
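The EnsembleRetriever merges the result lists from the PDF, CSV, and TXT retrievers using weighted Reciprocal Rank Fusion. A minimal sketch of that fusion step in plain Python (the document IDs and result lists below are hypothetical, purely for illustration):

```python
def weighted_rrf(ranked_lists, weights, k=60):
    """Weighted Reciprocal Rank Fusion: merge ranked result lists from
    several retrievers into a single ranking. Documents appearing near
    the top of multiple lists accumulate the highest score."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking):
            # Each retriever contributes weight / (k + rank) per document.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hits from the PDF, CSV, and TXT retrievers:
pdf_hits = ["doc_a", "doc_b", "doc_c"]
csv_hits = ["doc_b", "doc_d"]
txt_hits = ["doc_b", "doc_a"]
merged = weighted_rrf([pdf_hits, csv_hits, txt_hits], [0.33, 0.33, 0.33])
print(merged[0])  # "doc_b" ranks first: it is near the top of all three lists
```

With equal weights, no single source dominates; documents ranked highly by several retrievers float to the top of the merged list.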
- Python: Core language for the system.
- LangChain: Framework used for managing retrieval and generation.
- Streamlit: Provides the interactive frontend for users.
- FAISS: Vector store for efficient document retrieval.
- Gemini API / Hugging Face Models: To handle the LLM-based query generation.
- PyPDF2, pandas, docx2txt: Used to parse PDFs, CSVs, and DOC files.
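Each file type is routed to the appropriate parsing library by its extension. A minimal dispatch sketch (the mapping is illustrative; the actual loading calls live in the project code):

```python
from pathlib import Path

# Illustrative map from extension to the library used to parse it.
PARSERS = {
    ".pdf": "PyPDF2",
    ".csv": "pandas",
    ".doc": "docx2txt",
    ".docx": "docx2txt",
    ".txt": "built-in text reader",
}

def pick_parser(path: str) -> str:
    """Return the name of the library that should parse this file."""
    ext = Path(path).suffix.lower()
    if ext not in PARSERS:
        raise ValueError(f"Unsupported file type: {ext}")
    return PARSERS[ext]

print(pick_parser("data/notes.PDF"))  # PyPDF2
```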
Make sure you have the following installed:
- Python 3.8+
- Required packages (listed in requirements.txt)
- Clone the repository:

  ```bash
  git clone https://github.com/Manodeepray/kiit-chatbot-llm.git
  cd kiit-chatbot-llm
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Add your documents to the data folder.

- Create the vectorstores:

  ```bash
  python vector_db.py
  ```

- Run the application:

  ```bash
  streamlit run app.py
  ```
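The vector_db.py step builds the FAISS vectorstores from the files in the data folder. Before embedding, documents are normally split into overlapping chunks; here is a minimal sketch of that chunking step (the chunk size and overlap values are illustrative, not the project's actual settings):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50):
    """Split text into overlapping character chunks, the usual
    preprocessing step before embedding into a FAISS index.
    The overlap keeps context that straddles a chunk boundary."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

chunks = chunk_text("x" * 500, chunk_size=200, overlap=50)
print(len(chunks))  # 4 chunks: starts at 0, 150, 300, 450
```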
📂 kiit-llm-rag-chatbot/
│
├── 📁 src/
│   └── 📁 data/
│       ├── example.pdf
│       ├── example.csv
│       └── example.txt
│
├── app.py              # Main Streamlit app
├── main_rag.py         # RAG logic and document processing
├── requirements.txt    # Python package requirements
├── README.md           # Project documentation
└── LICENSE
- Upload Documents: Users can upload PDFs, CSVs, and TXT files to the interface.
- Query the System: After the files are processed, users can input their queries.
- Retrieval & Generation:
- The system uses an Ensemble Retriever to fetch information from PDFs, CSVs, and TXT files.
- Retrieval incorporates history awareness for more contextually relevant results.
- Response Generation: The RAG system retrieves relevant information and generates a coherent, human-like response using the selected LLM.
- Display Answers: Answers are displayed interactively within the Streamlit app.
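Under the hood, the retrieval step reduces to nearest-neighbour search over embedding vectors, which is what the FAISS vectorstore accelerates. A toy illustration of the idea in plain Python (the 3-dimensional "embeddings" below are hand-made for demonstration; real embeddings come from the configured embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings for three stored document chunks.
index = {
    "meeting notes": [0.9, 0.1, 0.0],
    "budget table": [0.1, 0.9, 0.0],
    "readme text": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, index[d]), reverse=True)
    return ranked[:k]

print(retrieve([0.8, 0.2, 0.1]))  # ['meeting notes']
```

FAISS performs the same similarity search, but over thousands of high-dimensional vectors with optimized index structures rather than a brute-force sort.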
Once the system is running, upload your documents and enter a query such as:
"What are the key highlights from the meeting notes in the uploaded TXT file?"
The system will return a detailed response based on the contents of the uploaded TXT file.
- Multi-Language Support: Expand capabilities to support document querying in multiple languages.
- Real-Time Document Updating: Allow dynamic updates when documents are added or modified.
- Improved Performance: Further optimize the RAG pipeline for faster responses.
- Implement Advanced RAG Techniques: Add multimodal RAG (MM-RAG) and Cache-Augmented Generation (CAG) to the pipeline.
This project is licensed under the MIT License. See the LICENSE file for more details.
Contributions are welcome! To get started:
- Fork the repository.
- Create your feature branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add some AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a pull request.
For any inquiries or questions, feel free to reach out at:
Manodeep Ray
Email: Manodeep
Happy Querying!
