A Retrieval-Augmented Generation (RAG) application that lets you chat with your local documents in disparate formats (e.g., `.txt`, `.pdf`, `.md`, `.docx`, `.doc`, `.json`, `.geojson`) using Ollama LLMs and LangChain. Upload a document in the Streamlit web UI for Q&A interaction. Have fun!
```
.
├── .streamlit/
│   └── config.toml        # Streamlit configuration (OPTIONAL)
├── assets/
│   └── ui.png             # Streamlit UI image
├── components/
│   ├── __init__.py
│   ├── chat.py            # Chat interface implementation
│   └── upload.py          # Document upload handling
├── core/
│   ├── __init__.py
│   ├── embeddings.py      # Vector embeddings configuration
│   └── llm.py             # Language model setup
├── data/
│   ├── vector_store/      # Stores vector embeddings in ChromaDB
│   └── sample_docs/       # Sample documents for testing
├── utils/
│   ├── __init__.py
│   └── helpers.py         # Utility functions
└── main.py                # Application entry point
```
- Multi-document (`.txt`, `.pdf`, `.md`, `.docx`, `.doc`, `.json`) processing with intelligent chunking
- Multi-query retrieval for better context understanding
- Advanced RAG implementation using LangChain and Ollama
- Completely local data processing: no data leaves your machine
- Jupyter notebook for experimentation
- Clean Streamlit UI
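The "intelligent chunking" step above can be pictured as a sliding window with overlap that prefers to break on whitespace. The sketch below is a hand-rolled illustration of the idea, not the actual LangChain splitter the app uses:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, preferring whitespace boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Back up to the last space so we don't cut a word in half
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= len(text):
            break
        # Step forward, keeping `overlap` characters of trailing context
        start = max(end - overlap, start + 1)
    return chunks
```

The overlap keeps a little shared context between adjacent chunks, which helps retrieval when an answer straddles a chunk boundary.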
- Visit Ollama.ai to download and install Ollama
- Open `cmd` or `terminal` and run `ollama` to verify the installation
- Install LLM models (locally):
  - Start with `ollama pull llama3.2`, as it's a low-sized (4 GB) basic LLM tailored for general use cases
  - For vector embeddings, pull the following:

    ```sh
    ollama pull mxbai-embed-large # or `nomic-embed-text`
    ```

  - Chat with the model in the terminal:

    ```sh
    ollama run llama3.2 # or your preferred model
    ```

  - Go to Ollama Models to search for and pull other popular models, for example:

    ```sh
    ollama pull dolphin3
    ollama pull deepseek-r1:8b
    ollama pull mistral
    ```

  - Check the list of locally available Ollama models:

    ```sh
    ollama list
    ```
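Once a model is pulled, it can also be queried programmatically over Ollama's local REST API. A minimal sketch using only the standard library, assuming the default server at `localhost:11434` is running:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build an HTTP POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(model: str, prompt: str) -> str:
    """Send the prompt and return the model's full response text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` running in the background):
# print(ask("llama3.2", "Summarize RAG in one sentence."))
```

In the app itself this plumbing is handled by LangChain's Ollama integration; the sketch just shows what happens on the wire.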
- Open `cmd` or `terminal`, navigate to your preferred directory, then run the following:

  ```sh
  git clone https://github.com/aghoshpro/ChatDocument.git
  ```

- Go to the ChatDocument folder using `cd ChatDocument`
- Create a virtual environment `myvenv` inside the `./ChatDocument` folder and activate it:

  ```sh
  python -m venv myvenv

  # Windows
  .\myvenv\Scripts\activate

  # OR (Linux or Mac)
  source myvenv/bin/activate
  ```

- Install dependencies:

  ```sh
  pip install --upgrade -r requirements.txt
  ```
- Experiment with the code in the `*.ipynb` Jupyter notebook
```sh
streamlit run main.py
```

- Ensure Ollama is running in the background
- A GPU is preferred for good performance; otherwise the CPU is used (slower)
- `./data/sample_docs` contains a few sample documents for you to test
- Use `pip list` or `pip freeze` to check the currently installed packages
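At query time, retrieval boils down to comparing the query's embedding against the stored chunk embeddings by cosine similarity. ChromaDB does this for the app; the toy sketch below (with made-up 2-D vectors, not real embeddings) shows the core idea:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """store: (chunk_text, embedding) pairs; return the k chunks most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved top-k chunks are what get stuffed into the LLM prompt as context for the answer.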
- Edit `.streamlit/config.toml` for your color preferences:

  ```toml
  [theme]
  primaryColor = "#FF4B4B"
  backgroundColor = "#0E1117"
  secondaryBackgroundColor = "#262730"
  textColor = "#FAFAFA"
  font = "sans serif"
  ```
- Open issues for bugs or suggestions
- Submit pull requests
- LangChain
- Ollama
- ChromaDB
- Streamlit
- Folium
- Unstructured
- ChromaDB Tutorial Step by Step Guide
- ChromaDB Collections



