197 changes: 196 additions & 1 deletion README.md
@@ -1 +1,196 @@
# rag-tutorial-v2
# RAG Tutorial v2

This project demonstrates a Retrieval-Augmented Generation (RAG) pipeline using LangChain, ChromaDB, and Ollama for local LLM and embedding inference. It features both a command-line interface and a **minimalistic web-based chat UI** built with Streamlit.

## 🚀 Quick Start - Web Interface

1. Make sure you have followed all setup steps in the sections below.
2. Start the Ollama server (if not already running):
```
ollama serve
```
3. Launch the minimalistic chat interface:
```
streamlit run app.py
```
4. Open your browser to the provided URL (usually `http://localhost:8501`).
5. Upload PDF documents directly through the web interface or use existing files.
6. Start chatting with your documents!

## 🖥️ Command Line Interface

For traditional command-line usage:

1. (Optional) Add or update PDF files in the `data/` directory.
2. Populate the database:
```
python populate_database.py
```
3. Run a query:
```
python query_data.py "Your question here"
```

## ✨ Features

### Web Interface (app.py)
- **Minimalistic Design**: Clean, distraction-free chat interface
- **File Upload**: Drag-and-drop PDF upload with real-time processing
- **Interactive Chat**: Conversation history with user-friendly message bubbles
- **Collapsible Upload**: Hide/show document upload section to focus on chat
- **Real-time Processing**: Instant document ingestion and querying
- **Settings Panel**: Clear chat history and reset database
- **Responsive Design**: Works well on desktop and mobile devices

### Core RAG Pipeline
- **Document Processing**: Automatic PDF text extraction and chunking
- **Vector Database**: ChromaDB for efficient similarity search
- **Local LLM**: Mistral model via Ollama (no API keys required)
- **Local Embeddings**: Nomic-embed-text model for text embeddings
- **Smart Retrieval**: Context-aware document retrieval for accurate answers (see the sketch below)
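
At query time, these pieces compose into a short retrieval path: embed the question, pull the most similar chunks from Chroma, and hand them to Mistral as context. Below is a minimal sketch of what `query_rag` in `query_data.py` plausibly looks like — the prompt wording, `k=5`, and exact import paths are assumptions, not the repo's verbatim code:

```python
# Sketch of the query path (illustrative; prompt text, k=5, and import
# paths are assumptions — check query_data.py for the actual code).
from langchain.prompts import ChatPromptTemplate
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

PROMPT = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n\n"
    "{context}\n\n---\n\nQuestion: {question}"
)

def query_rag(question: str) -> str:
    # Open the persisted vector store with the same embedding model
    # that was used at ingestion time.
    db = Chroma(
        persist_directory="chroma",
        embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
    )
    # Retrieve the top-k chunks most similar to the question.
    results = db.similarity_search_with_score(question, k=5)
    context = "\n\n---\n\n".join(doc.page_content for doc, _score in results)
    # Ask the local Mistral model, grounded in the retrieved context.
    prompt = PROMPT.format(context=context, question=question)
    return Ollama(model="mistral").invoke(prompt)
```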

## 📁 Project Structure

```
rag-tutorial-v2/
├── app.py                     # Streamlit web interface (minimalistic design)
├── query_data.py              # Command-line querying
├── populate_database.py       # Database population utilities
├── get_embedding_function.py  # Embedding model configuration
├── test_rag.py                # Testing utilities
├── requirements.txt           # Python dependencies
├── data/                      # PDF documents directory
│   └── ath.pdf                # Example document
└── chroma/                    # ChromaDB vector store
    └── chroma.sqlite3         # Vector database file
```

## 🎨 UI Design Features

The Streamlit interface follows a **minimalistic design philosophy**:

- **Clean Layout**: Centered layout with optimal reading width
- **Minimal Visual Clutter**: Hidden Streamlit branding and unnecessary elements
- **Modern Chat Bubbles**: Distinct styling for user and assistant messages
- **Smooth Interactions**: Hover effects and transitions for better UX
- **Collapsible Sections**: Upload area can be hidden to focus on conversation
- **Color Scheme**: Subtle grays and blues for a professional appearance
- **Typography**: Clean, readable fonts with proper spacing
- **Responsive**: Adapts to different screen sizes

## Prerequisites
- Python 3.10+
- [Ollama](https://ollama.com/) installed and running locally
- (Optional) AWS credentials if using Bedrock embeddings

## Setup Instructions

### 1. Clone the Repository
```
git clone <your-repo-url>
cd rag-tutorial-v2
```

### 2. Install Python Dependencies
```
pip install -r requirements.txt
```

### 3. Download Ollama Models
You need the following models:
- `nomic-embed-text` (for embeddings)
- `mistral` (for LLM)

Pull them using:
```
ollama pull nomic-embed-text
ollama pull mistral
```
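
To confirm both models are available locally, list what Ollama has pulled:
```
ollama list
```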

### 4. Add Your PDF Files
Place your PDF files in the `data/` directory. Example:
```
data/
monopoly.pdf
ticket_to_ride.pdf
your_file.pdf
```

### 5. Populate the Vector Database
This step processes all PDFs and creates the Chroma vector store:
```
python populate_database.py
```
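
Under the hood, this step follows the standard LangChain ingestion recipe: load the PDFs, split them into overlapping chunks, embed each chunk, and persist the vectors. A rough sketch — the chunk sizes and import paths are assumptions, so see `populate_database.py` for the real values:

```python
# Sketch of the ingestion flow (illustrative; chunk_size/chunk_overlap
# are assumed values, not necessarily what populate_database.py uses).
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every PDF in data/ and split it into overlapping chunks so that
# context is preserved across chunk boundaries.
documents = PyPDFDirectoryLoader("data").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=800, chunk_overlap=80
).split_documents(documents)

# Embed the chunks and persist them to the local Chroma store.
db = Chroma(
    persist_directory="chroma",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
)
db.add_documents(chunks)
```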

### 6. Query the Data
Ask questions using the RAG pipeline:
```
python query_data.py "Your question here"
```
Example:
```
python query_data.py "How much total money does a player start with in Monopoly?"
```

## Usage Examples

### Web Interface
1. **Upload a Document**: Click "Add Document" and select a PDF file
2. **Ask Questions**: Type your question in the chat input
3. **View Responses**: See AI responses with context from your documents
4. **Manage Chat**: Use sidebar to clear chat or reset database

### Command Line
```bash
# Process documents
python populate_database.py

# Ask questions
python query_data.py "What is the main topic of the document?"
python query_data.py "How much money does each player start with?"
python query_data.py "What are the rules for passing GO?"
```

## Updating Data

### Web Interface
- Simply upload new PDF files through the web interface
- Documents are automatically processed and added to the vector database via `ingest_file` (sketched below)
- No manual database population needed
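
`app.py` imports `ingest_file` from `populate_database.py` for this. A plausible shape for that helper, reusing the same splitter and store as the batch script — the body below is an assumption, not the repo's exact code:

```python
# Assumed shape of populate_database.ingest_file (illustrative only).
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

def ingest_file(file_path: str) -> int:
    """Split a single PDF into chunks and add them to the persistent store."""
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=800, chunk_overlap=80
    ).split_documents(PyPDFLoader(file_path).load())
    db = Chroma(
        persist_directory="chroma",
        embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
    )
    db.add_documents(chunks)
    return len(chunks)
```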

### Command Line
If you add new PDFs to the `data/` directory, run:
```bash
python populate_database.py
```

## Notes
- If you see deprecation warnings, consider updating imports as suggested in the warnings.
- To use AWS Bedrock embeddings, update `get_embedding_function.py` and configure your AWS credentials (see the sketch below).
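
A sketch of what that swap could look like inside `get_embedding_function.py` — the Bedrock profile and region below are placeholder assumptions:

```python
# get_embedding_function.py (sketch). The Bedrock arguments are
# placeholders — substitute your own AWS profile and region.
from langchain_community.embeddings import OllamaEmbeddings

def get_embedding_function():
    # Default: fully local embeddings via Ollama.
    return OllamaEmbeddings(model="nomic-embed-text")

    # Bedrock alternative (requires AWS credentials):
    # from langchain_community.embeddings import BedrockEmbeddings
    # return BedrockEmbeddings(
    #     credentials_profile_name="default",
    #     region_name="us-east-1",
    # )
```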

## Troubleshooting

### Common Issues
- **Ollama Connection**: Ensure Ollama is running (`ollama serve`)
- **Model Not Found**: Pull required models (`ollama pull mistral` and `ollama pull nomic-embed-text`)
- **Streamlit Port**: If port 8501 is busy, Streamlit will suggest an alternative, or you can pick one explicitly (see the command below)
- **File Upload**: Ensure PDF files are valid and not password-protected
- **Memory Issues**: For large documents, consider splitting them into smaller files
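
For example, to run on an explicit port if 8501 is taken:
```bash
streamlit run app.py --server.port 8502
```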

### Performance Tips
- Use the web interface for a better user experience
- Upload smaller PDF files for faster processing
- Clear the database periodically if it becomes too large
- Close the upload section to focus on the chat interface

## Tech Stack

- **Frontend**: Streamlit with custom CSS for minimalistic design
- **Backend**: Python with LangChain framework
- **Vector Database**: ChromaDB for similarity search
- **LLM**: Mistral via Ollama (local inference)
- **Embeddings**: Nomic-embed-text (local embeddings)
- **Document Processing**: PyPDF for PDF text extraction

## License
MIT
192 changes: 192 additions & 0 deletions app.py
@@ -0,0 +1,192 @@

import streamlit as st
import os
import tempfile
from populate_database import ingest_file, clear_database
from query_data import query_rag

# Minimalistic page config
st.set_page_config(
    page_title="RAG Chat",
    page_icon="💬",
    layout="centered",
    initial_sidebar_state="collapsed"
)

# Custom CSS for minimalistic design
st.markdown("""
<style>
    /* Hide Streamlit default elements */
    #MainMenu {visibility: hidden;}
    .stDeployButton {display: none;}
    footer {visibility: hidden;}
    header {visibility: hidden;}

    /* Main container styling */
    .main .block-container {
        padding-top: 2rem;
        padding-bottom: 1rem;
        max-width: 700px;
    }

    /* Chat message styling */
    .chat-message {
        padding: 1rem;
        margin: 0.5rem 0;
        border-radius: 10px;
        border-left: 3px solid #e0e0e0;
    }

    .user-message {
        background-color: #f8f9fa;
    }

    .assistant-message {
        background-color: #ffffff;
        border-left-color: #28a745;
        border: 1px solid #e9ecef;
    }

    /* Input styling */
    .stTextInput > div > div > input {
        border: 1px solid #e0e0e0;
        padding: 0.5rem 1rem;
    }

    /* Button styling */
    .stButton > button {
        border-radius: 20px;
        border: none;
        background-color: #007bff;
        color: white;
        transition: all 0.3s ease;
    }

    .stButton > button:hover {
        background-color: #0056b3;
        transform: translateY(-1px);
    }

    /* Upload area styling */
    .upload-section {
        background-color: #f8f9fa;
        padding: 1.5rem;
        border-radius: 10px;
        border: 2px dashed #e0e0e0;
        text-align: center;
        margin: 1rem 0;
    }

    /* Hide upload section when not needed */
    .minimize-upload {
        padding: 0.5rem;
        background-color: transparent;
        border: 1px solid #e0e0e0;
    }
</style>
""", unsafe_allow_html=True)

# Initialize session state
if 'chat_history' not in st.session_state:
    st.session_state['chat_history'] = []
if 'show_upload' not in st.session_state:
    st.session_state['show_upload'] = True

# App title - minimalistic
st.markdown("<h1 style='text-align: center; color: #333; font-weight: 300; margin-bottom: 2rem;'>💬 RAG Chat</h1>", unsafe_allow_html=True)

# Toggle upload section
col1, col2, col3 = st.columns([1, 2, 1])
with col2:
    if st.button("📄 " + ("Hide Upload" if st.session_state['show_upload'] else "Add Document"),
                 use_container_width=True, type="secondary"):
        st.session_state['show_upload'] = not st.session_state['show_upload']

# Upload section - collapsible
if st.session_state['show_upload']:
    with st.container():
        st.markdown('<div class="upload-section">', unsafe_allow_html=True)
        st.markdown("##### 📄 Upload PDF Document")
        uploaded_file = st.file_uploader("", type=["pdf"], label_visibility="collapsed")

        if uploaded_file is not None:
            # Write the upload to a temp file so the ingestion step can read it from disk.
            temp_dir = tempfile.gettempdir()
            file_path = os.path.join(temp_dir, uploaded_file.name)
            with open(file_path, "wb") as f:
                f.write(uploaded_file.getbuffer())

            with st.spinner("Processing document..."):
                ingest_file(file_path)
            st.success(f"✅ {uploaded_file.name} added successfully")
            # Auto-hide upload section after successful upload
            st.session_state['show_upload'] = False
            st.rerun()

        st.markdown('</div>', unsafe_allow_html=True)

# Chat Interface
st.markdown("---")

# Chat input at the bottom
with st.form(key="chat_form", clear_on_submit=True):
    col1, col2 = st.columns([4, 1])
    with col1:
        user_input = st.text_input("", placeholder="Ask a question about your documents...", label_visibility="collapsed")
    with col2:
        submit_button = st.form_submit_button("Send", use_container_width=True)

if submit_button and user_input:
    with st.spinner("Thinking..."):
        try:
            answer = query_rag(user_input)
        except Exception as e:
            answer = f"I encountered an error: {str(e)}"

    st.session_state['chat_history'].append((user_input, answer))
    st.rerun()

# Display chat history with minimalistic design
if st.session_state['chat_history']:
    st.markdown("### Conversation")

    # Reverse order to show latest messages first
    for i, (question, answer) in enumerate(reversed(st.session_state['chat_history'])):
        # User message
        st.markdown(f"""
        <div class="chat-message user-message">
            <strong>You:</strong> {question}
        </div>
        """, unsafe_allow_html=True)

        # Assistant message
        st.markdown(f"""
        <div class="chat-message assistant-message">
            <strong>Assistant:</strong> {answer}
        </div>
        """, unsafe_allow_html=True)

        if i < len(st.session_state['chat_history']) - 1:
            st.markdown("<br>", unsafe_allow_html=True)

# Settings in sidebar for advanced users
with st.sidebar:
    st.markdown("### ⚙️ Settings")

    if st.button("🗑️ Clear Chat", use_container_width=True):
        st.session_state['chat_history'] = []
        st.rerun()

    st.markdown("---")

    if st.button("🗑️ Clear Database", use_container_width=True, type="secondary"):
        with st.spinner("Clearing database..."):
            clear_database()
        st.session_state['chat_history'] = []
        st.success("Database cleared!")
        st.rerun()

    st.markdown("---")
    st.markdown("*Made with Streamlit*")
Binary file added data/ath.pdf
Binary file removed data/monopoly.pdf
Binary file removed data/ticket_to_ride.pdf