A Retrieval-Augmented Generation (RAG) powered document Q&A application. Upload PDF or TXT documents and ask questions about them using semantic search + LLM reasoning.
The system retrieves relevant document chunks using FAISS vector search, generates grounded answers using the Groq LLM API, and maintains conversation memory across turns within a session.
Live demo: https://huggingface.co/spaces/GoldSharon/docchat-ai

⚠️ Hosted on Hugging Face Spaces (free CPU tier). The app may take 20–60 seconds to start if the Space is sleeping.
| Layer | Technology |
|---|---|
| Backend | FastAPI (Python) |
| LLM | Groq API |
| Vector Database | FAISS |
| Embeddings | sentence-transformers |
| Memory | In-process session store |
| Frontend | HTML + CSS + Vanilla JS |
| Deployment | Hugging Face Spaces (Docker) |
- Upload PDF or TXT documents
- Semantic search using FAISS vector embeddings
- Context-aware answers via RAG pipeline
- Conversation memory – the LLM remembers prior Q&A turns within a session
- Semantic summary intent detection – automatically fetches more chunks when you ask for an overview
- Strict grounded responses – answers are constrained to document content, no hallucination
- General chat mode when no document is selected
- Markdown formatted responses
- Automatic session reset on restart
```
User uploads document
        ↓
Text extracted and split into chunks
        ↓
Chunks converted to embeddings (sentence-transformers)
        ↓
Stored in FAISS vector index
        ↓
User asks a question
        ↓
Semantic intent check (summary vs. factual)
        ↓
Question converted to embedding
        ↓
FAISS retrieves top-k relevant chunks
        ↓
Session memory (prior Q&A turns) retrieved
        ↓
Context + memory + question sent to Groq LLM
        ↓
LLM generates a grounded final answer
        ↓
Answer stored in session memory for future turns
```
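The flow above can be sketched end-to-end in miniature. This is an illustrative toy, not the app's code: a brute-force cosine search and a character-bigram "embedding" stand in for FAISS and sentence-transformers, and the final LLM call is only represented by the prompt it would receive.

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding: normalized character-bigram hash counts (NOT a real model;
    # the app uses sentence-transformers here).
    dims = 64
    vec = [0.0] * dims
    for a, b in zip(text.lower(), text.lower()[1:]):
        vec[(ord(a) * 31 + ord(b)) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def top_k(index: list[tuple[str, list[float]]], query: str, k: int = 3) -> list[str]:
    # Brute-force cosine similarity standing in for a FAISS index search.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), chunk) for chunk, v in index]
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:k]]

# 1. "Upload": split a document into chunks and index their embeddings.
doc = "FAISS stores vectors. Groq generates answers. Memory keeps prior turns."
chunks = [s.strip() for s in doc.split(".") if s.strip()]
index = [(c, embed(c)) for c in chunks]

# 2. "Ask": retrieve the most similar chunks, then (in the real app) send
#    context + session memory + question to the Groq LLM.
context = top_k(index, "Which component stores vectors?", k=2)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: ..."
print(context)
```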
Each chat session is identified by a `session_id`. As you ask questions, prior exchanges are stored in memory and injected into the LLM prompt on each subsequent turn. This enables follow-up questions like:

"Who is mentioned in section 2?" → (next turn) "What did you say about that person?"

Memory is in-process and session-scoped – it resets when the server restarts.
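The README doesn't show the store's internals; a minimal sketch of what an in-process, session-scoped memory service might look like (class and method names are hypothetical, the `max_turns` cap is an assumption):

```python
from collections import defaultdict

class MemoryService:
    """Hypothetical in-process chat memory, keyed by session_id.

    Because it lives only in this process, restarting the server clears
    every session – matching the 'resets on restart' behavior above.
    """

    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns
        self._sessions: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_turn(self, session_id: str, question: str, answer: str) -> None:
        turns = self._sessions[session_id]
        turns.append((question, answer))
        del turns[:-self.max_turns]  # keep only the most recent turns

    def history_prompt(self, session_id: str) -> str:
        # Rendered into the LLM prompt on each subsequent turn.
        return "\n".join(
            f"User: {q}\nAssistant: {a}" for q, a in self._sessions[session_id]
        )

mem = MemoryService()
mem.add_turn("user-session-xyz", "Who is mentioned in section 2?", "Dr. Lee.")
print(mem.history_prompt("user-session-xyz"))
```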
When your question semantically matches phrases like:
- "summarize this document"
- "give me an overview"
- "what topics are covered"
...the system automatically retrieves 10 chunks with no relevance threshold, enabling a broad document summary rather than a narrow factual lookup.
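The branching this implies can be sketched as follows. The real app matches intent semantically via embeddings; here a toy Jaccard token overlap stands in so the control flow is visible, and the threshold plus the factual-mode `k=3` default are illustrative assumptions:

```python
SUMMARY_TEMPLATES = [
    "summarize this document",
    "give me an overview",
    "what topics are covered",
]

def _tokens(text: str) -> set[str]:
    return set(text.lower().replace("?", "").split())

def is_summary_intent(question: str, threshold: float = 0.5) -> bool:
    # Toy stand-in for semantic matching: token-set Jaccard similarity
    # against the summary template phrases.
    q = _tokens(question)
    for template in SUMMARY_TEMPLATES:
        t = _tokens(template)
        if len(q & t) / len(q | t) >= threshold:
            return True
    return False

def retrieval_params(question: str) -> dict:
    # Summary intent: fetch a broad slice (10 chunks, no relevance cutoff);
    # otherwise a narrow top-k with the configured distance threshold.
    if is_summary_intent(question):
        return {"k": 10, "min_score": None}
    return {"k": 3, "min_score": 1.0}  # k=3 is an assumed default

print(retrieval_params("give me an overview"))
print(retrieval_params("who wrote section 2?"))
```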
```bash
git clone https://github.com/YOUR_USERNAME/docchat.git
cd docchat
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Create `.env`:

```env
GROQ_API_KEY=your_api_key_here
GROQ_MODEL=llama3-70b-8192
MIN_RELEVANCE_SCORE=1.0
CHUNK_SIZE=500
CHUNK_OVERLAP=50
```

Get a free key at https://console.groq.com

Run the server:

```bash
uvicorn app.main:app --reload
```

Then open http://localhost:8000
```
docchat/
├── app/
│   ├── api/
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── routes.py
│   │   └── upload_routes.py
│   ├── core/
│   │   ├── __init__.py
│   │   └── config.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── document_services.py
│   │   ├── faiss_services.py
│   │   ├── groq_service.py
│   │   ├── memory_service.py    ← NEW: session memory store
│   │   ├── ollama_service.py
│   │   └── rag_service.py       ← updated: memory + intent detection
│   ├── static/
│   │   ├── app.js
│   │   ├── index.html
│   │   └── style.css
│   └── main.py
├── Dockerfile
├── requirements.txt
└── README.md
```
| Variable | Description |
|---|---|
| `GROQ_API_KEY` | API key for the Groq LLM |
| `GROQ_MODEL` | Model used for generation |
| `MIN_RELEVANCE_SCORE` | FAISS similarity distance threshold |
| `CHUNK_SIZE` | Document chunk size (characters) |
| `CHUNK_OVERLAP` | Overlap between chunks (characters) |
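These variables are presumably read in `app/core/config.py`; a minimal sketch of such a settings loader, assuming plain `os.getenv` (the class name and defaults mirror the `.env` example, not confirmed internals):

```python
import os

class Settings:
    """Hypothetical env-driven config; defaults match the sample .env."""

    def __init__(self) -> None:
        self.groq_api_key = os.getenv("GROQ_API_KEY", "")
        self.groq_model = os.getenv("GROQ_MODEL", "llama3-70b-8192")
        # FAISS distance threshold: chunks scoring worse are dropped.
        self.min_relevance_score = float(os.getenv("MIN_RELEVANCE_SCORE", "1.0"))
        # Chunking parameters, both in characters.
        self.chunk_size = int(os.getenv("CHUNK_SIZE", "500"))
        self.chunk_overlap = int(os.getenv("CHUNK_OVERLAP", "50"))

settings = Settings()
print(settings.groq_model, settings.chunk_size)
```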
| Method | Endpoint | Description |
|---|---|---|
| POST | `/chat` | Ask a question (RAG or general) |
| POST | `/upload` | Upload a PDF or TXT document |
| GET | `/health` | Health check |
| GET | `/stats` | FAISS index statistics |
Chat request body:

```json
{
  "question": "What is the main topic of this document?",
  "document_id": "abc123",
  "session_id": "user-session-xyz"
}
```

This project is deployed on Hugging Face Spaces using Docker.

Steps:
- Create a Space and select the Docker SDK
- Add `Dockerfile` to the repo root
- Push the project via Git
- Add secrets in Space settings: `GROQ_API_KEY`, `GROQ_MODEL`

The Space automatically builds and deploys the application.
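With a server running (locally or on the Space), the `/chat` endpoint can be exercised from a small client. A stdlib-only sketch using the request shape shown above; the base URL and the helper name are illustrative, and the response schema is whatever the API returns:

```python
import json
from urllib import request

BASE = "http://localhost:8000"  # local dev default; use the Space URL when deployed

def ask(question: str, document_id: str, session_id: str) -> dict:
    # POST /chat with the documented request body (requires a running server).
    payload = {
        "question": question,
        "document_id": document_id,
        "session_id": session_id,
    }
    req = request.Request(
        BASE + "/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(ask("What is the main topic?", "abc123", "user-session-xyz"))
```

Reusing the same `session_id` across calls is what lets the server inject prior turns into each prompt.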
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to your fork
- Open a Pull Request
MIT License β free to use and modify.


