A Retrieval-Augmented Generation (RAG) system that answers natural language questions about uploaded PDF documents, strictly grounded in document content. No hallucinations.
Built with LlamaIndex, LlamaParse, Groq, and Neon PostgreSQL (pgvector).
PDF Upload → LlamaParse (Cloud Markdown extraction)
→ Metadata tagging (document_name injected per chunk)
→ SentenceSplitter (512 tokens, 64 overlap)
→ HuggingFace BGE-small (384-dim, LOCAL — no API calls) → Neon pgvector DB
─────────────────────────────────────────────────────────────────────────────────────
User Question → HuggingFace BGE-small (LOCAL embed) → Cosine Search
→ Top-5 Chunks → Strict Grounded Prompt
→ Groq (temp=0.1) → Answer + Page Citations
See ARCHITECTURE.md for a detailed component diagram.
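The query-side retrieval step (cosine search over stored embeddings, keep the top 5) can be sketched in plain Python. This is an illustration only: in the running system, pgvector performs this search inside Neon, and the helper names below are hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the text of the k chunks most similar to the query embedding.

    chunks: list of (chunk_text, embedding) pairs.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

In practice the 384-dim BGE-small query embedding is compared against every stored chunk embedding the same way, just inside the database rather than in Python.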
- Python 3.11+
- A Neon.tech PostgreSQL database (free tier works)
- A LlamaCloud API key (for LlamaParse)
- A Groq API key (for the LLM only)
Note: No embedding API key needed — embeddings run locally via HuggingFace. The model (BAAI/bge-small-en-v1.5, ~133 MB) is downloaded automatically on first run.
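As a sketch, the centralized settings in app/llm/__init__.py might look like the following (the Groq model name is an assumption, not specified in this README):

```python
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq

# Local embedding model: downloaded once (~133 MB), no API key required.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Groq LLM at low temperature for grounded answers; model name is an assumption.
Settings.llm = Groq(model="llama-3.1-8b-instant", temperature=0.1)
# Chunking as described above: 512-token chunks with 64-token overlap.
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
```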
```bash
cd ai-doc-rag
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Fill in GROQ_API_KEY, NEON_DATABASE_URL, LLAMA_CLOUD_API_KEY
```

| Variable | Source |
|---|---|
| GROQ_API_KEY | Groq Console — LLM only |
| NEON_DATABASE_URL | Neon.tech — free PostgreSQL + pgvector |
| LLAMA_CLOUD_API_KEY | LlamaCloud — for LlamaParse |
```bash
streamlit run app/main.py
```

- Upload any PDF document in the sidebar (e.g., Annual-Report-FY-2023-24.pdf)
- Click "Process Document" — LlamaParse extracts tables + text as Markdown
- Ask questions — e.g. "What was the total revenue in FY2024?"
- Upload additional documents — they are added to the same index with metadata tagging
All documents share the same pgvector table (document_chunks_llama). Each chunk
is tagged with a document_name in its metadata during ingestion, so:
- Source chunks display which document they came from
- The LLM is instructed to mention the source document when relevant
- Duplicate uploads (same file hash) are rejected automatically
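The duplicate-rejection step above can be sketched as a file-hash check; `UploadRegistry` is a hypothetical name for illustration, not the actual class in the ingestion pipeline:

```python
import hashlib

def file_sha256(data: bytes) -> str:
    """Content hash of an uploaded file's raw bytes."""
    return hashlib.sha256(data).hexdigest()

class UploadRegistry:
    """Tracks hashes of already-ingested files so the same PDF is not indexed twice."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def register(self, data: bytes) -> bool:
        """Return True if the file is new; False if its hash was seen before."""
        digest = file_sha256(data)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Hashing the raw bytes means a re-upload of the identical file is rejected even if the filename changes.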
```bash
docker build -t doc-rag .
docker run -p 8501:8501 --env-file .env doc-rag
```

```bash
pytest tests/ -v
```

All tests run fully offline (external APIs mocked).
| Technique | Setting |
|---|---|
| Low temperature | 0.1 |
| Strict system prompt | Answer ONLY from context |
| Refusal instruction | Say "not available" if absent |
| Citation enforcement | Always cite page numbers |
| Table fidelity | LlamaParse preserves Markdown tables |
| Limited context | Top-5 chunks only |
| Document source | Prompt mentions document_name metadata |
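A minimal sketch of what a strict grounded prompt combining these rules might look like; the template wording here is an illustrative assumption (the real prompt lives in app/retrieval/query_engine.py):

```python
# Hypothetical template illustrating the grounding rules in the table above.
STRICT_QA_TEMPLATE = (
    "You are a document question-answering assistant.\n"
    "Answer ONLY from the context below. If the answer is not in the context, "
    "say the information is not available in the uploaded documents.\n"
    "Always cite page numbers, and mention the source document_name when relevant.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer: "
)

def build_prompt(context: str, question: str) -> str:
    """Fill the template with the retrieved top-5 chunks and the user question."""
    return STRICT_QA_TEMPLATE.format(context_str=context, query_str=question)
```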
ai-doc-rag/
├── app/
│ ├── main.py # Streamlit UI (chat + upload + multi-doc)
│ ├── config.py # Config, env vars, URL helpers
│ ├── llm/
│ │ └── __init__.py # Centralized LlamaIndex Settings
│ ├── ingestion/
│ │ └── pipeline.py # LlamaParse → tag → chunk → embed → Neon
│ ├── retrieval/
│ │ └── query_engine.py # Load index → QueryEngine + strict prompt
│ └── utils/
│ └── logger.py # Structured logging
├── tests/
│ ├── test_ingestion.py # Ingestion pipeline + metadata tests
│ └── test_retrieval.py # Query engine + prompt tests
├── Dockerfile
├── requirements.txt
├── .env.example
├── ARCHITECTURE.md
└── README.md
| Layer | Technology |
|---|---|
| PDF Parsing | LlamaParse (cloud) via llama-index-readers-llama-parse |
| Chunking | LlamaIndex SentenceSplitter (512 tokens / 64 overlap) |
| Embeddings | HuggingFace BAAI/bge-small-en-v1.5 (384-dim, LOCAL — no API key) via llama-index-embeddings-huggingface |
| Vector DB | Neon PostgreSQL + pgvector via llama-index-vector-stores-postgres |
| LLM | Groq (temp=0.1) via llama_index.llms.groq |
| Framework | LlamaIndex Core |
| UI | Streamlit |
| Deployment | Docker |