To build a real-world RAG (Retrieval-Augmented Generation) system that can:

- Index a collection of PDF documents
- Allow users to ask natural language questions
- Retrieve relevant context and generate coherent, grounded answers using an LLM
The backend:

- Accepts PDF uploads via `/upload`
- Extracts the text and chunks it into smaller segments
- Embeds each chunk using `all-MiniLM-L6-v2` from `sentence-transformers`
- Stores the vectors in a FAISS index (inner product for cosine similarity)
- Saves metadata (chunk text, source filename, etc.) in JSON (see the ingestion sketch after this list)
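A minimal sketch of that ingestion path, assuming `pypdf` for text extraction and naive fixed-size character chunks; the function names and file paths here are illustrative, not the project's actual layout:

```python
# Illustrative ingestion sketch -- function names and file paths are assumptions.
import json
import faiss
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
DIM = 384  # embedding size of all-MiniLM-L6-v2

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with a small overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def index_pdf(path: str, index: faiss.IndexFlatIP, metadata: list[dict]) -> None:
    """Extract, chunk, embed, and add one PDF to the FAISS index."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    chunks = chunk_text(text)
    # Normalized embeddings make inner product equivalent to cosine similarity.
    vecs = model.encode(chunks, normalize_embeddings=True)
    index.add(np.asarray(vecs, dtype="float32"))
    metadata.extend({"chunk": c, "source": path} for c in chunks)

index = faiss.IndexFlatIP(DIM)
metadata: list[dict] = []
index_pdf("example.pdf", index, metadata)

faiss.write_index(index, "index.faiss")
with open("metadata.json", "w") as f:
    json.dump(metadata, f)
```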
The `/ask` endpoint performs (sketched below):

- Query embedding
- FAISS retrieval of the top-k chunks
- Prompt construction with the retrieved context
- An LLM call via the local `llama-simple` binary (GGUF model)
- Returning the answer
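A sketch of that flow, reusing the index and metadata files written by the ingestion sketch above; the prompt wording, the GGUF filename, and the `llama-simple` flags are assumptions about this setup, not verified behaviour:

```python
# Illustrative /ask flow -- prompt format and llama-simple flags are assumptions.
import json
import subprocess

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("index.faiss")
with open("metadata.json") as f:
    metadata = json.load(f)

TOP_K = 4
MODEL_PATH = "llama-2-7b.Q4_K_M.gguf"  # assumed path to the quantized GGUF model

def ask(question: str) -> dict:
    # 1. Embed the query, normalized so inner product behaves like cosine similarity.
    q_vec = model.encode([question], normalize_embeddings=True)
    # 2. Retrieve the top-k chunks from FAISS.
    _scores, ids = index.search(np.asarray(q_vec, dtype="float32"), TOP_K)
    hits = [metadata[i] for i in ids[0] if i != -1]
    # 3. Build a grounded prompt from the retrieved context.
    context = "\n\n".join(f"[{h['source']}] {h['chunk']}" for h in hits)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # 4. Call the local llama.cpp binary. The flag set ("-m" for the model,
    #    "-n" for token count, prompt as the last argument) is an assumption
    #    about this particular build of llama-simple.
    result = subprocess.run(
        ["./llama-simple", "-m", MODEL_PATH, "-n", "256", prompt],
        capture_output=True, text=True, check=True,
    )
    # 5. Return the generated answer together with its source filenames.
    return {"answer": result.stdout.strip(), "sources": [h["source"] for h in hits]}
```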
The frontend provides a simple interface for:

- Typing a question
- Viewing a loading state
- Displaying the final answer and its sources

It calls the backend's `/ask` endpoint via `lib/api.ts` (the assumed request/response contract is sketched below).
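For reference, the same call `lib/api.ts` makes can be exercised directly against the backend as a quick smoke test; the field names (`question`, `answer`, `sources`) and the local URL are assumptions about the contract, not its documented shape:

```python
# Smoke-test the /ask endpoint directly -- JSON field names and URL are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What topics does the uploaded document cover?"},
    timeout=120,
)
resp.raise_for_status()
data = resp.json()
print(data["answer"])
print(data["sources"])
```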
What works:

- ✅ Document ingestion works for PDFs.
- ✅ The chunking and embedding pipeline is functional.
- ✅ FAISS retrieval returns relevant chunks.
- ✅ llama-simple can generate responses from the prompt.
- ✅ The frontend successfully queries the backend and shows answers.
Known issues:

- 🔸 The prompt sometimes misleads the LLM into hallucinations.
- 🔸 Answers were occasionally not grounded in the retrieved content.
- 🔸 The LLM was not reliably using the source context until we structured the prompt better.
- 🔸 The local model (llama-simple) isn't fine-tuned for QA.
- 🔸 No reranking or source-based highlighting yet.
- 🔸 Some responses were poorly formatted (e.g., odd punctuation, clipped endings).
Progress so far:

- ✅ Step-by-step RAG architecture implemented
- ✅ Added prompt engineering to improve grounding (see the prompt sketch after this list)
- ✅ Began parsing structured output (JSON-formatted answers)
- ✅ Embedded chunk metadata into the prompts for explainability
- ✅ Clear separation of the backend (retrieval & LLM) from the frontend
Current stack:

- Model: all-MiniLM-L6-v2 (for embeddings)
- LLM: quantized LLaMA-2 7B (via llama.cpp)
- Indexing: FAISS (flat inner product)
- Docs: local PDF files (manually uploaded)
Next steps:

- **Reranking using BAAI/bge-m3 or Cohere Reranker** (a possible shape is sketched after this list)
- Better structured output from the LLM (JSON + citations)
- Citation highlighting on the frontend (clickable chunks)
- Multi-doc support in the UI (PDF-specific filtering)
- Model fallback or hybrid LLM options (e.g., OpenAI for better answers)
- Upload interface & file explorer
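A possible shape for the planned reranking step, using a BGE cross-encoder reranker through `sentence-transformers`; the specific model (`BAAI/bge-reranker-v2-m3` rather than `bge-m3` itself) and the retrieve-then-rerank sizes are assumptions, not a committed design:

```python
# Possible reranking step (not yet implemented) -- model choice and sizes are assumptions.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

def rerank(question: str, hits: list[dict], keep: int = 4) -> list[dict]:
    """Score each retrieved chunk against the question and keep the best ones."""
    scores = reranker.predict([(question, h["chunk"]) for h in hits])
    ranked = sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)
    return [h for _, h in ranked[:keep]]

# Usage: over-retrieve from FAISS (e.g. top 20), then rerank down to the top 4
# chunks before building the prompt.
```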