brownsloth/smart-document-qa


📄 Smart Document Q&A – Technical Overview & Progress Log

✅ Objective

To build a real-world RAG (Retrieval-Augmented Generation) system that can:

  • Index a collection of PDF documents

  • Allow users to ask natural language questions

  • Retrieve relevant context and generate coherent, grounded answers using an LLM

πŸ—οΈ System Architecture

1. Backend (FastAPI)

  • Accepts PDF uploads via /upload

  • Extracts and chunks text into smaller segments

  • Embeds each chunk using all-MiniLM-L6-v2 from sentence-transformers

  • Stores vectors in a FAISS index (inner product over L2-normalized embeddings, i.e. cosine similarity)

  • Metadata (chunk, source filename, etc.) saved in JSON

  • /ask endpoint performs:

    • Query embedding

    • FAISS retrieval of top-k chunks

    • Prompt construction with context

    • LLM call via a local llama-simple binary (GGUF model)

    • Return of answer
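The retrieval math behind the steps above can be sketched with plain NumPy: inner product on L2-normalized vectors is exactly cosine similarity, which is why the FAISS inner-product index works here. The toy 4-dimensional "embeddings" and the `top_k` value are illustrative stand-ins for all-MiniLM-L6-v2's real 384-dimensional output:

```python
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product == cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def top_k_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most similar chunks, best first."""
    scores = l2_normalize(chunk_vecs) @ l2_normalize(query_vec[None, :]).T
    return np.argsort(-scores.ravel())[:k]

# Toy embeddings standing in for real chunk vectors.
chunks = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(top_k_chunks(query, chunks, k=2))  # → [0 1]: chunk 0 closest, then chunk 1
```

At scale, `faiss.IndexFlatIP` performs the same inner-product search after the vectors are normalized, which matches the index choice noted above.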

2. Frontend (Next.js + Tailwind)

  • Simple interface for:

    • Typing a question

    • Viewing a loading state

    • Displaying final answers and sources

  • Calls /ask endpoint from backend via lib/api.ts
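Since `lib/api.ts` is essentially a thin POST wrapper around `/ask`, the backend can also be exercised without the frontend. A minimal Python client sketch follows; the base URL and the `question`/`top_k` field names are assumptions about the API schema, not confirmed from the code:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"  # assumed local backend address

def build_payload(question: str, top_k: int = 3) -> dict:
    """Request body for /ask; field names are assumed, not confirmed."""
    return {"question": question, "top_k": top_k}

def ask(question: str) -> dict:
    """POST the question to the backend and decode the JSON answer."""
    data = json.dumps(build_payload(question)).encode()
    req = request.Request(f"{API_BASE}/ask", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```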

🧠 What Works Well

  • ✅ Document ingestion works for PDFs.

  • ✅ The chunking and embedding pipeline is functional.

  • ✅ FAISS retrieval returns relevant chunks.

  • ✅ llama-simple generates responses from the constructed prompt.

  • ✅ The frontend successfully queries the backend and displays answers.

⚠️ Issues & Limitations

  • 🔸 The prompt sometimes misleads the LLM into hallucinating.

  • 🔸 Answers were occasionally not grounded in the retrieved content.

  • 🔸 The LLM did not reliably use the source context until the prompt was restructured.

  • 🔸 The local model (llama-simple) isn't fine-tuned for QA.

  • 🔸 No reranking or source-based highlighting yet.

  • 🔸 Some responses were poorly formatted (e.g., odd punctuation, clipped endings).

🔧 Recent Improvements

  • ✅ Step-by-step RAG architecture implemented

  • ✅ Prompt engineering added to improve grounding

  • ✅ Began parsing structured output (JSON-formatted answers)

  • ✅ Chunk metadata embedded into prompts for explainability

  • ✅ Clear separation of backend (retrieval & LLM) from frontend
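The grounding and explainability improvements above amount to structuring the prompt so the model can only answer from the supplied context, with chunk metadata inlined for later citation. A possible sketch of such a prompt builder; the template wording and the `source`/`text` metadata keys are assumptions:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt: numbered context blocks tagged with their
    source file, followed by instructions that forbid answering from outside
    the context. 'source' and 'text' are assumed metadata keys."""
    context = "\n\n".join(
        f"[{i + 1}] (from {c['source']})\n{c['text']}"
        for i, c in enumerate(chunks)
    )
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so. "
        "Cite sources by their bracketed numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the chunks is what makes citation parsing possible later: the model can refer to `[1]`, `[2]`, etc., and those map back to the stored chunk metadata.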

📦 Data & Models Used

  • Model: all-MiniLM-L6-v2 (for embeddings)

  • LLM: Quantized LLaMA-2 7B (via llama.cpp)

  • Indexing: FAISS (flat inner product)

  • Docs: Local PDF files (manually uploaded)
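Wiring the quantized model in usually means shelling out to the llama.cpp binary from the backend. A sketch of that call; the binary path, the model path, and the `-m`/`-n` flags mirror common llama.cpp builds but are assumptions about this project's particular `llama-simple` build:

```python
import subprocess

MODEL_PATH = "models/llama-2-7b.Q4_K_M.gguf"  # assumed local model path

def build_cmd(prompt: str, n_predict: int = 256) -> list[str]:
    """Argv for the llama-simple binary; flags are assumed, check your build."""
    return ["./llama-simple", "-m", MODEL_PATH, "-n", str(n_predict), prompt]

def generate(prompt: str) -> str:
    """Run the binary and return its raw stdout as the model's completion."""
    out = subprocess.run(build_cmd(prompt), capture_output=True,
                         text=True, check=True)
    return out.stdout
```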

📅 Next Steps (Planned)

  1. Reranking using BAAI/bge-m3 or Cohere Reranker

  2. Better structured output from LLM (JSON + citations)

  3. Citation highlighting on frontend (clickable chunks)

  4. Multi-doc support UI (PDF-specific filtering)

  5. Model fallback or hybrid LLM options (e.g., OpenAI for better answers)

  6. Upload interface & file explorer
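Planned step 2 (structured JSON output with citations) will need tolerant parsing, since local models often wrap valid JSON in extra text or emit it slightly broken. One possible sketch; the `answer`/`citations` schema is an assumption, not the project's fixed format:

```python
import json
import re

def parse_answer(raw: str) -> dict:
    """Try to extract a JSON object like {"answer": ..., "citations": [...]}
    from raw LLM output; fall back to treating the whole text as the answer.
    The schema here is an assumed example, not a fixed contract."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # widest {...} span, if any
    if match:
        try:
            obj = json.loads(match.group(0))
            return {"answer": obj.get("answer", ""),
                    "citations": obj.get("citations", [])}
        except json.JSONDecodeError:
            pass  # malformed JSON: fall through to the plain-text fallback
    return {"answer": raw.strip(), "citations": []}
```

The fallback path matters with a quantized 7B model: a clipped or unquoted response still yields a usable answer instead of a parse error surfacing in the frontend.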

About

A toy RAG system that indexes, chunks, and ranks documents and answers natural-language queries.
