An AI-powered full-stack application that enables users to upload and process multiple documents, then generates concise summaries or context-aware answers to queries using OpenAI’s LLMs.
The system integrates a FastAPI backend, a React frontend for a seamless user experience, and FAISS for efficient vector search, ensuring accurate retrieval and scalable performance even with large document collections.
- 📂 Multi-format Uploads – Supports PDF, DOCX, and TXT files
- 🧩 Chunking & Embeddings – Splits documents into chunks and embeds them using OpenAI models
- ⚡ FAISS Vector Search – Fast similarity search across document chunks
- 🤖 AI Summarization & Q&A – Generate summaries and answer user questions with context
- 🌐 Frontend + Backend – Modern React-based frontend powered by FastAPI
- 💾 Persistent Storage – Save FAISS indices and re-use across sessions
- 📑 Export Results – Summaries and Q&A can be saved as JSON or PDF
```
ai_doc_summarizer/
│
├─ app/                   # Backend (FastAPI)
│  ├─ main.py             # FastAPI entry point
│  ├─ models/
│  │  ├─ embeddings.py    # FAISS embedding and indexing
│  │  └─ llm.py           # LLM summarization/Q&A
│  ├─ core/
│  │  └─ utils.py         # Text extraction, chunking
│  └─ static/uploads/     # Uploaded documents
│
├─ frontend/              # Frontend (React app)
│  ├─ src/
│  │  ├─ components/      # Upload UI, Results display
│  │  ├─ pages/           # Summarizer, Q&A pages
│  │  └─ App.js           # Main React app
│  └─ package.json
│
├─ faiss_index/           # Saved FAISS indices
├─ outputs/               # JSON / PDF results
├─ requirements.txt       # Backend dependencies
├─ package.json           # Frontend dependencies
└─ README.md
```
- Backend: FastAPI (Python)
- Frontend: React 18 + Vite, Custom CSS
- LLM Provider: OpenAI GPT models (for summarization & Q&A)
- Vector Database: FAISS
- Utilities:
  - PyMuPDF / PyPDF2 → PDF extraction
  - docx2txt → Word file parsing
  - dotenv → API key management
  - pickle → Metadata persistence
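The chunking handled by `core/utils.py` can be sketched with a simple overlapping splitter; the chunk size and overlap values below are illustrative defaults, not the project's actual settings.

```python
# Minimal sketch of text chunking with overlap, so context is
# preserved across chunk boundaries. Sizes are illustrative.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk then becomes one embedding vector, so the overlap keeps a sentence that straddles a boundary retrievable from either side.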
- Upload Documents – User uploads PDFs/DOCX/TXT through the frontend.
- Text Extraction – Backend extracts text and converts to chunks.
- Embeddings & Indexing – Each chunk is embedded using OpenAI embeddings and stored in a FAISS index.
- Vector Search – On summarization or Q&A request, FAISS retrieves top relevant chunks.
- LLM Summarization / Q&A – OpenAI GPT generates summary or answers based on retrieved context.
- Output Delivery – Results are shown on the frontend and stored in /outputs as JSON/PDF.
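The retrieval step above can be sketched without any dependencies. In the real app, OpenAI embeddings and a FAISS index replace `embed()` and `top_k()`; here a toy bag-of-words vector stands in so the flow is runnable anywhere.

```python
# Dependency-free sketch of similarity-based retrieval. In production,
# embed() would call the OpenAI embeddings API and top_k() would query
# a FAISS index; this toy version only illustrates the flow.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector (stand-in for OpenAI embeddings)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (what FAISS does at scale)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks are then passed to the LLM as context for summarization or answering.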
```bash
git clone https://github.com/divyeshmutha12/AI_Document_Summarizer.git
cd ai-doc-summarizer
python -m venv venv
source venv/bin/activate   # On Linux/Mac
venv\Scripts\activate      # On Windows
pip install -r requirements.txt
```

Create a .env file in the root directory:

```
OPENAI_API_KEY=your_openai_api_key
```

Run the backend:

```bash
uvicorn app.main:app --reload
```

Navigate to: http://127.0.0.1:8000
- Generate executive summaries for multiple research papers
- Answer questions based on uploaded compliance/legal docs
- Summarize meeting notes from multiple files
- Knowledge extraction from large document collections
- ✅ Support for audio/video transcription (via Whisper)
- ✅ Multi-language summarization
- ✅ Fine-tuned LLM integration for domain-specific tasks
MIT License – Free to use and modify.