LLM-Powered Knowledge Extraction & Retrieval (RAG)
🔗 Live API (Swagger UI):
👉 https://enterprise-docs-ai-production.up.railway.app/docs
The Enterprise Document Intelligence System is a cloud-native, multi-tenant platform that enables organizations to ingest documents, extract knowledge, and perform secure, explainable, LLM-powered question answering over their internal data.
This system is designed for real enterprise use cases such as:
- Internal knowledge search
- Policy & compliance analysis
- Research intelligence
- Document-heavy decision support
Key features:
- JWT-based authentication
- Role-based access control (Admin / User)
- Strict per-user document isolation (multi-tenancy)
- PDF upload with automatic parsing
- Adaptive chunking strategy based on:
  - Document structure
  - End use (Q&A-focused retrieval)
- Rich metadata captured for traceability
- High-performance vector search using Pinecone
- Retrieval-Augmented Generation (RAG)
- Groq-powered LLM inference for low-latency responses
- Answers strictly grounded in retrieved document context
- Every response includes:
  - Source document name
  - Chunk index
  - Relevance score
- Enables explainability and enterprise trust
- Per-user query tracking
- Usage logging for monitoring and future billing
- Designed for scalability and observability
- Fully Dockerized backend
- Deployed on Railway
- External managed services for reliability and scale
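The adaptive chunking listed above could look roughly like the sketch below. It is a hypothetical stand-in for `services/chunking.py`, not the actual implementation: split on paragraph boundaries first, fall back to a fixed overlapping window for oversized paragraphs, and attach a `chunk_index` for traceability:

```python
# Paragraph-first chunking with a sliding-window fallback.
# max_chars and overlap are illustrative defaults, not the
# values the deployed system uses.

def adaptive_chunk(text: str, max_chars: int = 800, overlap: int = 100) -> list[dict]:
    """Return chunks with traceability metadata (chunk_index)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for para in paragraphs:
        if len(buffer) + len(para) + 2 <= max_chars:
            # Paragraph fits: keep accumulating structural units
            buffer = f"{buffer}\n\n{para}" if buffer else para
        else:
            if buffer:
                chunks.append(buffer)
            # Oversized paragraph: fixed window with character overlap
            while len(para) > max_chars:
                chunks.append(para[:max_chars])
                para = para[max_chars - overlap:]
            buffer = para
    if buffer:
        chunks.append(buffer)
    return [{"chunk_index": i, "text": c} for i, c in enumerate(chunks)]
```

Keeping whole paragraphs together where possible is what makes retrieval Q&A-friendly: a hit usually carries its own context instead of a mid-sentence fragment.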
Architecture:

```
Client (Swagger / UI)
        ↓
FastAPI Backend (Railway)
 ├── Auth & RBAC
 ├── Document Upload & Parsing
 ├── Adaptive Chunking Engine
 ├── Vector Store (Pinecone)
 ├── RAG Pipeline
 └── Usage Analytics
        ↓
External Services
 ├── Supabase Postgres (Metadata + Analytics)
 ├── Pinecone (Vector Search)
 └── Groq (LLM Inference)
```
Project structure:

```
backend/
├── app/
│   ├── api/
│   │   ├── auth.py            # Authentication endpoints
│   │   ├── upload.py          # Document ingestion
│   │   ├── query.py           # RAG query endpoint
│   │   ├── admin.py           # Admin operations
│   │   └── dependencies.py    # Security dependencies
│   │
│   ├── core/
│   │   ├── config.py          # Centralized configuration
│   │   ├── database.py        # SQLAlchemy setup
│   │   └── security.py        # JWT utilities
│   │
│   ├── models/
│   │   ├── user.py            # User schema
│   │   ├── document.py        # Document metadata
│   │   └── usage.py           # Query analytics
│   │
│   ├── services/
│   │   ├── auth.py            # Auth logic
│   │   ├── chunking.py        # Adaptive chunking
│   │   ├── vector_store.py    # Pinecone upsert logic
│   │   ├── retriever.py       # Semantic retrieval
│   │   ├── rag.py             # RAG orchestration
│   │   ├── llm.py             # Groq integration
│   │   └── analytics.py       # Usage tracking
│   │
│   ├── utils/
│   │   └── file_loader.py     # PDF text extraction
│   │
│   └── main.py                # Application entry point
│
├── Dockerfile
└── requirements.txt
```
How it works:
1. User uploads a document
2. The document is parsed and chunked adaptively
3. Chunks are embedded and stored in Pinecone
4. The user submits a query
5. Relevant chunks are retrieved semantically
6. Retrieved context is passed to the LLM
7. A final answer is generated with citations
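The flow above can be mimicked end to end with an in-memory toy: a bag-of-words embedder and cosine similarity stand in for Pinecone's vector search, and a stub stands in for the Groq call. Everything here is illustrative, not the project's `services/rag.py`:

```python
# Toy RAG pipeline: embed -> retrieve -> answer with citations.
# Bag-of-words "embeddings" are for illustration only; the real
# system uses dense embeddings and Pinecone.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(index: list[dict], query: str, top_k: int = 2) -> list[dict]:
    q = embed(query)
    scored = [{**c, "score": round(cosine(q, embed(c["text"])), 2)} for c in index]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:top_k]

def answer(index: list[dict], query: str) -> dict:
    hits = retrieve(index, query)
    # A real system would build a grounded prompt from the retrieved
    # chunks and send it to Groq; this stub echoes the top chunk.
    return {
        "answer": hits[0]["text"] if hits else "Not found in documents.",
        "sources": [
            {"filename": h["filename"], "chunk_index": h["chunk_index"], "score": h["score"]}
            for h in hits
        ],
    }
```

The returned shape mirrors the API's response format: an answer plus per-chunk citations with filename, index, and score.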
Example query response:

```json
{
  "answer": "The internship duration mentioned in the document is 6 months.",
  "sources": [
    {
      "filename": "Internship_Description.pdf",
      "chunk_index": 3,
      "score": 0.91
    }
  ]
}
```

Security & isolation:
- Namespace-based vector isolation per user
- JWT validation on every request
- No cross-user data leakage
- LLM responses restricted to retrieved context only
Built for scale:
- Stateless backend
- External vector and database services
- Ready for hybrid dense + sparse search
- Easy extension to streaming and reranking
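If hybrid dense + sparse search is added later, one common way to combine the two result sets is reciprocal rank fusion (RRF). The helper below is hypothetical, not part of the current codebase:

```python
# Reciprocal rank fusion: merge two ranked ID lists by summing
# 1 / (k + rank) contributions. k=60 is the value commonly used
# in the RRF literature; it damps the influence of top ranks.

def rrf(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only ranks, not comparable scores, which is why it pairs well with mixing cosine-scored dense hits and BM25-scored sparse hits.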