A production-ready document analysis system that combines Retrieval-Augmented Generation (RAG) with a modern web interface. Upload PDF documents, then ask questions about them and get answers with source citations.
- Advanced PDF Processing: Text extraction with element-type handling (text, tables, images)
- Intelligent Chunking: Text segmentation with overlapping chunks to preserve context across chunk boundaries
- Semantic Search: ChromaDB vector store with Google AI embeddings for document retrieval
- Context-Aware Responses: Prompts that ground answers in retrieved passages and include source citations
- FastAPI with comprehensive error handling and validation
- Rate Limiting to prevent abuse
- Session Management with persistent storage
- Health Monitoring with detailed service status
- Structured Logging for better debugging
- File Upload Validation with size and type checking
- CORS Configuration for secure cross-origin requests
- Next.js 15 with App Router and TypeScript
- ShadCN UI Components for consistent, accessible design
- Complete Black Theme with subtle accents
- Mobile-Responsive design with drawer navigation
- Real-time Progress indicators for uploads
- Toast Notifications for better UX
- Keyboard Shortcuts (Ctrl+Enter to send, Ctrl+L to clear)
- Docker Support for easy deployment
- Environment Configuration with validation
- API Documentation with OpenAPI/Swagger
- Type Safety throughout the application
- Error Boundaries for graceful error handling
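
The overlapping chunking mentioned in the feature list can be illustrated with a small sketch. It assumes chunks are measured in characters and reuses the `CHUNK_SIZE=1000` and `CHUNK_OVERLAP=100` values from the sample environment file; the function name `chunk_text` is hypothetical, and the project's actual splitter may work differently (for example on tokens or sentences).

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters, so a sentence cut at
    one chunk boundary still appears intact in the neighbouring chunk.
    Character-based sizing is an assumption made for this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

With `chunk_size=4` and `overlap=1`, each chunk starts with the last character of the previous one, which is the same idea the defaults apply at a larger scale.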
Frontend (Next.js)        Backend (FastAPI)            Vector Store (ChromaDB)
│                         │                            │
├─ Upload Interface       ├─ PDF Processing            ├─ Document Embeddings
├─ Chat Interface         ├─ Text Chunking             ├─ Similarity Search
├─ Session Management     ├─ OpenRouter Integration    ├─ Metadata Storage
└─ Progress Tracking      └─ Response Generation       └─ Session Isolation
- Node.js 18+ and npm
- Python 3.11+
- OpenRouter API key
- Google AI API key (used for embeddings)
git clone <your-repo-url>
cd multimodal-rag
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r Requirements.txt
# Configure environment
cp env.example .env
# Edit .env with your OpenRouter and Google AI API keys
# Run the backend
python main.py
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
# Create environment file
cp backend/env.example .env
# Add your OpenRouter and Google AI API keys to .env
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Backend
cd backend
docker build -t multimodal-rag-backend .
# Frontend (from the repository root)
cd frontend
docker build -t multimodal-rag-frontend .
- POST /upload - Upload and process PDF documents
- GET /sessions - List all document sessions
- DELETE /sessions/{id} - Delete a specific session
- POST /chat - Send questions about uploaded documents
- POST /visualize-embeddings - Generate embeddings for visualization
- GET /health - Health check with service status
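
As a rough illustration of calling these endpoints from a script, the sketch below builds a `POST /chat` request with the standard library only. The JSON field names `session_id` and `question` are assumptions made for this example; the real request schema is defined in `app/schemas/schemas.py` and shown in the interactive docs at http://localhost:8000/docs. The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # backend address from the setup steps


def build_chat_request(
    question: str, session_id: str, base_url: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /chat endpoint.

    The payload keys are assumed for illustration; check /docs for the
    actual field names before relying on them.
    """
    payload = json.dumps({"session_id": session_id, "question": question})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_chat_request("What is this document about?", "my-session"))` would perform the call. Keeping request construction separate from sending makes the payload easy to inspect without a live server.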
# OpenRouter Configuration
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_API_BASE=https://openrouter.ai/api/v1/
SITE_URL=http://localhost:3000
SITE_NAME=Multimodal RAG Assistant
# Model Configuration (OpenRouter format)
CHAT_MODEL=google/gemini-2.5-flash-lite-preview-06-17
# Google AI for Embeddings
GOOGLE_API_KEY=your_google_api_key_here
EMBEDDING_MODEL=models/text-embedding-004
# File Processing
PDF_UPLOAD_DIR=./uploads
CHROMA_PATH=./chroma
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
MAX_FILE_SIZE=50
# Server
CORS_ORIGINS=http://localhost:3000
LOG_LEVEL=INFO
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_PERIOD=60

NEXT_PUBLIC_API_URL=http://localhost:8000

multimodal-rag/
├── backend/
│   ├── app/
│   │   ├── services/
│   │   │   └── chroma_service.py   # Vector database management
│   │   └── schemas/
│   │       └── schemas.py          # API data models
│   ├── processing/
│   │   └── pdf_processor.py        # Enhanced PDF processing
│   ├── main.py                     # FastAPI application
│   ├── config.py                   # Configuration management
│   ├── Requirements.txt            # Python dependencies
│   └── Dockerfile                  # Backend container
├── frontend/
│   ├── src/
│   │   ├── app/                    # Next.js App Router
│   │   ├── components/             # React components
│   │   │   └── ui/                 # ShadCN UI components
│   │   ├── context/                # React context providers
│   │   ├── hooks/                  # Custom hooks
│   │   └── utils/                  # Utility functions
│   ├── package.json                # Node.js dependencies
│   └── Dockerfile                  # Frontend container
├── docker-compose.yml              # Orchestration
└── README.md                       # This file
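
The tree lists `config.py` as responsible for configuration management, and the feature list mentions environment configuration with validation. The sketch below shows one way such validation might look, using variable names from the sample environment file. The `Settings` class, the `load_settings` function, and the specific checks are assumptions for illustration and are not taken from the project's code; treating `MAX_FILE_SIZE` as megabytes follows the 50MB upload limit stated in this README.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """A subset of the documented settings (hypothetical container)."""

    openrouter_api_key: str
    google_api_key: str
    chunk_size: int
    chunk_overlap: int
    max_file_size_mb: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default) and validate them."""
    env = os.environ if env is None else env

    # Both API keys are required; fail early with a clear message.
    required = ("OPENROUTER_API_KEY", "GOOGLE_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")

    # Defaults mirror the values shown in the sample environment file.
    settings = Settings(
        openrouter_api_key=env["OPENROUTER_API_KEY"],
        google_api_key=env["GOOGLE_API_KEY"],
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "100")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE", "50")),
    )

    if settings.chunk_size <= 0:
        raise ValueError("CHUNK_SIZE must be positive")
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
    if settings.max_file_size_mb <= 0:
        raise ValueError("MAX_FILE_SIZE must be positive")
    return settings
```

Validating at startup means a missing key or an overlap larger than the chunk size is reported once, up front, rather than surfacing later as a failed upload or chat request.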
- Supports PDF files up to 50MB
- Real-time progress tracking during processing
- Automatic text chunking and embedding generation
- Session-based isolation for multiple documents
- Ask questions about your uploaded documents
- Responses include source citations
- Context-aware answers based on document content
- Session persistence across browser refreshes
- Ctrl + Enter: Send message
- Ctrl + L: Clear chat history
- Responsive design with mobile-optimized navigation
- Drawer-style sidebar for small screens
- Touch-friendly interface elements
- Add new endpoints in main.py
- Create data models in app/schemas/schemas.py
- Add business logic in app/services/
- Update configuration in config.py
- Create components in src/components/
- Add pages in src/app/
- Update API calls in src/utils/api.ts
- Style with Tailwind CSS and ShadCN components
# Backend tests
cd backend
python -m pytest tests/
# Frontend tests (from the repository root)
cd frontend
npm test
- Backend health: GET /health
- Service status monitoring included
- Docker health checks configured
- Structured logging with configurable levels
- Request/response tracking
- Error monitoring with stack traces
- OpenRouter API Errors
  - Verify OpenRouter API key is set correctly
  - Check OpenRouter credits and billing
  - Ensure API base URL is correct
- Google AI Embedding Errors
  - Verify Google AI API key is set correctly
  - Check Google AI API quota
  - Ensure embedding model is available
- File Upload Failures
  - Check file size (max 50MB)
  - Ensure PDF format
  - Verify sufficient disk space
- ChromaDB Issues
  - Clear chroma/ directory to reset
  - Check file permissions
  - Verify sufficient memory
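
When debugging upload failures, it can help to reproduce the size and format checks locally. The sketch below is a minimal stand-in for that validation, assuming a 50 MB limit (as stated above) and a strict check that the file begins with the `%PDF-` header. The function name `validate_pdf_upload` is hypothetical, and the backend's actual checks may differ.

```python
MAX_FILE_SIZE_MB = 50  # matches the documented upload limit
PDF_MAGIC = b"%PDF-"   # PDF files begin with this header


def validate_pdf_upload(
    filename: str, content: bytes, max_size_mb: int = MAX_FILE_SIZE_MB
) -> None:
    """Raise ValueError if the upload does not look like an acceptable PDF.

    Checks, in order: file extension, size limit, and the PDF header bytes.
    This is an illustrative approximation, not the backend's real validator.
    """
    if not filename.lower().endswith(".pdf"):
        raise ValueError("Only .pdf files are accepted")

    max_bytes = max_size_mb * 1024 * 1024
    if len(content) > max_bytes:
        raise ValueError(f"File exceeds the {max_size_mb} MB limit")

    # The extension alone can be spoofed, so also inspect the leading bytes.
    if not content.startswith(PDF_MAGIC):
        raise ValueError("File content does not start with the %PDF- header")


if __name__ == "__main__":
    validate_pdf_upload("report.pdf", b"%PDF-1.7\n...")  # passes silently
    try:
        validate_pdf_upload("report.pdf", b"not a pdf")
    except ValueError as err:
        print(f"rejected: {err}")
```

Running a failing file through a check like this quickly shows whether the problem is the extension, the size, or the file contents.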
# Clear all data
rm -rf backend/chroma/ backend/uploads/
rm -rf vector_store/ uploads/
# Restart services
docker-compose down -v
docker-compose up -d
- Rate limiting prevents API abuse
- File type validation prevents malicious uploads
- CORS properly configured for production
- Environment variables for sensitive data
- Input validation and sanitization
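
To make the rate-limiting point concrete, here is a small sliding-window limiter using the `RATE_LIMIT_REQUESTS=100` and `RATE_LIMIT_PERIOD=60` values from the sample environment file. This README does not say which algorithm or library the backend uses, so the sliding-window approach, the per-client keying, and the class name `SlidingWindowRateLimiter` are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within any period_seconds window."""

    def __init__(
        self,
        max_requests: int = 100,
        period_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.period_seconds = period_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record a request and return True, or return False if over the limit."""
        now = self._clock()
        hits = self._hits[client_id]

        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.period_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowRateLimiter(max_requests=2, period_seconds=60)
    print([limiter.allow("client-a") for _ in range(3)])
```

A request that is refused would typically be answered with HTTP 429 (Too Many Requests). This in-memory version only works within a single process; a multi-worker deployment would need shared state.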
- Chunked file processing for large documents
- Efficient vector similarity search
- Connection pooling for database operations
- Static asset optimization in production
- Caching strategies for repeated queries
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenRouter for unified LLM API access
- Google AI for Gemini models and embeddings
- ChromaDB for vector database capabilities
- FastAPI for the robust backend framework
- Next.js for the modern frontend framework
- ShadCN for beautiful UI components