Synapse AI is an advanced multimodal Retrieval-Augmented Generation (RAG) system with a React frontend, enabling intelligent querying across documents, images, and audio files.
- Multimodal Document Processing: Ingest and index PDFs, Word documents, text files, images, and audio
- Advanced Text Extraction: Smart chunking with semantic coherence (see the chunking sketch after this list)
- Image Captioning: Automatic caption generation using Vision models
- Audio Transcription: Speech-to-text using Whisper
- Vector Search: Efficient similarity search using ChromaDB
- LLM-Powered Answers: Context-grounded responses with source citations
- 4-bit Quantization: Efficient model loading for resource-constrained environments
- RESTful API: FastAPI-based endpoints for ingestion and querying
- Web Interface: Modern React-based frontend for easy file upload and querying
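The chunker itself isn't shown in this README. As a rough sketch of windowed chunking with overlap in Python (defaults mirror the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings listed later; the actual implementation presumably also respects sentence boundaries for semantic coherence):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Defaults mirror CHUNK_SIZE / CHUNK_OVERLAP from config.py.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks
```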
Project structure:

```
├── api/                # FastAPI routes and endpoints
├── core/               # Core components (model loader, vector DB)
├── processing/         # File processing modules
├── services/           # Business logic (ingestion, query)
├── static/             # React frontend static files
├── data/               # Data storage
│   ├── ingested/       # Uploaded files
│   └── chromadb/       # Vector database
├── main.py             # Application entry point
├── schemas.py          # Pydantic models
├── config.py           # Configuration settings
└── requirements.txt    # Python dependencies
```
Prerequisites:
- Python 3.9+
- CUDA-compatible GPU (recommended for optimal performance)
- 16GB+ RAM (32GB recommended for larger models)
- 10GB+ disk space for models and data
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Configure environment variables (optional): create a `.env` file:

```env
LLM_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
EMBEDDING_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2
VISION_MODEL_NAME=Salesforce/blip-image-captioning-base
WHISPER_MODEL_NAME=openai/whisper-base
USE_4BIT_QUANTIZATION=true
```
```bash
python main.py
```

The server will start on http://0.0.0.0:8000 and automatically open your default web browser to the React frontend interface.
Ingest a file:

```bash
curl -X POST "http://localhost:8000/api/ingest" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/your/document.pdf"
```

Response:

```json
{
  "status": "success",
  "filename": "document.pdf",
  "message": "File indexed successfully.",
  "chunks_created": 15
}
```

Query the index:

```bash
curl -X POST "http://localhost:8000/api/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query_text": "What are the main findings?",
    "top_k": 5
  }'
```

Response:

```json
{
  "query_text": "What are the main findings?",
  "answer": "The main findings include... [1][2]",
  "sources": [
    {
      "citation_number": 1,
      "filename": "report.pdf",
      "content_type": "pdf_page",
      "details": {
        "page_number": 5,
        "text_snippet": "..."
      }
    }
  ]
}
```

Health check:

```bash
curl http://localhost:8000/api/health
```

Index statistics:

```bash
curl http://localhost:8000/api/stats
```

Visit http://localhost:8000/docs for interactive Swagger UI documentation.
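For scripting against the API, the curl calls above translate directly to Python. A minimal client sketch using the `requests` library (file path and query text are placeholders):

```python
import requests

BASE_URL = "http://localhost:8000/api"

# Ingest a file (multipart upload, same as the curl example above)
with open("document.pdf", "rb") as f:
    resp = requests.post(f"{BASE_URL}/ingest", files={"file": f})
resp.raise_for_status()
print(resp.json())  # {"status": "success", "chunks_created": ..., ...}

# Query the indexed content
resp = requests.post(
    f"{BASE_URL}/query",
    json={"query_text": "What are the main findings?", "top_k": 5},
)
resp.raise_for_status()
result = resp.json()
print(result["answer"])
for src in result["sources"]:
    print(f'[{src["citation_number"]}] {src["filename"]}')
```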
The system includes a modern, responsive React-based web interface that provides an intuitive way to interact with the RAG system. Features include:
- File upload interface for documents, images, and audio files
- Real-time query interface with source citations
- Responsive design optimized for desktop and mobile
- Automatic browser launch upon server startup
Access the web interface at http://localhost:8000 after starting the server.
Supported file formats:
- Documents: PDF, DOCX, DOC, TXT
- Images: PNG, JPG, JPEG, GIF, BMP
- Audio: MP3, WAV, M4A, FLAC, OGG
Key configuration parameters in `config.py`:

- `LLM_MODEL_NAME`: Language model for generation
- `EMBEDDING_MODEL_NAME`: Model for text embeddings
- `VISION_MODEL_NAME`: Model for image captioning
- `WHISPER_MODEL_NAME`: Model for audio transcription
- `CHUNK_SIZE`: Text chunk size (default: 500)
- `CHUNK_OVERLAP`: Overlap between chunks (default: 50)
- `DEFAULT_TOP_K`: Default number of results (default: 5)
- `MAX_NEW_TOKENS`: Maximum tokens in LLM response (default: 512)
- `TEMPERATURE`: LLM temperature (default: 0.7)
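The repository's `config.py` isn't reproduced here; a plausible shape, with environment-variable overrides matching the `.env` example above, could look like:

```python
import os

# Model selection (overridable via environment variables / .env)
LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME", "mistralai/Mistral-7B-Instruct-v0.2")
EMBEDDING_MODEL_NAME = os.getenv("EMBEDDING_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2")
VISION_MODEL_NAME = os.getenv("VISION_MODEL_NAME", "Salesforce/blip-image-captioning-base")
WHISPER_MODEL_NAME = os.getenv("WHISPER_MODEL_NAME", "openai/whisper-base")
USE_4BIT_QUANTIZATION = os.getenv("USE_4BIT_QUANTIZATION", "true").lower() == "true"

# Chunking and retrieval defaults
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", 500))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", 50))
DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", 5))

# Generation settings
MAX_NEW_TOKENS = int(os.getenv("MAX_NEW_TOKENS", 512))
TEMPERATURE = float(os.getenv("TEMPERATURE", 0.7))
```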
On startup, the system loads:
- Embedding model (sentence-transformers)
- LLM (Mistral-7B with 4-bit quantization)
- Vision model (BLIP for image captioning)
- Whisper model (for audio transcription)
Models are cached locally after first download.
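The loader code isn't shown in this README; loading Mistral-7B in 4-bit via transformers and bitsandbytes typically looks like the sketch below (assumes a CUDA GPU and the `bitsandbytes` package installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

# NF4 4-bit quantization cuts weight memory roughly 4x vs fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on GPU automatically, spill to CPU if needed
)
```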
- 4-bit Quantization: Reduces memory usage by ~4x
- GPU Acceleration: Automatic GPU detection and usage
- Batch Processing: Efficient embedding generation
- ChromaDB: Fast vector similarity search
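As an illustration of the retrieval path, a direct query against the persistent ChromaDB store might look like this (the collection name and metadata keys are assumptions, not taken from the codebase):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Embed the query with the same model used at ingestion time
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
query_embedding = embedder.encode("What are the main findings?").tolist()

# Open the persistent store under data/chromadb
client = chromadb.PersistentClient(path="data/chromadb")
collection = client.get_or_create_collection("documents")  # name is illustrative

# Retrieve the top-5 most similar chunks with their metadata
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    include=["documents", "metadatas", "distances"],
)
for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
    print(meta, "->", doc[:80])  # metadata schema depends on the ingestion pipeline
```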
The system includes comprehensive error handling:
- Invalid file type detection
- File size validation
- Model loading failures
- Vector database errors
- LLM generation timeouts
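The actual validation code isn't shown here; as one hypothetical illustration, extension checking at the ingest endpoint could look like:

```python
from pathlib import Path
from fastapi import HTTPException, UploadFile

# Mirrors the supported-format list above; the real set lives in config.py
ALLOWED_FILE_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt",
    ".png", ".jpg", ".jpeg", ".gif", ".bmp",
    ".mp3", ".wav", ".m4a", ".flac", ".ogg",
}

def validate_upload(file: UploadFile) -> None:
    """Raise a 400 error if the uploaded file's extension is not allowed."""
    ext = Path(file.filename or "").suffix.lower()
    if ext not in ALLOWED_FILE_EXTENSIONS:
        raise HTTPException(status_code=400, detail=f"Unsupported file type: {ext}")
```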
Logs are written to:
- Console (stdout)
- `rag_system.log` file

Configure the log level in `config.py` or via the `LOG_LEVEL` environment variable.
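The project's logging setup isn't shown; a standard-library configuration matching the behavior described above (console plus `rag_system.log`, level from `LOG_LEVEL`) could look like:

```python
import logging
import os
import sys

logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO"),
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    handlers=[
        logging.StreamHandler(sys.stdout),      # console (stdout)
        logging.FileHandler("rag_system.log"),  # persistent log file
    ],
)
```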
Key modules:

- `core/model_loader.py`: ML model management
- `core/vector_db.py`: ChromaDB abstraction
- `processing/document_parser.py`: Document text extraction
- `processing/image_analyzer.py`: Image caption generation
- `processing/audio_transcriber.py`: Audio transcription
- `services/ingestion_service.py`: File processing pipeline
- `services/query_service.py`: RAG query pipeline
- `api/routes.py`: FastAPI endpoints
To add support for a new file type:

- Add the extension to `ALLOWED_FILE_EXTENSIONS` in `config.py`
- Implement a parser in the `processing/` module
- Add a processing method in `ingestion_service.py`
- Update `get_file_type()` in `document_parser.py`
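As a hypothetical example of the parser step, Markdown support added under `processing/` might be as simple as (file and function names are illustrative, not from the codebase):

```python
# processing/markdown_parser.py (hypothetical new parser)
from pathlib import Path

def extract_markdown_text(file_path: str) -> str:
    """Return the raw text of a .md file for downstream chunking."""
    return Path(file_path).read_text(encoding="utf-8")

# Then, per the steps above:
#   config.py           -> add ".md" to ALLOWED_FILE_EXTENSIONS
#   ingestion_service.py -> route .md files through extract_markdown_text()
#   document_parser.py  -> make get_file_type() return a "markdown" tag for .md
```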
Troubleshooting:

Out of memory:
- Enable 4-bit quantization
- Use smaller models
- Reduce batch size
- Use CPU instead of GPU
Model download failures:
- Check internet connection
- Verify Hugging Face access
- Manually download models to cache
Slow performance:
- Ensure GPU is being used (see the check below)
- Check CUDA installation
- Reduce `MAX_NEW_TOKENS`
- Use a smaller embedding model
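To verify the GPU is actually being used, a quick PyTorch check confirms whether CUDA is visible:

```python
import torch

if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available; inference will fall back to CPU")
```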
This project is part of the SIH2523 Problem Statement implementation.
Version: 1.0.0