The project consists of a Python FastAPI backend and a Vite + React frontend.
The backend contains the API server and uses the Qwen 3.5 model for medical responses.
Prerequisites: Python 3.10+

Setup:
- Navigate to the `backend/` directory:

  ```shell
  cd backend
  ```
- Create and activate a virtual environment (recommended):

  ```shell
  python -m venv venv
  # On Windows: venv\Scripts\activate
  # On macOS/Linux: source venv/bin/activate
  ```
- Install the unified dependencies from the root directory:

  ```shell
  pip install -r ../requirements.txt
  ```
Qwen Model Setup:
Make sure you have Ollama installed, then pull the base model:

```shell
ollama pull qwen3.5:4b
```
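Once the base model is pulled, the backend can reach it through Ollama's local HTTP API (Ollama listens on port 11434 by default). A minimal sketch of such a call; the `build_generate_request` and `ask_model` helpers are illustrative, not the project's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_generate_request(model: str, prompt: str) -> dict:
    """Assemble the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_model(prompt: str, model: str = "qwen3.5:4b") -> str:
    """POST the prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask_model("...")` only works while the Ollama server is running and the model has been pulled.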
Running the API Server:

```shell
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Data Pipeline: The project includes a standalone data extraction pipeline to download Kaggle, Roboflow, and Hugging Face image datasets.
- Navigate to the data pipeline directory:

  ```shell
  cd backend/data_pipeline
  ```
- The script automatically loads the required API keys (KAGGLE_DATASETS, ROBOFLOW_DATASETS, etc.) from the `.env` file at the root of the project.
- The dependencies are already included in the unified root `requirements.txt`.
- Run the stream & purge pipeline script:

  ```shell
  python pipeline.py
  ```
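Loading keys from a root `.env` file can be pictured as a simple KEY=VALUE parse. A dependency-free sketch (the real script may well use `python-dotenv` instead; only the variable names listed above are taken from the project):

```python
from pathlib import Path

def load_env(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    env: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env
```

For example, `load_env(Path(".env"))["KAGGLE_DATASETS"]` would return the configured dataset list.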
Two-Tower RAG Engine (Local & Multimodal): The project features a custom RAG engine built on ChromaDB designed to run smoothly on both heavy GPUs (RTX 5070 Ti) and power-efficient NPU architectures (Snapdragon X Elite). Instead of relying on external services or massive embedding models, it splits the vectorization into two lightweight, highly specialized towers:
- SigLIP (Image Tower): A multimodal model that embeds the 2k+ medical images from the pipeline, allowing retrieval of reference images via text symptoms.
- PubMedBERT (Text Tower): A tiny but clinically accurate model that embeds the Hugging Face medical facts dataset.
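The two-tower split can be illustrated with a toy cosine-similarity search: each tower is an independent vector store paired with its own embedder, and a query is routed to whichever tower fits the modality. The bag-of-letters stub embedder below is purely illustrative; the real towers run SigLIP and PubMedBERT:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class Tower:
    """One vector store: an embedder plus the items it has indexed."""
    def __init__(self, embed):
        self.embed = embed
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc: str) -> None:
        self.items.append((doc, self.embed(doc)))

    def query(self, text: str) -> str:
        """Return the indexed item most similar to the query text."""
        qv = self.embed(text)
        return max(self.items, key=lambda item: cosine(item[1], qv))[0]

def toy_embed(text: str) -> list[float]:
    """Stand-in embedder: letter counts instead of a neural model."""
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

image_tower = Tower(toy_embed)  # in the real engine: SigLIP over images
text_tower = Tower(toy_embed)   # in the real engine: PubMedBERT over medical facts
```

The point of the split is that each tower stays small and specialized, so neither a GPU-class nor an NPU-class machine has to load one massive multimodal embedder.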
To pre-compute the vector database from scratch (requires pipeline.py to have been run completely first):

```shell
cd backend
python rag.py --ingest-images --ingest-hf
```

This generates the `chroma_db` folder locally. You can commit/share this folder so other machines (like a Snapdragon laptop) can run the RAG instantly without needing to recompute embeddings.
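Sharing the precomputed folder works because embedding is the expensive step; loading vectors from disk is cheap. The same precompute-then-load pattern in a dependency-free sketch (illustrative only; the project itself persists everything through ChromaDB's `chroma_db` folder):

```python
import json
from pathlib import Path

def save_store(path: Path, vectors: dict[str, list[float]]) -> None:
    """Persist doc-id -> embedding pairs so other machines skip embedding."""
    path.write_text(json.dumps(vectors))

def load_store(path: Path) -> dict[str, list[float]]:
    """Load a precomputed store instantly; no embedding model required."""
    return json.loads(path.read_text())
```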
The frontend is a web app built using React and Vite.
Prerequisites: Node.js (v18+)
Setup and Run:
- Navigate to the `frontend/` directory:

  ```shell
  cd frontend
  ```
- Install dependencies (you can use `npm`, `yarn`, or `pnpm`):

  ```shell
  npm install
  ```
- Start the Vite development server:

  ```shell
  npm run dev
  ```
The vector database was generated locally on the following system:
| Component | Specification |
|---|---|
| CPU | Intel Core Ultra 7 265KF |
| RAM | 32 GB |
| GPU | NVIDIA GeForce RTX 5070 Ti (16 GB VRAM) |
- FastAPI: Fast, high-performance web framework used for the backend API. Created by Sebastián Ramírez (tiangolo).
- Uvicorn: ASGI web server implementation for Python. Created by Tom Christie / Encode OSS.
- React: JavaScript library for building the frontend user interfaces. Created by Meta / Facebook.
- Vite: Next-generation frontend tooling used for fast compilation and serving. Created by Evan You.
- Tailwind CSS: Utility-first CSS framework for styling the frontend. Created by Adam Wathan.
- Radix UI: Unstyled, accessible UI components used in the frontend. Created by Modulz / WorkOS.
- Hugging Face Ecosystem: Includes Transformers, PEFT, TRL, and Datasets used to load and fine-tune models. Created by the Hugging Face Team.
- SentenceTransformers: Python framework for state-of-the-art text and image embeddings. Created by Nils Reimers and UKPLab.
- SentencePiece: Unsupervised text tokenizer for neural network-based text generation. Created by Taku Kudo and John Richardson (Google).
The sophisticated local processing of this application is only possible thanks to the open-sourcing of several state-of-the-art models and datasets:
- Qwen: The open-weights foundation model that powers the medical emergency reasoning pipeline. Created by Alibaba Cloud.
- CLIP (`sentence-transformers/clip-ViT-B-32`): Multimodal image-text embedding model used in the image tower of our RAG engine. Created by OpenAI.
- PubMedBERT (`pritamdeka/S-PubMedBert-MS-MARCO`): Clinical text embedding model used in the text tower of our RAG engine. Original architecture created by Yu Gu et al. (Microsoft Research); fine-tuned MS-MARCO version uploaded by Pritam Deka.
- MedRescue Dataset: Clinical emergency response dataset used to feed medical facts into our RAG store. Curated and published by Eric Risco.
- Data Pipeline Integrations: Sourced datasets and tooling are integrated through platforms provided by Kaggle (Google) and Roboflow (Roboflow Inc.).
- RAG System with Two Embedding Models: We built a Retrieval-Augmented Generation pipeline using two specialized embedding models (SigLIP for images and PubMedBERT for text) to ground the LLM's responses in real medical data. While the vector database and retrieval logic were fully implemented, we were unable to integrate the RAG system into the final product within the hackathon timeframe.
- Fine-Tuned Gemma 4 Model: We fine-tuned Google's Gemma 4 model on medical emergency data using QLoRA (4-bit quantization with LoRA adapters). However, Gemma 4 is too new to have stable LoRA support — the adapter export and inference pipeline broke due to incomplete upstream tooling, making the fine-tuned model unusable in production.