A full-stack Retrieval-Augmented Generation (RAG) application for searching and querying SIGGRAPH 2025 research papers. This project combines a Next.js frontend with a FastAPI backend to provide an intelligent search interface powered by AI.
Build a production-ready RAG application that allows users to:
- Search through 11,000+ SIGGRAPH 2025 paper chunks
- Get AI-generated answers with inline citations
- View source papers with links to PDFs, GitHub repos, and videos
- Experience real-time streaming responses
This is a full-stack application with separate frontend and backend:

- Frontend: Next.js app running on http://localhost:3000
  - Modern React UI with Tailwind CSS
  - Real-time streaming via Server-Sent Events (SSE)
  - Responsive design with progress indicators
- Backend: FastAPI server running on http://localhost:8082
  - RESTful API with SSE streaming support
  - Hybrid search (semantic + keyword)
  - AI-powered answer generation with citations

The frontend communicates with the backend via HTTP API calls.
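The exact event schema is up to your implementation, but the SSE wire framing itself is standard: each event carries a `data:` line, and events are separated by blank lines. A minimal sketch of parsing such a stream (the `type`, `stage`, and `token` payload fields below are hypothetical):

```python
import json

def parse_sse_events(raw_stream: str):
    """Parse Server-Sent Events text into a list of decoded JSON payloads.

    Assumes each event carries one `data: {...}` line, with events
    separated by blank lines (the standard SSE framing).
    """
    events = []
    for block in raw_stream.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(json.loads(line[len("data: "):]))
    return events

# Two hypothetical events: a progress update, then an answer token
raw = (
    'data: {"type": "progress", "stage": "retrieval"}\n\n'
    'data: {"type": "answer", "token": "3D"}\n\n'
)
for event in parse_sse_events(raw):
    print(event["type"])  # progress, then answer
```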
Chunked dataset in JSON format: https://drive.google.com/drive/folders/1-NaRLrjlMMW56ATTwXB5FYV84EjTOALT
- Python 3.8+
- Node.js 18+ and npm
- Git
- GitHub account
IMPORTANT: You must fork this repository to your own GitHub account to work on the assignment.
- Click the "Fork" button at the top right of this repository
- Select your GitHub account as the destination
- Clone your forked repository:
```bash
git clone https://github.com/YOUR_USERNAME/siggraph-rag.git
cd siggraph-rag
```
You will be working in your fork, not the original repository. This allows you to:
- ✅ Push your code changes
- ✅ Track your progress with commits
- ✅ Deploy your own version
- ✅ Submit your work via your repository link
Follow these steps to set up your local environment.
```bash
cd backend

# Create a virtual environment
python3 -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate

# On Windows:
.\venv\Scripts\activate
```

Install the required Python packages using the requirements.txt file:

```bash
pip install -r requirements.txt
```

You will need to provide API keys for the services used by the RAG pipeline. All models (embeddings and LLMs) are accessed via the OpenRouter API for simplicity and cost-effectiveness.
Create a .env file in the backend directory:
```bash
cp .env.example .env
```

Now, edit the .env file and add your API keys:
```bash
# Required: OpenRouter API (for all models)
OPENROUTER_API_KEY="sk-or-v1-..."

# Required: Qdrant Cloud
QDRANT_URL="https://your-cluster.qdrant.io"
QDRANT_API_KEY="your-qdrant-api-key"

# Model Configuration
EMBEDDING_MODEL="baai/bge-large-en-v1.5"
LLM_MODEL="gpt-4-turbo-preview"

# Reranker Configuration (choose one)
RERANKER_TYPE="cohere"
COHERE_API_KEY="your-cohere-key"
```

Get Your API Keys:
- Required: OpenRouter: https://openrouter.ai/keys
- Required: Qdrant Cloud: https://cloud.qdrant.io
- Optional: Cohere (only if using Cohere reranker): https://dashboard.cohere.com/api-keys
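Missing keys are a common source of confusing startup errors, so a small check at launch can help. A sketch (the helper name `missing_env_keys` is just for illustration):

```python
import os

REQUIRED_KEYS = ["OPENROUTER_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"]

def missing_env_keys(env, required=REQUIRED_KEYS):
    """Return the names of required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]

# At server startup you could run:
#   missing = missing_env_keys(os.environ)
#   if missing:
#       raise RuntimeError(f"Missing environment variables: {missing}")

print(missing_env_keys({"OPENROUTER_API_KEY": "sk-or-v1-..."}))
# ['QDRANT_URL', 'QDRANT_API_KEY']
```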
Reranker Options:
- `cross-encoder` (default): Local model, no API key needed, ~400MB download on first use
- `cohere`: Cloud API, requires an API key, faster for large batches
- `none`: Disable reranking
Once the setup is complete, you can run the FastAPI server.
```bash
python api_server.py
```

The server will start on http://localhost:8082.
```
╔════════════════════════════════════════════════════════════════╗
║              SIGGRAPH 2025 RAG API Server                      ║
╠════════════════════════════════════════════════════════════════╣
║                                                                ║
║  Starting server on http://localhost:8082                      ║
║                                                                ║
╚════════════════════════════════════════════════════════════════╝
```
This server exposes the following key endpoints for the frontend:
- `GET /health`: Health check to verify the server is running.
- `GET /api/info`: Provides information about the API.
- `GET /api/stream`: The main endpoint for handling RAG queries via Server-Sent Events (SSE).
- `POST /api/query`: An optional non-streaming endpoint for queries.
API documentation is automatically generated by FastAPI and can be viewed at http://localhost:8082/docs.
For the frontend to communicate with this backend, you must configure its environment.
1. Navigate to the root of the frontend project.
2. Create a file named `.env.local` if it doesn't exist.
3. Add the following line to specify the backend API URL:

   ```bash
   # week3_assignment/.env.local
   NEXT_PUBLIC_API_URL=http://localhost:8082
   ```

4. Run the frontend development server from the `week3_assignment` directory:

   ```bash
   npm install
   npm run dev
   ```
Now, when you access the frontend at http://localhost:3000, it will make API calls to your running backend at http://localhost:8082.
This project includes a mock RAG implementation (test_backend_integration.py) that allows you to test the API integration without needing a fully functional RAG pipeline. The api_server.py is currently configured to use this mock.
To implement the actual RAG logic, you will need to:
- Create `rag_generate.py` and `retrieval_pipeline.py`.
- Update `api_server.py` to import `RAGGenerator` from `rag_generate` instead of `test_backend_integration`.
The api_server.py is already complete. Your job is to implement the RAG logic in 3 files.
Download chunks.json from the Google Drive link. Each chunk contains:
- `chunk_id` - Unique ID (use as the vector ID in Qdrant)
- `text` - Content to embed and search
- `title`, `authors`, `pdf_url`, `github_link`, `video_link` - Metadata for citations
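For illustration, here is a sketch of turning one chunk's metadata into a citation string; the field names come from the list above, but the helper name and sample values are hypothetical:

```python
def format_citation(chunk: dict) -> str:
    """Build a one-line citation string from a chunk's metadata fields."""
    authors = ", ".join(chunk.get("authors", []))
    parts = [f'{chunk["title"]} ({authors})']
    if chunk.get("pdf_url"):
        parts.append(f'PDF: {chunk["pdf_url"]}')
    return " | ".join(parts)

# A made-up chunk with the documented fields
chunk = {
    "chunk_id": 42,
    "text": "We present a real-time radiance field rendering method...",
    "title": "3D Gaussian Splatting",
    "authors": ["Kerbl", "Kopanas"],
    "pdf_url": "https://example.org/paper.pdf",
}
print(format_citation(chunk))
```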
Starter files are already created in backend/ with TODO comments. Open each file and implement the functions:
| File | Purpose |
|---|---|
| `upload_from_npz.py` | Script to upload embeddings from an `.npz` file to Qdrant Cloud |
| `retrieval_pipeline.py` | Hybrid search (semantic + BM25 + reranking) |
| `rag_generate.py` | Generate answers using an LLM |
We provide pre-computed embeddings generated with the BAAI/bge-large-en-v1.5 model (1024 dimensions) to avoid rate-limiting issues and speed up the process. Drive link: https://drive.google.com/file/d/1kJwXIe5mc-nI_ga-b6KFAW33-MHucSqE/view?usp=drive_link
File: backend/upload_from_npz.py
This script is already implemented - it loads pre-computed embeddings from embeddings_BAAI_bge_large_en_v1.5.npz and uploads them to Qdrant Cloud.
Run once to upload:

```bash
cd backend
python upload_from_npz.py --recreate
```

This takes ~5 minutes (no API calls for embeddings). The `--recreate` flag deletes any existing collection and starts fresh.
What it does:
- Loads 11,008 chunks from `chunks.json`
- Loads pre-computed embeddings from the `.npz` file
- Creates a Qdrant collection with 1024-dimensional vectors
- Uploads all chunks with embeddings in batches
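The batched-upload step can be sketched with a plain batching helper; the batch size of 256 here is an assumption for illustration, not necessarily the script's actual value:

```python
def make_batches(items, batch_size=256):
    """Split a list into consecutive batches for bulk upload.

    Each batch would then be sent as one Qdrant upsert call
    (vectors + payload metadata for every chunk in the batch).
    """
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# With 11,008 chunks and a batch size of 256, this yields exactly 43 batches.
batches = make_batches(list(range(11_008)), batch_size=256)
print(len(batches))  # 43
```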
Alternatively, use `upload_to_qdrant.py` to upload chunks to Qdrant Cloud by creating embeddings yourself with the OpenRouter `baai/bge-large-en-v1.5` model. Make sure the embedding model is `baai/bge-large-en-v1.5` so queries match the stored vectors.
Run once to upload:

```bash
cd backend
python upload_to_qdrant.py --recreate
```

File: backend/retrieval_pipeline.py
Implement these classes/functions:
| Class/Function | What it does |
|---|---|
| `OpenRouterEmbedder.__init__()` | Store API key, model (`baai/bge-large-en-v1.5`), base URL |
| `OpenRouterEmbedder.embed_query()` | Get an embedding for a query via the OpenRouter API (must match the upload model) |
| `BM25Index.__init__()` | Tokenize chunks and build the BM25 index |
| `BM25Index._tokenize()` | Convert text to lowercase word tokens |
| `BM25Index.search()` | Return the top-k BM25 matches |
| `RetrievalPipeline.__init__()` | Initialize the Qdrant client, embedder, and BM25 index |
| `RetrievalPipeline.semantic_search()` | Query Qdrant with the embedded query |
| `RetrievalPipeline.bm25_search()` | Run BM25 keyword search |
| `RetrievalPipeline.hybrid_search()` | Combine semantic + BM25 with weighted scores |
| `RetrievalPipeline.rerank()` | (Optional) Rerank using the Cohere API |
| `RetrievalPipeline.retrieve()` | Full pipeline → returns a `RetrievalResult` list |
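One common way to implement the weighted combination in `hybrid_search()` is min-max normalization of each score set followed by a weighted sum. A self-contained sketch (the 0.7/0.3 split is an illustrative default, not a required value):

```python
def minmax_normalize(scores: dict) -> dict:
    """Scale scores to [0, 1] so semantic and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(semantic: dict, bm25: dict, alpha: float = 0.7) -> dict:
    """Weighted sum of normalized semantic and BM25 scores per chunk id."""
    sem, kw = minmax_normalize(semantic), minmax_normalize(bm25)
    ids = set(sem) | set(kw)
    return {i: alpha * sem.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}

# Raw semantic (cosine) and BM25 scores live on different scales
combined = hybrid_scores(
    semantic={"c1": 0.91, "c2": 0.42},
    bm25={"c2": 7.3, "c3": 5.1},
)
print(max(combined, key=combined.get))  # c1
```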
Test:

```bash
python retrieval_pipeline.py "3D Gaussian Splatting"
```

File: backend/rag_generate.py
Implement these methods:
| Method | What it does |
|---|---|
| `RAGGenerator.__init__()` | Initialize config, retrieval pipeline, API key |
| `RAGGenerator.refine_query()` | (Optional) Use the LLM to improve the search query |
| `RAGGenerator._format_context()` | Format retrieved chunks into a context string |
| `RAGGenerator._build_sources_metadata()` | Build a unique sources list for citations |
| `RAGGenerator._call_llm()` | Call the OpenRouter chat API to generate an answer |
| `RAGGenerator.generate()` | Full RAG pipeline → returns answer + sources |
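As a sketch of what `_format_context()` might produce: number each retrieved chunk so the LLM can emit inline citations like [1]. The exact format is up to you; this version is one minimal option:

```python
def format_context(results: list) -> str:
    """Number each retrieved chunk so the LLM can cite sources as [1], [2], ..."""
    blocks = []
    for i, r in enumerate(results, start=1):
        blocks.append(f'[{i}] {r["title"]}\n{r["text"]}')
    return "\n\n".join(blocks)

# Hypothetical retrieval results
results = [
    {"title": "Paper A", "text": "Gaussian splats render fast."},
    {"title": "Paper B", "text": "Neural fields are flexible."},
]
print(format_context(results))
```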
Test:

```bash
python rag_generate.py "What is 3D Gaussian Splatting?"
```

Once everything works, update api_server.py to use your implementation.

Change this line (around line 36):

```python
from test_backend_integration import RAGGenerator, GenerationConfig, SYSTEM_PROMPT
```

To:

```python
from rag_generate import RAGGenerator, GenerationConfig, SYSTEM_PROMPT
```

Then run:

```bash
python api_server.py
```

Test with curl:

```bash
curl -N "http://localhost:8082/api/stream?query=3D%20Gaussian%20Splatting&top_k=5"
```

Setup (do once):
- Download `chunks.json` from Google Drive
- Get an OpenRouter API key: https://openrouter.ai/keys
- Create a Qdrant Cloud account: https://cloud.qdrant.io
- Create a Qdrant cluster (free tier is fine)
- Get the Qdrant URL and API key from the dashboard
- (Optional) Get a Cohere API key for reranking: https://dashboard.cohere.com
Step 1 - Upload to Qdrant:

- Create `upload_to_qdrant.py`
- Implement `load_chunks()`
- Implement `get_embeddings_batch()`
- Implement `create_qdrant_collection()`
- Implement `upload_chunks_to_qdrant()`
- Run the script: `python upload_to_qdrant.py`
- Verify in the Qdrant dashboard that vectors are uploaded
Step 2 - Retrieval Pipeline:

- Create `retrieval_pipeline.py`
- Implement the `OpenRouterEmbedder` class
- Implement the `BM25Index` class
- Implement `RetrievalPipeline.__init__()`
- Implement `semantic_search()`
- Implement `bm25_search()`
- Implement `hybrid_search()`
- Implement `rerank()` (optional)
- Implement `retrieve()`
- Test: retrieval returns results
Step 3 - RAG Generator:

- Create `rag_generate.py`
- Implement `RAGGenerator.__init__()`
- Implement `refine_query()` (optional)
- Implement `_format_context()`
- Implement `_build_sources_metadata()`
- Implement `_call_llm()`
- Implement `generate()`
- Test: `python rag_generate.py "test query"`
Step 4 - Integration:

- Update the import in `api_server.py`
- Run `python api_server.py`
- Test with curl
- Test with the frontend UI
- Start with Step 1 - you can't do retrieval without vectors in Qdrant
- Test each function individually before moving on
- Use print statements to debug API responses
- Check API costs - OpenRouter shows usage at https://openrouter.ai/activity
- Use cheaper models for testing (`gpt-3.5-turbo` instead of `gpt-4`)
- Read error messages carefully - they usually tell you what's wrong
This section covers deploying your full-stack RAG application to production using free hosting platforms.
- Frontend: Deploy to Vercel (free tier)
- Backend: Deploy to Render (free tier)
- Vector Database: Qdrant Cloud (free tier)
- APIs: OpenRouter (pay-as-you-go)
Before deploying, ensure you have:
- ✅ Working local development setup
- ✅ All API keys configured in `.env`
- ✅ Vercel account (sign up at https://vercel.com)
- ✅ Render account (sign up at https://render.com)
Create a .gitignore file in your project root to exclude sensitive files:
```gitignore
# Dependencies
node_modules/
.next/
venv/
__pycache__/

# Environment variables
.env
.env.local
.env*.local

# Build outputs
/build
/dist
/.next/
/out/

# OS files
.DS_Store
*.log

# IDE
.vscode/
.idea/
```

Ensure your api_server.py reads the port from environment variables (already configured):

```python
# This is already in your api_server.py
port = int(os.getenv("PORT", 8082))
```
1. Initialize the Git repository:

   ```bash
   cd /path/to/your/project
   git init
   git add .
   git commit -m "Initial commit: SIGGRAPH RAG application"
   ```

2. Create a GitHub repository:
   - Go to https://github.com/new
   - Create a new repository (e.g., `siggraph-rag`)
   - Don't initialize with a README (you already have one)

3. Push to GitHub:

   ```bash
   git remote add origin https://github.com/YOUR_USERNAME/siggraph-rag.git
   git branch -M main
   git push -u origin main
   ```
-
Click "New +" → "Web Service"
-
Connect your GitHub repository
-
Configure the service:
- Name:
siggraph-rag-backend - Region: Choose closest to you
- Branch:
main - Root Directory:
backend - Runtime:
Python 3 - Build Command:
pip install -r requirements.txt - Start Command:
uvicorn api_server:app --host 0.0.0.0 --port $PORT - Instance Type:
Free
In the Render dashboard, add these environment variables:
```bash
OPENROUTER_API_KEY=sk-or-v1-...
QDRANT_URL=https://your-cluster.qdrant.io
QDRANT_API_KEY=your-qdrant-key
EMBEDDING_MODEL=baai/bge-large-en-v1.5
LLM_MODEL=gpt-4-turbo-preview
RERANKER_TYPE=cohere
COHERE_API_KEY=your-cohere-key
```
Memory Optimization Tips:

- Use `RERANKER_TYPE=cohere` instead of `cross-encoder` to avoid downloading large models
- If you must use local reranking, use a smaller model like `cross-encoder/ms-marco-TinyBERT-L-2-v2`
- Click "Create Web Service"
- Wait 5-10 minutes for deployment
- Your backend will be live at: https://siggraph-rag-backend.onrender.com
Test your backend:

```bash
curl https://siggraph-rag-backend.onrender.com/health
```

Expected response:

```json
{"status":"healthy","rag_initialized":true,"timestamp":1234567890}
```

Important: Render's free tier spins down after 15 minutes of inactivity. The first request after sleep takes ~30 seconds to wake up.
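If you want to script this check, a small validator over the JSON body might look like the following (the field names are taken from the expected response above; the helper name is for illustration):

```python
import json

def is_healthy(body: str) -> bool:
    """Decide whether a /health response indicates a ready backend."""
    data = json.loads(body)
    return data.get("status") == "healthy" and data.get("rag_initialized") is True

print(is_healthy('{"status":"healthy","rag_initialized":true,"timestamp":1234567890}'))  # True
```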
Install the Vercel CLI and move into the frontend directory:

```bash
npm install -g vercel
cd frontend
```

Create a `.env.production` file:

```bash
NEXT_PUBLIC_API_URL=https://siggraph-rag-backend.onrender.com
```

Log in and deploy:

```bash
vercel login
vercel --prod
```

Follow the prompts.
1. Select your project
2. Go to Settings → Environment Variables
3. Add:
   - Key: `NEXT_PUBLIC_API_URL`
   - Value: `https://siggraph-rag-backend.onrender.com`
   - Environments: Production, Preview, Development
4. Redeploy:

   ```bash
   vercel --prod
   ```
Your frontend will be live at: https://siggraph-rag.vercel.app
- Open your frontend URL in a browser
- Submit a test query: "What is 3D Gaussian Splatting?"
- Verify:
- ✅ Connection established
- ✅ Progress updates appear
- ✅ Answer streams in real-time
- ✅ Source papers displayed with links
Problem: Backend returns 404
- Solution: Check that the Start Command is correct: `uvicorn api_server:app --host 0.0.0.0 --port $PORT`

Problem: Backend crashes on startup
- Solution: Check the logs in the Render dashboard and verify all environment variables are set

Problem: "No open ports detected"
- Solution: Ensure your server binds to `0.0.0.0` and uses the `$PORT` environment variable
Problem: "Connection error" in browser
- Solution: Verify `NEXT_PUBLIC_API_URL` is set correctly in the Vercel environment variables

Problem: Frontend shows old version
- Solution: Clear the Vercel cache and redeploy: `vercel --prod --force`

Problem: CORS errors
- Solution: Verify the backend has CORS middleware configured (already in `api_server.py`)

Problem: Slow first request (30+ seconds)
- Solution: Normal for the Render free tier - the backend is waking up from sleep

Problem: API costs too high
- Solution: Use cheaper models for testing (e.g., `gpt-3.5-turbo` instead of `gpt-4`)
- Vercel: Unlimited deployments, 100GB bandwidth/month
- Render: 750 hours/month (enough for one service), spins down after 15min inactivity
- Qdrant Cloud: 1GB storage, 1M vectors
- OpenRouter: Pay-as-you-go (no free tier)
1. Use cheaper models for development:

   ```bash
   LLM_MODEL=gpt-3.5-turbo  # ~$0.001/1K tokens vs $0.01/1K for GPT-4
   # Note: Keep EMBEDDING_MODEL=baai/bge-large-en-v1.5 to match the uploaded embeddings
   ```

2. Reduce the retrieval size:

   ```python
   retrieval_top_k=5  # Instead of 8
   ```

3. Disable query refinement for testing:

   ```python
   refine_query=False
   ```

4. Cache common queries (advanced):
   - Implement Redis caching for frequent questions
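As a sketch of the caching idea, here is an in-memory stand-in for Redis keyed on the normalized query; in production the dict would be replaced by Redis calls, and everything here (class and key scheme) is illustrative:

```python
import hashlib

class QueryCache:
    """Tiny in-memory stand-in for a Redis cache of query -> answer."""

    def __init__(self):
        self._store = {}

    def _key(self, query: str) -> str:
        # Normalize so trivially different phrasings hit the same entry
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query: str):
        return self._store.get(self._key(query))

    def set(self, query: str, answer: str):
        self._store[self._key(query)] = answer

cache = QueryCache()
cache.set("What is 3D Gaussian Splatting?", "A point-based rendering method...")
print(cache.get("what is 3d gaussian splatting?") is not None)  # True
```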
Render Dashboard → Your Service → Logs
Vercel Dashboard → Your Project → Deployments → View Logs
- OpenRouter: https://openrouter.ai/activity
- Cohere: https://dashboard.cohere.com/billing
- Qdrant: https://cloud.qdrant.io
When you make code changes:
```bash
# Push to GitHub
git add .
git commit -m "Your changes"
git push origin main

# Render will auto-deploy the backend

# Redeploy the frontend
cd frontend
vercel --prod
```

- Environment Variables: Never commit API keys to Git
- Error Handling: Add comprehensive error messages for debugging
- Rate Limiting: Implement rate limiting to prevent API abuse
- Monitoring: Set up uptime monitoring (e.g., UptimeRobot)
- Backups: Regularly backup your Qdrant database
- Documentation: Keep README updated with deployment changes
- Testing: Test on staging environment before production deployment
Your final project structure should look like this:
```
siggraph-rag/
├── backend/
│   ├── api_server.py              # FastAPI server (provided)
│   ├── rag_generate.py            # RAG orchestration (you implement)
│   ├── retrieval_pipeline.py      # Retrieval logic (you implement)
│   ├── requirements.txt           # Python dependencies
│   ├── .env                       # Environment variables (not in git)
│   ├── .env.example               # Example env file
│   └── chunks.json                # Paper data
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx           # Main page
│   │   │   └── layout.tsx         # Root layout
│   │   ├── components/
│   │   │   └── rag/               # RAG UI components
│   │   └── hooks/
│   │       ├── useRAGStream.ts    # SSE hook (recommended)
│   │       └── useRAGWebSocket.ts # WebSocket hook
│   ├── package.json
│   ├── .env.local                 # Local env (not in git)
│   └── .env.production            # Production env (not in git)
├── .gitignore
└── README.md
```
- FastAPI: https://fastapi.tiangolo.com
- Next.js: https://nextjs.org/docs
- Qdrant: https://qdrant.tech/documentation
- OpenRouter: https://openrouter.ai/docs
- OpenAI API: https://platform.openai.com/docs/api-reference
- Cohere Rerank: https://docs.cohere.com/reference/rerank
- Qdrant Client: https://qdrant.tech/documentation/interfaces
- RAG Tutorial: https://www.pinecone.io/learn/retrieval-augmented-generation
- Vector Search: https://www.pinecone.io/learn/vector-search
- Semantic Search: https://www.sbert.net/examples/applications/semantic-search/README.html
- FastAPI Discord: https://discord.gg/fastapi
- Next.js Discord: https://discord.gg/nextjs
- Qdrant Discord: https://discord.gg/qdrant
Before submitting your project, ensure:
- Backend runs locally without errors
- Frontend runs locally and connects to backend
- All API endpoints return correct responses
- SSE streaming works properly
- Source citations appear correctly
- Code is well-documented with comments
- Environment variables are documented in `.env.example`
- `.gitignore` excludes sensitive files
- Backend is deployed to Render (or alternative)
- Frontend is deployed to Vercel (or alternative)
- Deployed app is fully functional
If you encounter issues:
- Check the logs: Backend (Render) and Frontend (Vercel) dashboards
- Review this README: Most common issues are covered
- Search GitHub Issues: Check if others had similar problems
- Ask for help: Post in Circle or ask your TA, including:
- Error message
- What you tried
- Relevant code snippets
- Environment (local/deployed)
Good luck building your RAG application! 🚀