A full-stack Retrieval-Augmented Generation (RAG) application for searching and querying SIGGRAPH 2025 research papers. This project combines a Next.js frontend with a FastAPI backend to provide an intelligent search interface powered by AI.
Build a production-ready RAG application that allows users to:
- Search through 11,000+ SIGGRAPH 2025 paper chunks
- Get AI-generated answers with inline citations
- View source papers with links to PDFs, GitHub repos, and videos
- Experience real-time streaming responses
This is a full-stack application with separate frontend and backend:

- Frontend: Next.js app running on http://localhost:3000
  - Modern React UI with Tailwind CSS
  - Real-time streaming via Server-Sent Events (SSE)
  - Responsive design with progress indicators
- Backend: FastAPI server running on http://localhost:8082
  - RESTful API with SSE streaming support
  - Hybrid search (semantic + keyword)
  - AI-powered answer generation with citations

The frontend communicates with the backend via HTTP API calls.
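The exact event schema is up to your implementation, but the SSE wire framing itself is standard: each event carries a `data:` line, and events are separated by blank lines. A minimal sketch of parsing such a stream (the `type`, `stage`, and `token` payload fields below are hypothetical):

```python
import json

def parse_sse_events(raw_stream: str):
    """Parse Server-Sent Events text into a list of decoded JSON payloads.

    Assumes each event carries one `data: {...}` line, with events
    separated by blank lines (the standard SSE framing).
    """
    events = []
    for block in raw_stream.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(json.loads(line[len("data: "):]))
    return events

# Two hypothetical events: a progress update, then an answer token
raw = (
    'data: {"type": "progress", "stage": "retrieval"}\n\n'
    'data: {"type": "answer", "token": "3D"}\n\n'
)
for event in parse_sse_events(raw):
    print(event["type"])  # progress, then answer
```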
Chunked dataset in JSON format: https://drive.google.com/drive/folders/1-NaRLrjlMMW56ATTwXB5FYV84EjTOALT
- Python 3.8+
- Node.js 18+ and npm
- Git
- GitHub account
IMPORTANT: You must fork this repository to your own GitHub account to work on the assignment.
- Click the "Fork" button at the top right of this repository
- Select your GitHub account as the destination
- Clone your forked repository:
```bash
git clone https://github.com/YOUR_USERNAME/siggraph-rag.git
cd siggraph-rag
```
You will be working in your fork, not the original repository. This allows you to:
- ✅ Push your code changes
- ✅ Track your progress with commits
- ✅ Deploy your own version
- ✅ Submit your work via your repository link
Follow these steps to set up your local environment.
```bash
cd backend

# Create a virtual environment
python3 -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate

# On Windows:
.\venv\Scripts\activate
```

Install the required Python packages using the requirements.txt file:

```bash
pip install -r requirements.txt
```

You will need to provide API keys for the services used by the RAG pipeline. All models (embeddings and LLMs) are accessed via the OpenRouter API for simplicity and cost-effectiveness.
Create a .env file in the backend directory:
```bash
cp .env.example .env
```

Now, edit the .env file and add your API keys:
```bash
# Required: OpenRouter API (for all models)
OPENROUTER_API_KEY="sk-or-v1-..."

# Required: Qdrant Cloud
QDRANT_URL="https://your-cluster.qdrant.io"
QDRANT_API_KEY="your-qdrant-api-key"

# Model Configuration
EMBEDDING_MODEL="baai/bge-large-en-v1.5"
LLM_MODEL="gpt-4-turbo-preview"

# Reranker Configuration (choose one)
RERANKER_TYPE="cohere"
COHERE_API_KEY="your-cohere-key"
```

Get Your API Keys:
- Required: OpenRouter: https://openrouter.ai/keys
- Required: Qdrant Cloud: https://cloud.qdrant.io
- Optional: Cohere (only if using Cohere reranker): https://dashboard.cohere.com/api-keys
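Missing keys are a common source of confusing startup errors, so a small check at launch can help. A sketch (the helper name `missing_env_keys` is just for illustration):

```python
import os

REQUIRED_KEYS = ["OPENROUTER_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"]

def missing_env_keys(env, required=REQUIRED_KEYS):
    """Return the names of required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]

# At server startup you could run:
#   missing = missing_env_keys(os.environ)
#   if missing:
#       raise RuntimeError(f"Missing environment variables: {missing}")

print(missing_env_keys({"OPENROUTER_API_KEY": "sk-or-v1-..."}))
# ['QDRANT_URL', 'QDRANT_API_KEY']
```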
Reranker Options:
- `cross-encoder` (default): Local model, no API key needed, ~400MB download on first use
- `cohere`: Cloud API, requires an API key, faster for large batches
- `none`: Disable reranking
Once the setup is complete, you can run the FastAPI server.
```bash
python api_server.py
```

The server will start on http://localhost:8082.
```
╔════════════════════════════════════════════════════════════════╗
║              SIGGRAPH 2025 RAG API Server                      ║
╠════════════════════════════════════════════════════════════════╣
║                                                                ║
║  Starting server on http://localhost:8082                      ║
║                                                                ║
╚════════════════════════════════════════════════════════════════╝
```
This server exposes the following key endpoints for the frontend:
- `GET /health`: Health check to verify the server is running.
- `GET /api/info`: Provides information about the API.
- `GET /api/stream`: The main endpoint for handling RAG queries via Server-Sent Events (SSE).
- `POST /api/query`: An optional non-streaming endpoint for queries.
API documentation is automatically generated by FastAPI and can be viewed at http://localhost:8082/docs.
For the frontend to communicate with this backend, you must configure its environment.
1. Navigate to the root of the frontend project.
2. Create a file named `.env.local` if it doesn't exist.
3. Add the following line to specify the backend API URL:

   ```bash
   # week3_assignment/.env.local
   NEXT_PUBLIC_API_URL=http://localhost:8082
   ```

4. Run the frontend development server from the `week3_assignment` directory:

   ```bash
   npm install
   npm run dev
   ```
Now, when you access the frontend at http://localhost:3000, it will make API calls to your running backend at http://localhost:8082.
This project includes a mock RAG implementation (test_backend_integration.py) that allows you to test the API integration without needing a fully functional RAG pipeline. The api_server.py is currently configured to use this mock.
To implement the actual RAG logic, you will need to:
- Create `rag_generate.py` and `retrieval_pipeline.py`.
- Update `api_server.py` to import `RAGGenerator` from `rag_generate` instead of `test_backend_integration`.
The api_server.py is already complete. Your job is to implement the RAG logic in 3 files.
Download chunks.json from the Google Drive link. Each chunk contains:
- `chunk_id` - Unique ID (use as the vector ID in Qdrant)
- `text` - Content to embed and search
- `title`, `authors`, `pdf_url`, `github_link`, `video_link` - Metadata for citations
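For illustration, here is a sketch of turning one chunk's metadata into a citation string; the field names come from the list above, but the helper name and sample values are hypothetical:

```python
def format_citation(chunk: dict) -> str:
    """Build a one-line citation string from a chunk's metadata fields."""
    authors = ", ".join(chunk.get("authors", []))
    parts = [f'{chunk["title"]} ({authors})']
    if chunk.get("pdf_url"):
        parts.append(f'PDF: {chunk["pdf_url"]}')
    return " | ".join(parts)

# A made-up chunk with the documented fields
chunk = {
    "chunk_id": 42,
    "text": "We present a real-time radiance field rendering method...",
    "title": "3D Gaussian Splatting",
    "authors": ["Kerbl", "Kopanas"],
    "pdf_url": "https://example.org/paper.pdf",
}
print(format_citation(chunk))
```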
Starter files are already created in backend/ with TODO comments. Open each file and implement the functions:
| File | Purpose |
|---|---|
| `upload_from_npz.py` | Script to upload embeddings from an `.npz` file to Qdrant Cloud |
| `retrieval_pipeline.py` | Hybrid search (semantic + BM25 + reranking) |
| `rag_generate.py` | Generate answers using an LLM |
We provide pre-computed embeddings generated with the BAAI/bge-large-en-v1.5 model (1024 dimensions) to avoid rate-limiting issues and speed up the process. Drive link: https://drive.google.com/file/d/1kJwXIe5mc-nI_ga-b6KFAW33-MHucSqE/view?usp=drive_link
File: backend/upload_from_npz.py
This script is already implemented - it loads pre-computed embeddings from embeddings_BAAI_bge_large_en_v1.5.npz and uploads them to Qdrant Cloud.
Run once to upload:

```bash
cd backend
python upload_from_npz.py --recreate
```

This takes ~5 minutes (no API calls for embeddings). The `--recreate` flag deletes any existing collection and starts fresh.
What it does:
- Loads 11,008 chunks from `chunks.json`
- Loads pre-computed embeddings from the `.npz` file
- Creates a Qdrant collection with 1024-dimensional vectors
- Uploads all chunks with embeddings in batches
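The batched-upload step can be sketched with a plain batching helper; the batch size of 256 here is an assumption for illustration, not necessarily the script's actual value:

```python
def make_batches(items, batch_size=256):
    """Split a list into consecutive batches for bulk upload.

    Each batch would then be sent as one Qdrant upsert call
    (vectors + payload metadata for every chunk in the batch).
    """
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# With 11,008 chunks and a batch size of 256, this yields exactly 43 batches.
batches = make_batches(list(range(11_008)), batch_size=256)
print(len(batches))  # 43
```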
Alternatively, use `upload_to_qdrant.py` to upload chunks to Qdrant Cloud by creating embeddings yourself with the OpenRouter `baai/bge-large-en-v1.5` model. Make sure the embedding model is `baai/bge-large-en-v1.5` so queries match the stored vectors.
Run once to upload:

```bash
cd backend
python upload_to_qdrant.py --recreate
```

File: backend/retrieval_pipeline.py
Implement these classes/functions:
| Class/Function | What it does |
|---|---|
| `OpenRouterEmbedder.__init__()` | Store API key, model (`baai/bge-large-en-v1.5`), base URL |
| `OpenRouterEmbedder.embed_query()` | Get an embedding for a query via the OpenRouter API (must match the upload model) |
| `BM25Index.__init__()` | Tokenize chunks and build the BM25 index |
| `BM25Index._tokenize()` | Convert text to lowercase word tokens |
| `BM25Index.search()` | Return the top-k BM25 matches |
| `RetrievalPipeline.__init__()` | Initialize the Qdrant client, embedder, and BM25 index |
| `RetrievalPipeline.semantic_search()` | Query Qdrant with the embedded query |
| `RetrievalPipeline.bm25_search()` | Run BM25 keyword search |
| `RetrievalPipeline.hybrid_search()` | Combine semantic + BM25 with weighted scores |
| `RetrievalPipeline.rerank()` | (Optional) Rerank using the Cohere API |
| `RetrievalPipeline.retrieve()` | Full pipeline → returns a `RetrievalResult` list |
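One common way to implement the weighted combination in `hybrid_search()` is min-max normalization of each score set followed by a weighted sum. A self-contained sketch (the 0.7/0.3 split is an illustrative default, not a required value):

```python
def minmax_normalize(scores: dict) -> dict:
    """Scale scores to [0, 1] so semantic and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(semantic: dict, bm25: dict, alpha: float = 0.7) -> dict:
    """Weighted sum of normalized semantic and BM25 scores per chunk id."""
    sem, kw = minmax_normalize(semantic), minmax_normalize(bm25)
    ids = set(sem) | set(kw)
    return {i: alpha * sem.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}

# Raw semantic (cosine) and BM25 scores live on different scales
combined = hybrid_scores(
    semantic={"c1": 0.91, "c2": 0.42},
    bm25={"c2": 7.3, "c3": 5.1},
)
print(max(combined, key=combined.get))  # c1
```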
Test:

```bash
python retrieval_pipeline.py "3D Gaussian Splatting"
```

File: backend/rag_generate.py
Implement these methods:
| Method | What it does |
|---|---|
| `RAGGenerator.__init__()` | Initialize config, retrieval pipeline, API key |
| `RAGGenerator.refine_query()` | (Optional) Use the LLM to improve the search query |
| `RAGGenerator._format_context()` | Format retrieved chunks into a context string |
| `RAGGenerator._build_sources_metadata()` | Build a unique sources list for citations |
| `RAGGenerator._call_llm()` | Call the OpenRouter chat API to generate an answer |
| `RAGGenerator.generate()` | Full RAG pipeline → returns answer + sources |
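As a sketch of what `_format_context()` might produce: number each retrieved chunk so the LLM can emit inline citations like [1]. The exact format is up to you; this version is one minimal option:

```python
def format_context(results: list) -> str:
    """Number each retrieved chunk so the LLM can cite sources as [1], [2], ..."""
    blocks = []
    for i, r in enumerate(results, start=1):
        blocks.append(f'[{i}] {r["title"]}\n{r["text"]}')
    return "\n\n".join(blocks)

# Hypothetical retrieval results
results = [
    {"title": "Paper A", "text": "Gaussian splats render fast."},
    {"title": "Paper B", "text": "Neural fields are flexible."},
]
print(format_context(results))
```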
Test:

```bash
python rag_generate.py "What is 3D Gaussian Splatting?"
```

Once everything works, update api_server.py to use your implementation.

Change this line (around line 36):

```python
from test_backend_integration import RAGGenerator, GenerationConfig, SYSTEM_PROMPT
```

To:

```python
from rag_generate import RAGGenerator, GenerationConfig, SYSTEM_PROMPT
```

Then run:

```bash
python api_server.py
```

Test with curl:

```bash
curl -N "http://localhost:8082/api/stream?query=3D%20Gaussian%20Splatting&top_k=5"
```

Setup (do once):
- Download `chunks.json` from Google Drive
- Get an OpenRouter API key: https://openrouter.ai/keys
- Create a Qdrant Cloud account: https://cloud.qdrant.io
- Create a Qdrant cluster (free tier is fine)
- Get the Qdrant URL and API key from the dashboard
- (Optional) Get a Cohere API key for reranking: https://dashboard.cohere.com
Step 1 - Upload to Qdrant:

- Create `upload_to_qdrant.py`
- Implement `load_chunks()`
- Implement `get_embeddings_batch()`
- Implement `create_qdrant_collection()`
- Implement `upload_chunks_to_qdrant()`
- Run the script: `python upload_to_qdrant.py`
- Verify in the Qdrant dashboard that vectors are uploaded
Step 2 - Retrieval Pipeline:

- Create `retrieval_pipeline.py`
- Implement the `OpenRouterEmbedder` class
- Implement the `BM25Index` class
- Implement `RetrievalPipeline.__init__()`
- Implement `semantic_search()`
- Implement `bm25_search()`
- Implement `hybrid_search()`
- Implement `rerank()` (optional)
- Implement `retrieve()`
- Test: retrieval returns results
Step 3 - RAG Generator:

- Create `rag_generate.py`
- Implement `RAGGenerator.__init__()`
- Implement `refine_query()` (optional)
- Implement `_format_context()`
- Implement `_build_sources_metadata()`
- Implement `_call_llm()`
- Implement `generate()`
- Test: `python rag_generate.py "test query"`
Step 4 - Integration:

- Update the import in `api_server.py`
- Run `python api_server.py`
- Test with curl
- Test with the frontend UI
- Start with Step 1 - you can't do retrieval without vectors in Qdrant
- Test each function individually before moving on
- Use print statements to debug API responses
- Check API costs - OpenRouter shows usage at https://openrouter.ai/activity
- Use cheaper models for testing (`gpt-3.5-turbo` instead of `gpt-4`)
- Read error messages carefully - they usually tell you what's wrong
This section covers deploying your full-stack RAG application to production using free hosting platforms.
- Frontend: Deploy to Vercel (free tier)
- Backend: Deploy to Render (free tier)
- Vector Database: Qdrant Cloud (free tier)
- APIs: OpenRouter (pay-as-you-go)
Before deploying, ensure you have:
- ✅ Working local development setup
- ✅ All API keys configured in `.env`
- ✅ Vercel account (sign up at https://vercel.com)
- ✅ Render account (sign up at https://render.com)
Create a .gitignore file in your project root to exclude sensitive files:
```gitignore
# Dependencies
node_modules/
.next/
venv/
__pycache__/

# Environment variables
.env
.env.local
.env*.local

# Build outputs
/build
/dist
/.next/
/out/

# OS files
.DS_Store
*.log

# IDE
.vscode/
.idea/
```

Ensure your api_server.py reads the port from environment variables (already configured):

```python
# This is already in your api_server.py
port = int(os.getenv("PORT", 8082))
```
1. Initialize the Git repository:

   ```bash
   cd /path/to/your/project
   git init
   git add .
   git commit -m "Initial commit: SIGGRAPH RAG application"
   ```

2. Create a GitHub repository:
   - Go to https://github.com/new
   - Create a new repository (e.g., `siggraph-rag`)
   - Don't initialize with a README (you already have one)

3. Push to GitHub:

   ```bash
   git remote add origin https://github.com/YOUR_USERNAME/siggraph-rag.git
   git branch -M main
   git push -u origin main
   ```
-
Click "New +" → "Web Service"
-
Connect your GitHub repository
-
Configure the service:
- Name:
siggraph-rag-backend - Region: Choose closest to you
- Branch:
main - Root Directory:
backend - Runtime:
Python 3 - Build Command:
pip install -r requirements.txt - Start Command:
uvicorn api_server:app --host 0.0.0.0 --port $PORT - Instance Type:
Free
In the Render dashboard, add these environment variables:
```bash
OPENROUTER_API_KEY=sk-or-v1-...
QDRANT_URL=https://your-cluster.qdrant.io
QDRANT_API_KEY=your-qdrant-key
EMBEDDING_MODEL=baai/bge-large-en-v1.5
LLM_MODEL=gpt-4-turbo-preview
RERANKER_TYPE=cohere
COHERE_API_KEY=your-cohere-key
```
Memory Optimization Tips:

- Use `RERANKER_TYPE=cohere` instead of `cross-encoder` to avoid downloading large models
- If you must use local reranking, use a smaller model like `cross-encoder/ms-marco-TinyBERT-L-2-v2`
- Click "Create Web Service"
- Wait 5-10 minutes for deployment
- Your backend will be live at: https://siggraph-rag-backend.onrender.com
Test your backend:

```bash
curl https://siggraph-rag-backend.onrender.com/health
```

Expected response:

```json
{"status":"healthy","rag_initialized":true,"timestamp":1234567890}
```

Important: Render's free tier spins down after 15 minutes of inactivity. The first request after sleep takes ~30 seconds to wake up.
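If you want to script this check, a small validator over the JSON body might look like the following (the field names are taken from the expected response above; the helper name is for illustration):

```python
import json

def is_healthy(body: str) -> bool:
    """Decide whether a /health response indicates a ready backend."""
    data = json.loads(body)
    return data.get("status") == "healthy" and data.get("rag_initialized") is True

print(is_healthy('{"status":"healthy","rag_initialized":true,"timestamp":1234567890}'))  # True
```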
Install the Vercel CLI and move into the frontend directory:

```bash
npm install -g vercel
cd frontend
```

Create a `.env.production` file:

```bash
NEXT_PUBLIC_API_URL=https://siggraph-rag-backend.onrender.com
```

Log in and deploy:

```bash
vercel login
vercel --prod
```

Follow the prompts.
1. Select your project
2. Go to Settings → Environment Variables
3. Add:
   - Key: `NEXT_PUBLIC_API_URL`
   - Value: `https://siggraph-rag-backend.onrender.com`
   - Environments: Production, Preview, Development
4. Redeploy:

   ```bash
   vercel --prod
   ```
Your frontend will be live at: https://siggraph-rag.vercel.app
- Open your frontend URL in a browser
- Submit a test query: "What is 3D Gaussian Splatting?"
- Verify:
- ✅ Connection established
- ✅ Progress updates appear
- ✅ Answer streams in real-time
- ✅ Source papers displayed with links
Problem: Backend returns 404
- Solution: Check that the Start Command is correct: `uvicorn api_server:app --host 0.0.0.0 --port $PORT`

Problem: Backend crashes on startup
- Solution: Check the logs in the Render dashboard and verify all environment variables are set

Problem: "No open ports detected"
- Solution: Ensure your server binds to `0.0.0.0` and uses the `$PORT` environment variable
Problem: "Connection error" in browser
- Solution: Verify `NEXT_PUBLIC_API_URL` is set correctly in the Vercel environment variables

Problem: Frontend shows old version
- Solution: Clear the Vercel cache and redeploy: `vercel --prod --force`

Problem: CORS errors
- Solution: Verify the backend has CORS middleware configured (already in `api_server.py`)

Problem: Slow first request (30+ seconds)
- Solution: Normal for the Render free tier - the backend is waking up from sleep

Problem: API costs too high
- Solution: Use cheaper models for testing (e.g., `gpt-3.5-turbo` instead of `gpt-4`)
- Vercel: Unlimited deployments, 100GB bandwidth/month
- Render: 750 hours/month (enough for one service), spins down after 15min inactivity
- Qdrant Cloud: 1GB storage, 1M vectors
- OpenRouter: Pay-as-you-go (no free tier)
1. Use cheaper models for development:

   ```bash
   LLM_MODEL=gpt-3.5-turbo  # ~$0.001/1K tokens vs $0.01/1K for GPT-4
   # Note: Keep EMBEDDING_MODEL=baai/bge-large-en-v1.5 to match the uploaded embeddings
   ```

2. Reduce the retrieval size:

   ```python
   retrieval_top_k=5  # Instead of 8
   ```

3. Disable query refinement for testing:

   ```python
   refine_query=False
   ```

4. Cache common queries (advanced):
   - Implement Redis caching for frequent questions
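As a sketch of the caching idea, here is an in-memory stand-in for Redis keyed on the normalized query; in production the dict would be replaced by Redis calls, and everything here (class and key scheme) is illustrative:

```python
import hashlib

class QueryCache:
    """Tiny in-memory stand-in for a Redis cache of query -> answer."""

    def __init__(self):
        self._store = {}

    def _key(self, query: str) -> str:
        # Normalize so trivially different phrasings hit the same entry
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query: str):
        return self._store.get(self._key(query))

    def set(self, query: str, answer: str):
        self._store[self._key(query)] = answer

cache = QueryCache()
cache.set("What is 3D Gaussian Splatting?", "A point-based rendering method...")
print(cache.get("what is 3d gaussian splatting?") is not None)  # True
```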
Render Dashboard → Your Service → Logs
Vercel Dashboard → Your Project → Deployments → View Logs
- OpenRouter: https://openrouter.ai/activity
- Cohere: https://dashboard.cohere.com/billing
- Qdrant: https://cloud.qdrant.io
When you make code changes:
```bash
# Push to GitHub
git add .
git commit -m "Your changes"
git push origin main

# Render will auto-deploy the backend

# Redeploy the frontend
cd frontend
vercel --prod
```

- Environment Variables: Never commit API keys to Git
- Error Handling: Add comprehensive error messages for debugging
- Rate Limiting: Implement rate limiting to prevent API abuse
- Monitoring: Set up uptime monitoring (e.g., UptimeRobot)
- Backups: Regularly backup your Qdrant database
- Documentation: Keep README updated with deployment changes
- Testing: Test on staging environment before production deployment
Your final project structure should look like this:
```
siggraph-rag/
├── backend/
│   ├── api_server.py              # FastAPI server (provided)
│   ├── rag_generate.py            # RAG orchestration (you implement)
│   ├── retrieval_pipeline.py      # Retrieval logic (you implement)
│   ├── requirements.txt           # Python dependencies
│   ├── .env                       # Environment variables (not in git)
│   ├── .env.example               # Example env file
│   └── chunks.json                # Paper data
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx           # Main page
│   │   │   └── layout.tsx         # Root layout
│   │   ├── components/
│   │   │   └── rag/               # RAG UI components
│   │   └── hooks/
│   │       ├── useRAGStream.ts    # SSE hook (recommended)
│   │       └── useRAGWebSocket.ts # WebSocket hook
│   ├── package.json
│   ├── .env.local                 # Local env (not in git)
│   └── .env.production            # Production env (not in git)
├── .gitignore
└── README.md
```
- FastAPI: https://fastapi.tiangolo.com
- Next.js: https://nextjs.org/docs
- Qdrant: https://qdrant.tech/documentation
- OpenRouter: https://openrouter.ai/docs
- OpenAI API: https://platform.openai.com/docs/api-reference
- Cohere Rerank: https://docs.cohere.com/reference/rerank
- Qdrant Client: https://qdrant.tech/documentation/interfaces
- RAG Tutorial: https://www.pinecone.io/learn/retrieval-augmented-generation
- Vector Search: https://www.pinecone.io/learn/vector-search
- Semantic Search: https://www.sbert.net/examples/applications/semantic-search/README.html
- FastAPI Discord: https://discord.gg/fastapi
- Next.js Discord: https://discord.gg/nextjs
- Qdrant Discord: https://discord.gg/qdrant
Before submitting your project, ensure:
- Backend runs locally without errors
- Frontend runs locally and connects to backend
- All API endpoints return correct responses
- SSE streaming works properly
- Source citations appear correctly
- Code is well-documented with comments
- Environment variables are documented in `.env.example`
- `.gitignore` excludes sensitive files
- Backend is deployed to Render (or alternative)
- Frontend is deployed to Vercel (or alternative)
- Deployed app is fully functional
If you encounter issues:
- Check the logs: Backend (Render) and Frontend (Vercel) dashboards
- Review this README: Most common issues are covered
- Search GitHub Issues: Check if others had similar problems
- Ask for help: Post in Circle or ask your TA, including:
- Error message
- What you tried
- Relevant code snippets
- Environment (local/deployed)
Good luck building your RAG application! 🚀