Footnote

Chat with your Google Drive folders using RAG (Retrieval-Augmented Generation).

Footnote indexes your Google Drive documents and lets you have conversations with them. It uses hybrid search (vector + keyword + recency) to find relevant context and Claude to generate responses with citations.

Key Features

Google Drive Integration - OAuth login, folder picker, automatic sync
Multi-format Support - Google Docs, PDFs (with OCR), images (with vision)
Hybrid Search - Combines semantic similarity, keyword matching, and recency scoring
Two Chat Modes:
- Simple RAG - Fast single-pass retrieval
- Agentic RAG - Iterative tool-use for complex queries
Citations - Responses include clickable references to source documents

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              Frontend                                    │
│                    React + Vite + TypeScript + Tailwind                 │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           Backend (FastAPI)                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐ │
│  │  /api/auth   │  │ /api/folders │  │  /api/chat   │  │ /api/health │ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────────────────────┐
│ Google APIs │      │   Celery    │      │       Hybrid Search         │
│  (OAuth +   │      │   Worker    │      │  Vector (60%) + Keyword     │
│   Drive)    │      │  Indexing   │      │  (20%) + Recency (20%)      │
└─────────────┘      └─────────────┘      └─────────────────────────────┘
                            │                          │
                            ▼                          ▼
                     ┌─────────────┐           ┌─────────────┐
                     │  Fireworks  │           │   Claude    │
                     │ (Embeddings │           │ (Generation)│
                     │  + Rerank)  │           └─────────────┘
                     └─────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     PostgreSQL + pgvector + Redis                       │
│  users, sessions, folders, files, chunks (768-dim vectors), messages    │
└─────────────────────────────────────────────────────────────────────────┘
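
The weights in the diagram can be read as a linear blend of three scores. Below is a minimal sketch of that idea, not the project's actual scoring code. It assumes each input is already normalized to [0, 1]; the function names and the 30-day half-life for recency are illustrative assumptions, since the diagram only states the 60/20/20 split.

```python
from datetime import datetime, timedelta, timezone

# Weights taken from the architecture diagram.
VECTOR_WEIGHT = 0.6
KEYWORD_WEIGHT = 0.2
RECENCY_WEIGHT = 0.2

# Assumption: illustrative half-life, not taken from the project.
HALF_LIFE_DAYS = 30.0


def recency_score(modified_at: datetime, now: datetime) -> float:
    """Exponential decay: 1.0 for a file modified now, 0.5 after one half-life."""
    age_days = max((now - modified_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def hybrid_score(
    vector_sim: float, keyword_rank: float, modified_at: datetime, now: datetime
) -> float:
    """Blend the three signals; vector_sim and keyword_rank are assumed in [0, 1]."""
    return (
        VECTOR_WEIGHT * vector_sim
        + KEYWORD_WEIGHT * keyword_rank
        + RECENCY_WEIGHT * recency_score(modified_at, now)
    )
```

Under these assumptions a chunk with a perfect vector and keyword match in a file modified just now scores 1.0, and older files lose at most the 0.2 recency share.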

Indexing Pipeline

1. User selects a Google Drive folder
2. Celery worker fetches file list and creates indexing jobs
3. For each file:
- Download/export from Drive
- Extract text (HTML parsing for Docs, Mistral OCR for PDFs)
- Split into chunks with heading-aware boundaries
- Generate embeddings via Fireworks (Nomic 768-dim)
- Store chunks with vectors and tsvectors for hybrid search
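
The "heading-aware boundaries" step can be sketched as follows. This is a simplified illustration, not the project's chunker. It assumes the extracted text marks headings as lines starting with `#`; the `Chunk` type, `chunk_by_heading` name, and `max_chars` limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest preceding heading, "" if none
    text: str


def chunk_by_heading(text: str, max_chars: int = 1000) -> list[Chunk]:
    """Split text into chunks that never span a heading boundary.

    Assumption: headings are lines starting with '#'. max_chars is an
    illustrative size limit applied within a section.
    """
    chunks: list[Chunk] = []
    heading = ""
    buffer: list[str] = []
    size = 0

    def flush() -> None:
        nonlocal buffer, size
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk(heading=heading, text=body))
        buffer, size = [], 0

    for line in text.splitlines():
        if line.startswith("#"):
            flush()  # close the chunk under the previous heading
            heading = line.lstrip("#").strip()
            continue
        if size + len(line) > max_chars and buffer:
            flush()  # size limit reached inside a section
        buffer.append(line)
        size += len(line) + 1

    flush()
    return chunks
```

Keeping the heading on each chunk means it can be stored alongside the chunk text, so a retrieved passage still carries the section it came from.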

Retrieval Pipeline

1. Embed the query with the same model used for indexing
2. Run hybrid search: vector similarity + keyword tsvector match + recency decay
3. Optionally rerank to select the final top-k
4. Format the context; Claude generates a response with [N] citations
5. Stream the response via SSE
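
The last two steps can be sketched as below: numbering the retrieved chunks so the model can cite them as [N], mapping those markers back to files, and framing output as Server-Sent Events. This is a hedged illustration rather than the project's code. The chunk dictionary keys (`file_name`, `text`) and all function names are assumptions.

```python
import re


def format_context(chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite it as [N].

    Assumption: each chunk is a dict with 'file_name' and 'text' keys.
    """
    blocks = []
    for n, chunk in enumerate(chunks, start=1):
        blocks.append(f"[{n}] {chunk['file_name']}\n{chunk['text']}")
    return "\n\n".join(blocks)


def cited_sources(response: str, chunks: list[dict]) -> list[str]:
    """Return file names for the valid [N] markers, in order of first use."""
    seen: list[str] = []
    for match in re.finditer(r"\[(\d+)\]", response):
        index = int(match.group(1)) - 1
        if 0 <= index < len(chunks):
            name = chunks[index]["file_name"]
            if name not in seen:
                seen.append(name)
    return seen


def sse_event(data: str) -> str:
    """Frame one payload as an SSE message: 'data:' lines plus a blank line."""
    return "".join(f"data: {line}\n" for line in data.split("\n")) + "\n"
```

Markers that point outside the retrieved set are ignored, so a hallucinated citation number does not turn into a broken link.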

Setup

Prerequisites

Python 3.11+
Node.js 18+
Docker & Docker Compose
API keys (see below)

1. Clone and configure environment

git clone <repo>
cd footnote

# Copy environment template
cp .env.example .env

2. Get API credentials

Edit .env with your credentials:

Variable	Where to get it
`GOOGLE_CLIENT_ID`	Google Cloud Console - Create OAuth 2.0 Client
`GOOGLE_CLIENT_SECRET`	Same as above
`FIREWORKS_API_KEY`	Fireworks AI
`ANTHROPIC_API_KEY`	Anthropic Console
`MISTRAL_API_KEY`	Mistral Console
`SECRET_KEY`	Generate: `python -c "import secrets; print(secrets.token_urlsafe(32))"`

Google OAuth Setup:

1. Create a project in Google Cloud Console
2. Enable Google Drive API and Google Picker API
3. Configure OAuth consent screen (add ../auth/drive.readonly scope)
4. Create OAuth 2.0 Client ID (Web application)
5. Add http://localhost:8000/api/auth/google/callback to authorized redirect URIs

3. Start with Docker (recommended)

# Start all services
docker-compose up

# Frontend: http://localhost:3000
# Backend:  http://localhost:8000
# API docs: http://localhost:8000/docs

4. Or run locally

# Terminal 1: Start PostgreSQL + Redis
docker-compose up db redis

# Terminal 2: Backend
cd backend
uv sync
uv run uvicorn main:app --reload

# Terminal 3: Celery worker
cd backend
uv run celery -A app.celery_app worker --loglevel=info --pool=threads -Q celery,indexing

# Terminal 4: Frontend
cd frontend
npm install
npm run dev

Development

Running Tests

# Backend
cd backend && uv run pytest
cd backend && uv run pytest tests/unit/test_hybrid_search.py -v  # single file

# Frontend
cd frontend && npm test
cd frontend && npm run test:e2e  # Playwright

Linting

# Backend
cd backend && uv run ruff check . && uv run ruff format .

# Frontend
cd frontend && npm run lint

Useful Commands

# View worker logs
docker-compose logs -f worker

# Reset database (down -v also deletes the data volumes)
docker-compose down -v && docker-compose up

# Shell into backend container
docker-compose exec backend bash

Database Migrations

Migrations live in backend/database/migrations/ as numbered SQL files. Use the migrate script to apply them:

cd backend

# Run pending migrations
uv run bin/migrate

# Check migration status
uv run bin/migrate --status

The script tracks applied migrations in a schema_migrations table. To create a new migration, add a SQL file with the next number prefix (e.g., 003_add_feature.sql).
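
The numbering scheme can be illustrated with a short sketch. It is not the `bin/migrate` script itself. It assumes that `schema_migrations` records applied file names, which the text above does not confirm, and the function names are hypothetical.

```python
import re

# Matches files such as '<number>_<description>.sql'.
MIGRATION_PATTERN = re.compile(r"^(\d+)_.+\.sql$")


def pending_migrations(file_names: list[str], applied: set[str]) -> list[str]:
    """Return unapplied migration files in numeric-prefix order.

    Assumption: 'applied' holds file names as stored in schema_migrations.
    """
    numbered = []
    for name in file_names:
        match = MIGRATION_PATTERN.match(name)
        if match and name not in applied:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def next_prefix(file_names: list[str]) -> str:
    """Return the next zero-padded number prefix for a new migration."""
    numbers = [
        int(m.group(1)) for n in file_names if (m := MIGRATION_PATTERN.match(n))
    ]
    return f"{max(numbers, default=0) + 1:03d}"
```

Sorting by the parsed integer rather than the raw string keeps the order stable even if a prefix is not zero-padded.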

Deployment

See render.yaml for Render deployment configuration. After deploying:

1. Set environment variables in Render dashboard
2. Enable pgvector extension on database:
```
CREATE EXTENSION IF NOT EXISTS vector;
```
3. Update GOOGLE_REDIRECT_URI to your production callback URL

Tech Stack

Backend: FastAPI, SQLAlchemy (async), Celery, Redis
Frontend: React 18, Vite, TypeScript, Tailwind CSS, Radix UI
Database: PostgreSQL 16 with pgvector
AI: Fireworks (Nomic embeddings), Anthropic Claude, Mistral (OCR)
