Gmail-style custom category builder with a CLI, HTTP API, and lightweight web UI.
From the spec, there are three core requirements:
- Ingest messages from a JSONL file.
- Let a user define categories with natural language descriptions.
- Decide, for each message, whether it belongs in each category, and explain why.
This repo implements that end to end:
- Messages and categories are stored in a SQLite database.
- Both are turned into embeddings, so they can be compared in vector space.
- A pluggable classification strategy decides category membership. The default is an LLM-based strategy that evaluates all categories for a message in a single call and returns:
  - which categories match,
  - a confidence score for each match,
  - and an explanation.
There are three ways to interact:
- CLI (`uv run extra <args>`) for local workflows and demos
- HTTP API (FastAPI) for programmatic access
- Web UI (Preact + Vite) for browsing messages and categories
At the core there are three tables (see `models.py`):

- `Message` – `id`, `subject`, `sender`, `to[]`, `snippet`, `body`, `date`, plus `embedding: list[float]` (stored as JSON) for semantic representation.
- `Category` – `id`, `name`, `description`, plus `embedding: list[float]` (stored as JSON).
- `MessageCategory` – association table (`message_id`, `category_id`) that also stores:
  - `score` (0–1 confidence or similarity)
  - `explanation` (plain English)
  - `classified_at` (timestamp)
There are two classification strategies implemented behind a common interface:
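The README does not show the interface itself, so here is a minimal sketch of what it could look like, assuming a single `classify` method that returns per-category matches (all names here are illustrative, not the repo's actual identifiers):

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class CategoryMatch:
    # Mirrors what MessageCategory persists: a score and a plain-English reason.
    category_id: int
    score: float          # 0-1 confidence or similarity
    explanation: str


class ClassificationStrategy(Protocol):
    """Anything with this shape can be plugged into the classification service."""

    def classify(
        self, message_text: str, categories: Sequence[tuple[int, str]]
    ) -> list[CategoryMatch]:
        ...
```

Using a `Protocol` (rather than an ABC) keeps the two strategies decoupled: each only needs to match the shape, not inherit from a shared base.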
Classes: `LLMClassificationStrategy`, `ClassificationService`

- Input representation:
  - The message is formatted as text:

    ```
    Subject: …
    From: …
    To: …
    Date: …
    Preview: …
    Body: …
    ```

  - All categories are represented as:

    ```
    [0] Work Travel
        Description: Work-related travel receipts from airlines, hotels, etc.
    [1] AI Research Newsletters
        Description: …
    …
    ```

- Single LLM call per message:
  - The strategy uses pydantic-ai for our agent loop with a typed output schema:

    ```python
    class CategoryMatchOutput(BaseModel):
        category_index: int
        is_in_category: bool
        explanation: str
        confidence: float
    ```

  - For each individual message, the agent is instructed to:
    - Evaluate every category simultaneously.
    - Return `is_in_category`, `confidence`, and an explanation for each category.
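The indexed category block shown above is straightforward to produce; a hypothetical helper (not the repo's actual code) might look like:

```python
def format_categories(categories: list[tuple[str, str]]) -> str:
    """Render (name, description) pairs as the indexed block the agent sees."""
    lines: list[str] = []
    for i, (name, description) in enumerate(categories):
        lines.append(f"[{i}] {name}")
        lines.append(f"    Description: {description}")
    return "\n".join(lines)
```

The `category_index` field in `CategoryMatchOutput` then refers back to these `[i]` markers, which is how a single LLM response is mapped onto multiple categories.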
Classes: `EmbeddingSimilarityStrategy`, `EmbeddingService`

This is a pure cosine-similarity strategy that uses the stored embeddings:

- `EmbeddingService` builds embeddings using the OpenAI embeddings API:
  - Messages: subject, from, snippet, and body combined into a single text span.
  - Categories: category name and description combined similarly.
- At classification time:
  - Compute cosine similarity between `message.embedding` and each `category.embedding`.
  - Filter to similarities ≥ `threshold`.
  - Sort by similarity and take the `top_n` best.
- The resulting scores and explanations are persisted the same way as with the LLM strategy.
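That scoring loop reduces to a few lines of plain Python. A sketch, assuming embeddings are plain `list[float]` values as stored in SQLite (the real code may vectorize this differently):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_categories(
    message_emb: list[float],
    categories: list[tuple[int, list[float]]],
    threshold: float = 0.5,
    top_n: int = 3,
) -> list[tuple[int, float]]:
    """Score every category, keep those >= threshold, return the top_n best."""
    scored = [
        (cat_id, cosine_similarity(message_emb, cat_emb))
        for cat_id, cat_emb in categories
    ]
    kept = [(cid, s) for cid, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]
```

Here the similarity score doubles as the persisted `score`, which is why both strategies can share the same `MessageCategory` rows.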
Entry point: `cli.py`, exposed as the `extra` command via `pyproject.toml`.
Key commands:
```sh
# One-shot end-to-end demo: bootstrap messages + categories & classify with LLM
uv run extra bootstrap \
  --messages sample-messages.jsonl \
  --categories sample-categories.jsonl \
  --drop \
  --classify \
  --top-n 3 \
  --threshold 0.5
```

Other useful commands:
```sh
# Import messages only (embedding + optional auto-classification)
uv run extra messages import sample-messages.jsonl --drop --classify

# List messages with their assigned categories
uv run extra messages list

# Inspect a single message in detail
uv run extra messages get a3a67dc0

# Classify a specific message on demand
uv run extra messages classify a3a67dc0 --top-n 3 --threshold 0.5

# Manage categories
uv run extra category create "Work Travel" \
  "Work-related travel receipts from airlines, hotels, and travel agencies"
uv run extra category list
```

The CLI is tuned for the interview: you can show ingestion, category creation, and classification in a few self-contained commands, with nicely formatted table output and explanations.
Entry point: `api.py`, exposing `app: FastAPI`.
Start the server:
```sh
uv run uvicorn api:app --host 0.0.0.0 --port 8000
# or
uv run python api.py
```

Core endpoints:

- `GET /health` – health check
- `POST /bootstrap/` – upload messages and categories, optionally auto-classify
- `GET /categories/` – list categories
- `POST /categories/` – create a category
- `GET /messages/` – list messages with their categories
- `POST /messages/import` – upload messages JSONL
- `POST /messages/{message_id}/classify` – classify a message
`POST /messages/{message_id}/classify` returns:

```json
{
  "message_id": "a3a67dc0",
  "classifications": [
    {
      "category_id": 1,
      "category_name": "Work Travel",
      "score": 0.93,
      "is_in_category": true,
      "explanation": "This email is a flight receipt for a work trip."
    }
  ]
}
```

This is a small extension of the spec's suggested shape. For each (message, category) pair that passes the decision rule, you get:

- `message_id`
- `is_in_category` (always `true` in this list)
- `explanation`
- plus `category_id`, `category_name`, and `score` for debugging and the UI.
If you want an array exactly of the spec's form, you can flatten this response to something like:

```json
[
  {
    "message_id": "a3a67dc0",
    "is_in_category": true,
    "explanation": "This email is a flight receipt for a work trip."
  }
]
```

The UI in `ui/` is intentionally minimal but demonstrates the full loop:
- Shows all categories and their descriptions.
- Click a category to see the messages currently assigned to it (using the persisted `MessageCategory` rows).
- Shows sender, subject, snippet, date, and assigned categories for each message.
- Expand a row to see full body, category explanations, and scores.
The UI talks to the backend via the same API (proxied to /api in dev).
Input format: `messages.jsonl`, where each line is:

```json
{
  "id": "174a9",
  "subject": "Your Delta eTicket Receipt",
  "from": "Delta <no-reply@delta.com>",
  "to": ["sam@example.com"],
  "snippet": "Thanks for flying with us",
  "body": "PGh0bWw+CiAgPGhlYWQ+Li4uPC9oZWFkPgogIDxib2R5Pi4uLjwvYm9keT4KPC9odG1sPg==",
  "date": "2025-08-12T14:33:22Z"
}
```

The ingestion pipeline in `MessagesService` does:
- Base64-decode `body`.
- Convert HTML to text if the body looks like HTML (via BeautifulSoup).
- Generate an embedding using `EmbeddingService.embed_message`.
- Persist the `Message` with its embedding in SQLite.
All plain-text normalization lives in `MessagesService.parse_message_content`, so the same logic is reused for both CLI and API ingestion.
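The decode-and-strip steps can be approximated with the standard library. The repo uses BeautifulSoup, so treat this stdlib sketch as an illustration of the behavior, not the actual implementation:

```python
import base64
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parse_message_content(raw_body_b64: str) -> str:
    """Decode the base64 body and, if it looks like HTML, strip the tags."""
    text = base64.b64decode(raw_body_b64).decode("utf-8", errors="replace")
    if "<html" in text.lower() or "<body" in text.lower():
        extractor = _TextExtractor()
        extractor.feed(text)
        text = " ".join(chunk.strip() for chunk in extractor.chunks if chunk.strip())
    return text
```

The "looks like HTML" heuristic here is deliberately crude; anything that misclassifies a body simply falls through to the plain-text path.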
Input format: `sample-categories.jsonl`, where each line is:

```json
{"name": "Work Travel", "description": "Work-related travel receipts, bookings, and itineraries from airlines, hotels, and travel agencies"}
```

The category pipeline in `CategoriesService`:

- Creates a `Category` with the given `name` and `description`.
- Calls `EmbeddingService.embed_category` on `"Category: {name}\nDescription: {description}"`.
- Stores the embedding alongside the category.
Names are unique; attempts to create duplicates raise a 400 error in the API or a ValueError in the service layer.
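Under those rules, the service-layer behavior can be sketched as follows (hypothetical helper names; the real `CategoriesService` enforces uniqueness against the database):

```python
def category_embed_text(name: str, description: str) -> str:
    """The exact string the README says is embedded for a category."""
    return f"Category: {name}\nDescription: {description}"


def create_category(existing_names: set[str], name: str, description: str) -> dict:
    """Reject duplicate names, then build the record to persist and embed."""
    if name in existing_names:
        raise ValueError(f"Category {name!r} already exists")
    existing_names.add(name)
    return {
        "name": name,
        "description": description,
        "embed_text": category_embed_text(name, description),
    }
```

The API layer catches this `ValueError` and converts it into the 400 response mentioned above.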
This is the “show me it all works” path for reviewers.
```sh
# Install deps
uv sync

# Bootstrap sample data and classify with LLM
uv run extra bootstrap \
  --messages sample-messages.jsonl \
  --categories sample-categories.jsonl \
  --drop \
  --classify \
  --top-n 3 \
  --threshold 0.5
```

The CLI prints:
- A summary of how many categories, messages, and classifications were created.
- A table of sample categories.
- A table of sample messages, including assigned categories.
- For the first few messages, the matched categories with scores and explanations.
- Start the API:

  ```sh
  uv run uvicorn api:app --reload --port 8000
  ```

- Bootstrap via HTTP:

  ```sh
  curl -X POST http://localhost:8000/bootstrap/ \
    -F "messages_file=@sample-messages.jsonl" \
    -F "categories_file=@sample-categories.jsonl" \
    -F "drop_existing=true" \
    -F "auto_classify=true" \
    -F "classification_top_n=3" \
    -F "classification_threshold=0.5"
  ```

- Inspect classifications for a single message:

  ```sh
  curl -X POST "http://localhost:8000/messages/a3a67dc0/classify?top_n=3&threshold=0.5"
  ```

  Example response shape:

  ```json
  {
    "message_id": "a3a67dc0",
    "classifications": [
      {
        "category_id": 1,
        "category_name": "Work Travel",
        "score": 0.94,
        "is_in_category": true,
        "explanation": "This email is a flight receipt from an airline for a work trip."
      }
    ]
  }
  ```

- Browse messages and categories in the built-in docs: open http://localhost:8000/docs in a browser.
With the backend running:
```sh
cd ui
pnpm install
pnpm dev
```

Then open http://localhost:5173:

- You should see the categories from `sample-categories.jsonl`.
- The messages from `sample-messages.jsonl` appear with their assigned categories.
- Expanding a message shows the same explanation text persisted from the classification step.
- Python 3.12 (see `.python-version`)
- `uv` for the Python environment and packaging
- Node.js 20+ and `pnpm` for the UI (only if you want to run the UI)
- SQLite (embedded, no separate service)
If you have Tilt installed:
```sh
brew install tilt-dev/tap/tilt  # on macOS
tilt up
```

Tilt will:

- start the FastAPI backend on port 8000,
- start the Preact UI on port 5173,
- give you clickable buttons to:
  - bootstrap sample data,
  - run tests,
  - lint and format,
  - build the UI.
See Tiltfile for details.
```sh
# Install Python deps
uv sync

# Start API
uv run uvicorn api:app --reload --port 8000

# (Optional) install CLI globally in your environment
uv pip install -e .
```

UI (optional):

```sh
cd ui
pnpm install
pnpm dev
```

Project layout:

```
.
├── api.py              # FastAPI app entry point
├── cli.py              # Typer CLI ("extra")
├── models.py           # SQLAlchemy ORM models (Message, Category, MessageCategory)
├── app/
│   ├── config.py       # Config (DB URL, thresholds, model names)
│   ├── deps.py         # FastAPI dependency wiring
│   ├── controllers/    # FastAPI routers (bootstrap, messages, categories)
│   ├── services/       # Orchestration: embedding, classification, messages, categories, bootstrap
│   ├── managers/       # Thin CRUD wrappers over SQLAlchemy sessions
│   ├── stores/         # SQLiteStore: engine and session management
│   └── utils/          # JSONL parsing, HTML handling
├── ui/                 # Preact + Vite front-end
├── tests/              # Unit and integration tests
└── sample-*.jsonl      # Sample messages and categories for the demo
```
The layering is:
- Controllers: HTTP request / response translation, validation.
- Services: business logic and orchestration (classification, ingestion).
- Managers: direct DB operations for a single entity.
- Store: DB engine and session lifetime management.
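That dependency direction (controller → service → manager → store) can be sketched with stand-in classes. This is purely illustrative; the real code wires the layers together via FastAPI dependencies in `deps.py` and uses SQLAlchemy sessions rather than a dict:

```python
class SQLiteStore:
    """stores/: owns the storage; a dict stands in for an engine + session here."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def session(self) -> dict[str, str]:
        return self._rows


class CategoryManager:
    """managers/: thin CRUD over the session for a single entity."""

    def __init__(self, session: dict[str, str]) -> None:
        self._session = session

    def insert(self, name: str, description: str) -> None:
        self._session[name] = description

    def exists(self, name: str) -> bool:
        return name in self._session


class CategoriesService:
    """services/: business rules (e.g. unique names) and orchestration."""

    def __init__(self, manager: CategoryManager) -> None:
        self._manager = manager

    def create(self, name: str, description: str) -> None:
        if self._manager.exists(name):
            raise ValueError(f"category {name!r} already exists")
        self._manager.insert(name, description)
```

A controller then does nothing but translate HTTP requests and responses into calls on `CategoriesService`, which keeps each layer testable in isolation.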
Tests are structured to avoid hitting real OpenAI APIs:
- Embeddings are replaced with deterministic random vectors via `MockEmbeddingService`.
- LLM calls are replaced with `FunctionModel` instances that return structured JSON.
To run tests:
```sh
uv run pytest -v
```