Local RAG in one command. Index your files, ask questions, get answers — no cloud, no config.
```bash
pip install distill-rag
```

```bash
# Index a codebase or docs folder
distill index .

# Ask questions
distill ask "how does authentication work?"
distill ask "what are the main API endpoints?"
distill ask "summarize the database schema"

# Interactive chat mode
distill chat
```

That's it. No vector databases to set up. No cloud accounts. Everything runs locally.
RAG tools are either too complex (LangChain, LlamaIndex — pages of boilerplate) or too limited (can't handle code). distill is the middle ground:
- **Fully local** — SQLite + Ollama. Your data never leaves your machine.
- **Zero config** — `distill index . && distill ask "..."`
- **Code-aware** — Understands Python, JS/TS, Rust, Go, Java, C/C++, and more
- **Doc-aware** — Markdown, RST, TXT, PDF, HTML
- **Incremental** — Re-index only changed files
- **Multi-LLM** — Works with Ollama (default), OpenAI, Anthropic, or any OpenAI-compatible API
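The incremental behavior can be illustrated with a content-hash check: hash every file, skip those whose hash matches what was stored at the last index run. This is a minimal sketch under assumptions — `file_digest`, `changed_files`, and the `stored` mapping are illustrative names, not distill's actual internals (which live in its SQLite database).

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Hash file contents so unchanged files can be skipped on re-index."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(root: Path, stored: dict[str, str]) -> list[Path]:
    """Return files under root whose content differs from the stored digest.

    `stored` maps path -> digest from the previous index run (illustrative;
    a real tool would persist this alongside the embeddings).
    """
    out = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and stored.get(str(path)) != file_digest(path):
            out.append(path)
    return out
```

New and modified files show up in the result; files whose bytes are unchanged are skipped, which is what makes re-indexing a large tree cheap.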
```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Files   │────▶│  Chunk   │────▶│  Embed   │────▶│  SQLite  │
│ on disk  │     │ & parse  │     │ vectors  │     │  + vec   │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                        │
┌──────────┐     ┌──────────┐     ┌──────────┐          │
│  Answer  │◀────│   LLM    │◀────│ Retrieve │◀─────────┘
│          │     │ generate │     │  top-k   │
└──────────┘     └──────────┘     └──────────┘
```
- **Index** — Files are chunked (respecting code boundaries), embedded, and stored in SQLite with `sqlite-vec`
- **Query** — Your question is embedded, and the top-k most similar chunks are retrieved
- **Answer** — Retrieved chunks + your question go to the LLM for a grounded answer
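The query step boils down to ranking stored chunk vectors by cosine similarity to the embedded question. A minimal sketch, assuming vectors are already computed (distill itself delegates embedding to a model and similarity search to sqlite-vec):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunks: list[tuple[list[float], str]], k: int = 5):
    """Return the k (score, text) pairs most similar to the query vector.

    `chunks` is a list of (embedding, chunk_text) pairs — a stand-in for
    rows in the vector index.
    """
    scored = [(cosine(query_vec, vec), text) for vec, text in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The returned texts are then pasted into the LLM prompt alongside the question, which is what keeps the answer grounded in your files.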
| Command | Description |
|---|---|
| `distill index <path>` | Index files in a directory |
| `distill ask "<question>"` | Ask a question about indexed files |
| `distill chat` | Interactive chat mode with context |
| `distill search "<query>"` | Raw similarity search (no LLM) |
| `distill status` | Show index stats |
| `distill forget <path>` | Remove a path from the index |
```
--model TEXT         LLM model (default: ollama/llama3)
--embed-model TEXT   Embedding model (default: ollama/nomic-embed-text)
--top-k INT          Number of chunks to retrieve (default: 5)
--chunk-size INT     Target chunk size in tokens (default: 512)
--include TEXT       File patterns to include (e.g., "*.py,*.md")
--exclude TEXT       File patterns to exclude
--db PATH            Database path (default: .distill/index.db)
--verbose            Show retrieved chunks and scores
```
```bash
cd my-project
distill index .
distill ask "how is error handling done?"
```

```bash
export OPENAI_API_KEY=sk-...
distill ask "explain the main architecture" --model gpt-4o-mini --embed-model text-embedding-3-small
```

```bash
distill search "database migration" --top-k 10
```

Output:
```
[0.92] src/db/migrations.py:45-78
       def run_migrations(engine):
           """Run all pending migrations in order..."""

[0.87] docs/database.md:12-34
       ## Migrations
       We use alembic for database migrations...

[0.81] src/db/models.py:1-23
       """SQLAlchemy models for the application."""
```
```bash
distill chat
```

```
distill> what does the auth middleware do?

Based on src/auth/middleware.py, the auth middleware:
1. Extracts the JWT token from the Authorization header
2. Validates it against the secret key
3. Attaches the user object to the request context
...

distill> what about rate limiting?

The rate limiter in src/middleware/ratelimit.py uses a sliding window...
```
| Category | Extensions |
|---|---|
| Code | .py, .js, .ts, .jsx, .tsx, .rs, .go, .java, .c, .cpp, .h, .rb, .php, .swift, .kt |
| Docs | .md, .rst, .txt, .html, .pdf |
| Config | .yaml, .yml, .toml, .json, .ini |
| Data | .csv, .sql |
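Code-aware chunking means splitting along structural boundaries (functions, classes) rather than fixed character windows, so a retrieved chunk is a coherent unit. A minimal sketch for Python using the standard `ast` module — illustrative only, and deliberately simplified (it drops top-level statements between definitions and ignores the token budget a real chunker would enforce):

```python
import ast

def chunk_python(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks
```

The other languages in the table would need their own parsers, but the principle is the same: never cut a chunk in the middle of a definition.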
Everything stays on your machine:
- Embeddings stored in local SQLite
- Default LLM is Ollama (fully offline)
- No telemetry, no cloud calls (unless you choose a cloud model)
- Python 3.10+
- Ollama (for default local mode) — or an OpenAI/Anthropic API key
MIT