
distill πŸ§ͺ

Local RAG in one command. Index your files, ask questions, get answers β€” no cloud, no config.

pip install distill-rag
# Index a codebase or docs folder
distill index .

# Ask questions
distill ask "how does authentication work?"
distill ask "what are the main API endpoints?"
distill ask "summarize the database schema"

# Interactive chat mode
distill chat

That's it. No vector databases to set up. No cloud accounts. Everything runs locally.

Why distill?

RAG tools are either too complex (LangChain, LlamaIndex β€” pages of boilerplate) or too limited (can't handle code). distill is the middle ground:

  • 🏠 Fully local β€” SQLite + Ollama. Your data never leaves your machine.
  • ⚑ Zero config β€” distill index . && distill ask "..."
  • πŸ’» Code-aware β€” Understands Python, JS/TS, Rust, Go, Java, C/C++, and more
  • πŸ“„ Doc-aware β€” Markdown, RST, TXT, PDF, HTML
  • πŸ”„ Incremental β€” Re-index only changed files
  • πŸ”Œ Multi-LLM β€” Works with Ollama (default), OpenAI, Anthropic, or any OpenAI-compatible API
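Incremental re-indexing typically comes down to a content-hash check: a file is re-embedded only when its contents change. A minimal sketch of that idea (the hash store and `should_reindex` are illustrative, not distill's actual internals):

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Hash file contents so unchanged files can be skipped."""
    return hashlib.sha256(data).hexdigest()

def should_reindex(path: str, data: bytes, seen: dict[str, str]) -> bool:
    """Return True if the file is new or its contents changed since last index."""
    h = file_hash(data)
    if seen.get(path) == h:
        return False  # unchanged: skip chunking and embedding
    seen[path] = h
    return True
```

On a large repo this is what makes `distill index .` cheap to re-run: only edited files pay the embedding cost again.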

How It Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Files    │────▢│  Chunk   │────▢│ Embed    │────▢│ SQLite   β”‚
β”‚  on disk  β”‚     β”‚  & parse β”‚     β”‚ vectors  β”‚     β”‚  + vec   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                          β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚  Answer   │◀────│  LLM     │◀────│ Retrieve β”‚β—€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚           β”‚     β”‚  generateβ”‚     β”‚ top-k    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  1. Index β€” Files are chunked (respecting code boundaries), embedded, and stored in SQLite with sqlite-vec
  2. Query β€” Your question is embedded, top-k similar chunks are retrieved
  3. Answer β€” Retrieved chunks + your question go to the LLM for a grounded answer
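The three steps above can be sketched end to end. This toy version uses a bag-of-words "embedding" and brute-force cosine similarity in place of distill's real embedding model and sqlite-vec index, purely to show the data flow:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Step 2: embed the question, return the top-k most similar chunks."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

# Step 1 (index) is embedding every chunk up front and storing the vectors;
# step 3 would pass retrieve(...) plus the question to the LLM as context.
chunks = [
    "def login(user): check the password hash",
    "migrations run with alembic upgrade head",
    "the cache uses an LRU eviction policy",
]
print(retrieve("how does login password checking work?", chunks, top_k=1))
```

A real index replaces the linear scan with an approximate or SQL-backed nearest-neighbor search, but the shape of the pipeline is the same.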

Commands

Command                   Description
distill index <path>      Index files in a directory
distill ask "<question>"  Ask a question about indexed files
distill chat              Interactive chat mode with context
distill search "<query>"  Raw similarity search (no LLM)
distill status            Show index stats
distill forget <path>     Remove a path from the index

Options

--model TEXT        LLM model (default: ollama/llama3)
--embed-model TEXT  Embedding model (default: ollama/nomic-embed-text)
--top-k INT         Number of chunks to retrieve (default: 5)
--chunk-size INT    Target chunk size in tokens (default: 512)
--include TEXT      File patterns to include (e.g., "*.py,*.md")
--exclude TEXT      File patterns to exclude
--db PATH           Database path (default: .distill/index.db)
--verbose           Show retrieved chunks and scores
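The --include and --exclude flags take comma-separated glob patterns. One plausible way to apply them, using fnmatch-style matching (the exact matching rules distill uses are an assumption here):

```python
from fnmatch import fnmatch

def matches(path: str, patterns: str) -> bool:
    """True if path matches any of the comma-separated glob patterns."""
    return any(fnmatch(path, p.strip()) for p in patterns.split(","))

def keep(path: str, include: str = "*", exclude: str = "") -> bool:
    """Apply --include first, then drop anything hit by --exclude."""
    if not matches(path, include):
        return False
    return not (exclude and matches(path, exclude))
```

So `--include "*.py,*.md" --exclude "tests/*"` would keep `src/app.py` but skip `tests/test_app.py`.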

Examples

Index and query a codebase

cd my-project
distill index .
distill ask "how is error handling done?"

Use with OpenAI instead of Ollama

Note that the index and the query must use the same embedding model, so pass --embed-model at both steps:

export OPENAI_API_KEY=sk-...
distill index . --embed-model text-embedding-3-small
distill ask "explain the main architecture" --model gpt-4o-mini --embed-model text-embedding-3-small

Search without LLM generation

distill search "database migration" --top-k 10

Output:

[0.92] src/db/migrations.py:45-78
  def run_migrations(engine):
      """Run all pending migrations in order..."""

[0.87] docs/database.md:12-34
  ## Migrations
  We use alembic for database migrations...

[0.81] src/db/models.py:1-23
  """SQLAlchemy models for the application."""

Interactive chat

distill chat
distill> what does the auth middleware do?

Based on src/auth/middleware.py, the auth middleware:
1. Extracts the JWT token from the Authorization header
2. Validates it against the secret key
3. Attaches the user object to the request context
...

distill> what about rate limiting?

The rate limiter in src/middleware/ratelimit.py uses a sliding window...
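Follow-up questions like "what about rate limiting?" work because chat mode carries the conversation history into each prompt alongside freshly retrieved chunks. A rough sketch of that prompt assembly (the template wording is invented, not distill's actual prompt):

```python
def build_prompt(history: list[tuple[str, str]],
                 chunks: list[str],
                 question: str) -> str:
    """Combine retrieved context, prior turns, and the new question."""
    context = "\n".join(f"[chunk] {c}" for c in chunks)
    turns = "\n".join(f"user: {q}\nassistant: {a}" for q, a in history)
    return (
        f"Context:\n{context}\n\n"
        f"Conversation:\n{turns}\n"
        f"user: {question}\nassistant:"
    )
```

Because retrieval runs again on every turn, each follow-up pulls in chunks relevant to the new question, not just the first one.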

Supported File Types

Category  Extensions
Code      .py, .js, .ts, .jsx, .tsx, .rs, .go, .java, .c, .cpp, .h, .rb, .php, .swift, .kt
Docs      .md, .rst, .txt, .html, .pdf
Config    .yaml, .yml, .toml, .json, .ini
Data      .csv, .sql

Privacy

Everything stays on your machine:

  • Embeddings stored in local SQLite
  • Default LLM is Ollama (fully offline)
  • No telemetry, no cloud calls (unless you choose a cloud model)

Requirements

  • Python 3.10+
  • Ollama (for default local mode) β€” or an OpenAI/Anthropic API key

License

MIT
