Local RAG (Retrieval-Augmented Generation) system using Docling for document parsing, LanceDB for vector storage, and Snowflake Arctic Embed for embeddings.
- Python 3.11+
- uv package manager
```bash
git clone https://github.com/d3group/doclingTemplate.git
cd doclingTemplate
uv sync
```

Defaults work out of the box. Copy `rag.toml.example` to `rag.toml` to customize:
```toml
data_dir = "data"
lancedb_dir = "lancedb_data"
embed_model = "Snowflake/snowflake-arctic-embed-m-v2.0"
max_tokens = 1500
enable_reranking = true
enable_hybrid_search = true
```

Contextual retrieval uses a local LLM to enrich each chunk with document context before embedding. This significantly improves search quality (Anthropic reports 35-67% fewer retrieval failures).
Requires Ollama running locally:
```bash
brew install ollama
brew services start ollama
ollama pull qwen2.5:1.5b
```

Then enable in `rag.toml`:
```toml
enable_contextual_retrieval = true
ollama_model = "qwen2.5:1.5b"
```

Re-ingest your documents after enabling. If Ollama isn't running, ingestion falls back to raw chunks automatically.
```bash
# Add your documents
cp your-documents/* data/

# Ingest them into the knowledge base
uv run docling-rag ingest

# Query
uv run docling-rag query "What is this document about?"

# Check what's indexed
uv run docling-rag stats
```

The MCP server lets Claude Code, Claude Desktop, or any MCP-compatible tool query your knowledge base directly.
The included `.mcp.json` auto-configures the server. Run `/mcp` in Claude Code or restart the session.
Create a `.mcp.json` in the other project pointing back to this one:
```json
{
  "mcpServers": {
    "docling-rag": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/doclingTemplate", "run", "docling-rag-mcp"]
    }
  }
}
```

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "docling-rag": {
      "command": "/opt/homebrew/bin/uv",
      "args": ["--directory", "/absolute/path/to/doclingTemplate", "run", "docling-rag-mcp"]
    }
  }
}
```

Use the full path to `uv` (find it with `which uv`); GUI apps don't share your terminal's PATH.
| Tool | Description |
|---|---|
| `query_knowledge` | Search the knowledge base |
| `ingest_documents_tool` | Ingest documents from a directory |
| `ingest_file_tool` | Ingest a single file from any path |
| `get_database_stats` | Get database statistics |
| `list_indexed_sources` | List all indexed documents |
| `delete_document` | Remove a document from the index |
```bash
uv run docling-rag ingest                      # Ingest the data/ directory
uv run docling-rag ingest --file /path/to/doc  # Ingest a single file from anywhere
uv run docling-rag query "your question"       # Search the knowledge base
uv run docling-rag query "..." -n 10           # Return more results (default: 5)
uv run docling-rag stats                       # Show statistics
uv run docling-rag init <path>                 # Create a new project from the template
```

Create separate knowledge bases for different topics:
```bash
uv run docling-rag init ~/projects/my-topic
cd ~/projects/my-topic
uv sync
cp ~/Documents/relevant-files/* data/
uv run docling-rag ingest
```

Each project gets its own vector database, config, and document store.
- Ingest — Documents are parsed with Docling, chunked semantically, and stored as embeddings in LanceDB
- Search — Queries use hybrid search (vector similarity + BM25 keyword matching) with cross-encoder reranking
- Results — Returns the most relevant chunks with source file and page references
PDF, DOCX, PPTX, XLSX, HTML/HTM, images (PNG, JPG, JPEG — OCR), Markdown, LaTeX, plain text, and code files (.py, .js, .ts, .json, .yaml, .yml, .toml, .sh, .css).
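As an illustrative helper (not part of the template's code), a pre-ingest filter over those extensions might look like:

```python
from pathlib import Path

# The extension list above, as a set for case-insensitive lookup
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".html", ".htm",
    ".png", ".jpg", ".jpeg", ".md", ".tex", ".txt",
    ".py", ".js", ".ts", ".json", ".yaml", ".yml", ".toml", ".sh", ".css",
}

def is_supported(path: str) -> bool:
    """Return True if the file's extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```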