SmartGrep is a local, semantic code search tool that understands what your code does, not just what it says. It uses advanced embeddings and AST parsing to find relevant code snippets based on natural language queries.
- 🧠 Semantic Understanding: Uses jina-embeddings-v2-base-code (8k context) to understand long functions and complex logic.
- 🌳 Smart Chunking: Uses tree-sitter to parse code into meaningful blocks (functions, classes) rather than arbitrary text lines.
- 🚀 Fast & Local: Built on LanceDB (serverless vector DB) and SentenceTransformers. Runs entirely on your CPU/GPU. No data leaves your machine.
- 🔍 Gitignore Support: Respects .gitignore patterns to exclude files from indexing.
- 💻 CLI Interface: Simple, developer-friendly command-line interface.
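To illustrate what chunking by syntactic block means, here is a minimal sketch. It is not SmartGrep's implementation: it swaps tree-sitter for Python's standard-library `ast` module so that it runs without extra dependencies, and it only handles Python source. The function name `chunk_python_source` and the chunk fields are hypothetical.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function or class in the source.

    Stand-in for tree-sitter based chunking: the stdlib ast module is
    used here purely for illustration and only understands Python.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,
                    # Exact source text of the block, ready to be embedded.
                    "text": ast.get_source_segment(source, node),
                }
            )
    return chunks


EXAMPLE = '''import os

def connect(url):
    return url

class Repo:
    def get(self, key):
        return key
'''

chunks = chunk_python_source(EXAMPLE)
for c in chunks:
    print(c["kind"], c["name"], c["start_line"], c["end_line"])
```

Each chunk covers a whole function or class, so an embedding computed from it describes one coherent unit rather than an arbitrary window of lines.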
This project uses uv for dependency management.
- Clone the repository:
git clone https://github.com/yourusername/SmartGrep.git
cd SmartGrep
- Install SmartGrep as a CLI tool:
chmod +x install.sh
./install.sh
Or manually:
uv pip install -e .
After installation, you can use smartgrep or sgrep commands directly!
Alternative: Run with uv (without installation):
uv sync # Install dependencies first
uv run python -m smartgrep.cli index .
uv run python -m smartgrep.cli search "your query"
If you run into memory problems (for example under WSL2), try:
- Increasing the WSL2 memory limit in .wslconfig
- Indexing smaller directories first
- Using the uv run method, which may have better memory management
Run the indexer to scan your files, parse them, and build the vector index.
smartgrep index .
# or use the short alias
sgrep index .
Note: The first run will download the embedding model (~0.5GB).
Incremental Indexing: The indexer automatically detects changes using file hashing and only reprocesses new or modified files. Simply run the same command again to update the index.
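The change detection described above can be sketched roughly as follows. This is an illustration under assumptions, not SmartGrep's actual code: the choice of SHA-256, the helper names `file_digest` and `plan_reindex`, and the in-memory `known` mapping are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file's contents in blocks so large files are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def plan_reindex(root: Path, known: dict[str, str]):
    """Compare current file hashes against the hashes stored at the last run.

    Returns (changed, removed, current): files that are new or modified,
    files that disappeared, and the fresh hash table to persist.
    """
    current = {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    changed = [name for name, digest in current.items() if known.get(name) != digest]
    removed = [name for name in known if name not in current]
    return changed, removed, current


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("print('a')\n")
    (root / "b.py").write_text("print('b')\n")

    # First run: nothing is known yet, so every file needs processing.
    first_changed, _, state = plan_reindex(root, {})

    # Second run: only the edited file is reported.
    (root / "b.py").write_text("print('B')\n")
    second_changed, second_removed, state = plan_reindex(root, state)

print(first_changed, second_changed, second_removed)
```

Hashing contents rather than comparing modification times means a file that was touched but not edited is not reprocessed.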
Search using natural language.
smartgrep search "how to connect to the database"
# or
sgrep search "how to connect to the database"
Auto-Index (Default): By default, the search command automatically checks for code changes and updates the index before searching. This ensures results are always up-to-date.
# Auto-index enabled (default)
sgrep search "authentication middleware"
# Disable auto-index for faster search
sgrep search "authentication middleware" --no-auto-index
Adjust Relevance Threshold:
# Lower threshold = stricter matching (default: 1.3)
sgrep search "auth middleware" --threshold 1.0
Limit Results:
# Show only top 3 results (default: 5)
sgrep search "database query" --limit 3
Disable Hybrid Search:
# Use vector search only, skip keyword search and reranking
sgrep search "error handling" --no-hybrid
Combine Options:
sgrep search "logging utility" --threshold 1.0 --limit 3 --no-hybrid --no-auto-index
smartgrep index [PATH]
PATH: Directory to index (default: current directory)
smartgrep search QUERY [OPTIONS]
Arguments:
QUERY: Natural language search query (required)
Options:
--threshold FLOAT: Distance threshold for relevance (default: 1.3, range: 0.0-2.0, lower = stricter)
--limit INT: Maximum number of results to show (default: 5, range: 1-50)
--hybrid / --no-hybrid: Enable/disable hybrid search with reranking (default: enabled)
--auto-index / --no-auto-index: Auto-update index before search (default: enabled)
--qa: Enable Code QA with a local LLM (Ollama) to get a natural language answer.
--ollama-model TEXT: The Ollama model to use for Code QA (default: llama3).
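As a rough picture of how --threshold and --limit interact, the sketch below filters hits by distance and then truncates the list. It is an assumption-laden illustration rather than SmartGrep's code: the `Hit` type and `select_hits` function are hypothetical, and treating the threshold as inclusive is a guess.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    distance: float  # smaller distance = closer match


def select_hits(hits, threshold=1.3, limit=5):
    """Keep hits within the distance threshold, best first, capped at limit.

    Assumption: a hit exactly at the threshold is kept (inclusive bound).
    """
    kept = [h for h in hits if h.distance <= threshold]
    kept.sort(key=lambda h: h.distance)
    return kept[:limit]


hits = [
    Hit("db/pool.py", 1.10),
    Hit("db/connect.py", 0.40),
    Hit("utils/log.py", 1.60),
    Hit("api/session.py", 1.25),
]

default_hits = select_hits(hits)                 # threshold 1.3, limit 5
strict_hits = select_hits(hits, threshold=1.0)   # stricter matching
top_two = select_hits(hits, limit=2)             # fewer results

print([h.path for h in default_hits])
print([h.path for h in strict_hits])
print([h.path for h in top_two])
```

Lowering the threshold removes weaker matches entirely, while lowering the limit only shortens the list of matches that already passed the threshold.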
For faster searches, SmartGrep runs a background daemon to keep the AI models loaded in memory. The search command will start it automatically.
sgrep daemon start: Start the daemon manually.
sgrep daemon stop: Stop the daemon.
sgrep daemon status: Check if the daemon is running.
- Python (.py)
- JavaScript/TypeScript (.js, .ts)
- Go (.go)
- Rust (.rs)
- C (.c, .h)
- C++ (.cpp, .cc, .cxx, .hpp)
- Java (.java)
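The extension list above amounts to a lookup from file suffix to parser language. A minimal sketch of such a lookup follows; the table name `LANGUAGE_BY_EXTENSION`, the function `detect_language`, the language labels, and the case-insensitive matching are all hypothetical and may not match how SmartGrep selects a grammar.

```python
from pathlib import Path

# Extensions taken from the list above; the language labels are illustrative.
LANGUAGE_BY_EXTENSION = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".c": "c",
    ".h": "c",
    ".cpp": "cpp",
    ".cc": "cpp",
    ".cxx": "cpp",
    ".hpp": "cpp",
    ".java": "java",
}


def detect_language(path):
    """Return the language label for a path, or None if it is unsupported."""
    return LANGUAGE_BY_EXTENSION.get(Path(path).suffix.lower())


for name in ["src/main.rs", "include/util.HPP", "README.md"]:
    print(name, detect_language(name))
```

Files whose suffix is not in the table return None and would simply be skipped during indexing.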
- Embeddings: jinaai/jina-embeddings-v2-base-code
- Vector Store: lancedb
- Parsing: tree-sitter
- CLI: typer + rich