Documentation: Getting Started · README · Configuration · IDE Clients · MCP API · ctx CLI · Memory Guide · Architecture · Multi-Repo · Kubernetes · VS Code Extension · Troubleshooting · Development
Context-Engine is a plug-and-play MCP retrieval stack that unifies code indexing, hybrid search, and optional llama.cpp decoding so product teams can ship context-aware agents in minutes, not weeks.
Key differentiators
- One-command bring-up delivers dual SSE/RMCP endpoints, seeded Qdrant, and live watch/reindex loops
- ReFRAG-inspired micro-chunking, token budgeting, and gate-first filtering surface precise spans
- Shared memory/indexer schema and reranker tooling for dense, lexical, and semantic signals
- ctx CLI prompt enhancer with multi-pass unicorn mode for code-grounded prompt rewriting
- VS Code extension with Prompt+ button and automatic workspace sync
- Kubernetes deployment with Kustomize for remote/scalable setups
- Performance optimizations: connection pooling, caching, deduplication, async subprocess management
Built for
- AI platform and IDE tooling teams needing an MCP-compliant context layer
- DevEx groups standing up internal assistants for large or fast-changing codebases
| Client | Transport | Notes |
|---|---|---|
| Roo | SSE/RMCP | Both SSE and RMCP connections |
| Cline | SSE/RMCP | Both SSE and RMCP connections |
| Windsurf | SSE/RMCP | Both SSE and RMCP connections |
| Zed | SSE | Uses mcp-remote bridge |
| Kiro | SSE | Uses mcp-remote bridge |
| Qodo | RMCP | Direct HTTP endpoints |
| OpenAI Codex | RMCP | TOML config |
| Augment | SSE | Simple JSON configs |
| AmpCode | SSE | Simple URL for SSE endpoints |
| Claude Code CLI | SSE / HTTP (RMCP) | Simple JSON configs via .mcp.json |
See docs/IDE_CLIENTS.md for detailed configuration examples.
If you're a VS Code user trying Context-Engine locally, start with the low-friction dev-remote + extension guide (see the VS Code Extension doc in the documentation table below). The options below describe the docker compose + CLI workflows.
Deploy Context-Engine once, connect any IDE. No need to clone this repo into your project.
1. Start the stack (on your dev machine or a server):

   ```bash
   git clone https://github.com/m1rl0k/Context-Engine.git && cd Context-Engine
   docker compose up -d
   ```

2. Index your codebase (point to any project):

   ```bash
   HOST_INDEX_PATH=/path/to/your/project docker compose run --rm indexer
   ```

3. Connect your IDE — add to your MCP config:

   ```json
   {
     "mcpServers": {
       "context-engine": { "url": "http://localhost:8001/sse" }
     }
   }
   ```

See docs/IDE_CLIENTS.md for Cursor, Windsurf, Cline, Codex, and other client configs.
Run Context-Engine on a server and connect from anywhere.
Docker on a server:
```bash
# On server (e.g., context.yourcompany.com)
git clone https://github.com/m1rl0k/Context-Engine.git && cd Context-Engine
docker compose up -d
```

Index from your local machine:

```bash
# VS Code extension (recommended) - install, set server URL, click "Upload Workspace"
# Or CLI:
scripts/remote_upload_client.py --server http://context.yourcompany.com:9090 --path /your/project
```

Connect IDE to remote:

```json
{ "mcpServers": { "context-engine": { "url": "http://context.yourcompany.com:8001/sse" } } }
```

Kubernetes: See deploy/kubernetes/README.md for Kustomize deployment.
For contributors or advanced customization with the LLM decoder:
```bash
INDEX_MICRO_CHUNKS=1 MAX_MICRO_CHUNKS_PER_FILE=200 make reset-dev-dual
```

| Service | Port | Use |
|---|---|---|
| Indexer MCP | 8001 (SSE), 8003 (RMCP) | Code search, context retrieval |
| Memory MCP | 8000 (SSE), 8002 (RMCP) | Knowledge storage |
| Qdrant | 6333 | Vector database |
| llama.cpp | 8080 | Local LLM decoder |
Stack behavior:
- Single `codebase` collection — search across all indexed repos
- Health checks auto-detect and fix cache/collection sync
- Live file watching with automatic reindexing

Transport endpoints:
- SSE (default): `http://localhost:8001/sse` — Cursor, Cline, Windsurf, Augment
- RMCP: `http://localhost:8003/mcp` — Codex, Qodo
- Dual: both SSE + RMCP simultaneously (`make reset-dev-dual`)
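To sanity-check a transport outside an IDE, here is a minimal sketch using the official `mcp` Python SDK (`pip install mcp`; the SDK is an assumption, it is not bundled with this stack) that connects to the SSE endpoint and lists the exposed tools:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # Default SSE endpoint of the Indexer MCP (see the port table above).
    async with sse_client("http://localhost:8001/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```

If the endpoint is up, the printed list should include the search tools documented in the MCP API section below.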
Configuration:

```bash
cp .env.example .env  # Copy template on first run
```

Key settings (see docs/CONFIGURATION.md for full reference):
| Setting | Purpose | Default |
|---|---|---|
| `INDEX_MICRO_CHUNKS=1` | Enable micro-chunking | `0` |
| `REFRAG_DECODER=1` | Enable LLM decoder | `1` |
| `REFRAG_RUNTIME` | Decoder backend | `llamacpp` |
| `COLLECTION_NAME` | Qdrant collection | `codebase` |
GPU acceleration (Apple Silicon):
```bash
scripts/gpu_toggle.sh gpu    # Switch to native Metal
scripts/gpu_toggle.sh start  # Start GPU decoder
```

Dev workflow:
- Bring the stack up with the reset target that matches your client (`make reset-dev`, `make reset-dev-codex`, or `make reset-dev-dual`).
- When you need a clean ingest (after large edits or when the `qdrant_status` tool / `make qdrant-status` reports zero points), run `make reindex-hard`. This clears `.codebase/cache.json` before recreating the collection so unchanged files cannot be skipped.
- Confirm collection health with `make qdrant-status` (calls the MCP router to print counts and timestamps); a direct REST check is sketched after this list.
- Iterate using search helpers such as `make hybrid ARGS="--query 'async file watcher'"` or invoke the MCP tools directly from your client.
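If you want the same signal as `make qdrant-status` without going through the MCP router, a rough equivalent is to query Qdrant's REST API directly (assuming the default port 6333 and the `codebase` collection):

```python
import json
import urllib.request

# Collection info endpoint; swap the name if you set a custom COLLECTION_NAME.
url = "http://localhost:6333/collections/codebase"

with urllib.request.urlopen(url, timeout=5) as resp:
    info = json.load(resp)["result"]

# A points count of 0 after large edits is the cue to run `make reindex-hard`.
print(f"status={info['status']} points={info['points_count']}")
```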
On Apple Silicon you can run the llama.cpp decoder natively with Metal while keeping the rest of the stack in Docker:
- Install the Metal-enabled llama.cpp binary (e.g. `brew install llama.cpp`).
- Flip to GPU mode and start the native server:

  ```bash
  scripts/gpu_toggle.sh gpu
  scripts/gpu_toggle.sh start    # launches llama-server on localhost:8081
  docker compose up -d --force-recreate mcp_indexer mcp_indexer_http
  docker compose stop llamacpp   # optional once the native server is healthy
  ```

  The toggle updates `.env` to point at `http://host.docker.internal:8081` so containers reach the host process.
- Run `scripts/gpu_toggle.sh status` to confirm the native server is healthy. All MCP `context_answer` calls will now use the Metal-backed decoder.
Want the original dockerised decoder (CPU-only or x86 GPU fallback)? Swap back with:
```bash
scripts/gpu_toggle.sh docker
docker compose up -d --force-recreate mcp_indexer mcp_indexer_http llamacpp
```

This re-enables the llamacpp container and resets `.env` to `http://llamacpp:8080`.
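To double-check which decoder the stack can actually reach, a small probe of both candidate endpoints works; this assumes llama.cpp's `llama-server` health route (`/health`), which recent builds expose:

```python
import urllib.request

# 8080 = dockerised llamacpp container, 8081 = native Metal server from gpu_toggle.sh
for url in ("http://localhost:8080/health", "http://localhost:8081/health"):
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            print(f"{url} -> HTTP {resp.status}")
    except OSError as exc:
        print(f"{url} -> unreachable ({exc})")
```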
Make targets:
- Setup: `reset-dev`, `reset-dev-codex`, `reset-dev-dual` — full stack with SSE, RMCP, or both
- Lifecycle: `up`, `down`, `logs`, `ps`, `restart`, `rebuild`
- Indexing: `index`, `reindex`, `reindex-hard`, `index-here`, `index-path`
- Watch: `watch` (local), `watch-remote` (upload to a remote server)
- Maintenance: `prune`, `prune-path`, `warm`, `health`, `decoder-health`
- Search: `hybrid`, `rerank`, `rerank-local`
- LLM: `llama-model`, `tokenizer`, `llamacpp-up`, `setup-reranker`, `quantize-reranker`
- MCP Tools: `qdrant-status`, `qdrant-list`, `qdrant-prune`, `qdrant-index-root`
- Remote: `dev-remote-up`, `dev-remote-down`, `dev-remote-bootstrap`
- Router: `route-plan`, `route-run`, `router-eval`, `router-smoke`
- CLI: `ctx Q="your question"` — prompt enhancement with repo context
A CLI that retrieves code context and rewrites your input into a better, code-grounded prompt using the local LLM decoder.
Features:
- Unicorn mode (`--unicorn`): Multi-pass enhancement with 2-3 refinement stages
- Detail mode (`--detail`): Include compact code snippets for richer context
- Memory blending: Falls back to stored memories when code search returns no hits
- Streaming: Real-time token output for instant feedback
- Filters: `--language`, `--under`, `--limit` to scope retrieval
scripts/ctx.py "What is ReFRAG?" # Basic question
scripts/ctx.py "Refactor ctx.py" --unicorn # Multi-pass enhancement
scripts/ctx.py "Add error handling" --detail # With code snippets
make ctx Q="Explain caching" # Via Make targetSee docs/CTX_CLI.md for full documentation.
```bash
# Index a specific path
make index-path REPO_PATH=/path/to/repo [RECREATE=1]

# Index current directory
cd /path/to/repo && make -C /path/to/Context-Engine index-here

# Raw docker compose
docker compose run --rm -v /path/to/repo:/work indexer --root /work --recreate
```

See docs/MULTI_REPO_COLLECTIONS.md for multi-repo architecture and remote deployment.
```bash
curl -sSf http://localhost:6333/readyz && echo "Qdrant OK"
curl -sI http://localhost:8001/sse | head -n1   # SSE
curl -sI http://localhost:8003/mcp | head -n1   # RMCP
```

Documentation:

| Topic | Description |
|---|---|
| Configuration | Complete environment variable reference |
| IDE Clients | Setup for Roo, Cline, Windsurf, Zed, Kiro, Qodo, Codex, Augment |
| MCP API | Full API reference for all MCP tools |
| ctx CLI | Prompt enhancer CLI with unicorn mode |
| Memory Guide | Memory patterns and metadata schema |
| Architecture | System design and component interactions |
| Multi-Repo | Multi-repository indexing and remote deployment |
| Kubernetes | Kubernetes deployment with Kustomize |
| VS Code Extension | Workspace uploader and Prompt+ integration |
| Troubleshooting | Common issues and solutions |
| Development | Contributing and development setup |
Memory MCP (port 8000 SSE, 8002 RMCP):
- `store` — save memories with metadata
- `find` — hybrid memory search
- `set_session_defaults` — set default collection for session
Indexer MCP (port 8001 SSE, 8003 RMCP):
- Search: `repo_search`, `code_search`, `context_search`, `context_answer`
- Specialized: `search_tests_for`, `search_config_for`, `search_callers_for`, `search_importers_for`
- Indexing: `qdrant_index_root`, `qdrant_index`, `qdrant_prune`
- Status: `qdrant_status`, `qdrant_list`, `workspace_info`, `list_workspaces`, `collection_map`
- Utilities: `expand_query`, `change_history_for_path`, `set_session_defaults`
See docs/MCP_API.md for complete API documentation.
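As a rough sketch of driving these tools programmatically (again via the `mcp` Python SDK over SSE; the argument shapes below are guesses, docs/MCP_API.md has the authoritative schemas):

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://localhost:8001/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Collection health, then a code search; argument names are illustrative.
            status = await session.call_tool("qdrant_status", {})
            print(status.content)
            hits = await session.call_tool("code_search", {"query": "async file watcher"})
            print(hits.content)

asyncio.run(main())
```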
Supported languages: Python, JavaScript/TypeScript, Go, Java, Rust, Shell, Terraform, PowerShell, YAML, C#, PHP
```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pytest -q
```

See docs/DEVELOPMENT.md for full development setup.
| Component | SSE | RMCP |
|---|---|---|
| Memory MCP | http://localhost:8000/sse | http://localhost:8002/mcp |
| Indexer MCP | http://localhost:8001/sse | http://localhost:8003/mcp |
| Qdrant DB | http://localhost:6333 | - |
| Decoder | http://localhost:8080 | - |
See docs/IDE_CLIENTS.md for client setup and docs/TROUBLESHOOTING.md for common issues.
ReFRAG background: https://arxiv.org/abs/2509.01092
```mermaid
flowchart LR
  subgraph Host/IDE
    A[IDE Agents]
  end
  subgraph Docker Network
    B(Memory MCP :8000)
    C(MCP Indexer :8001)
    D[Qdrant DB :6333]
    G[[llama.cpp Decoder :8080]]
    E[(One-shot Indexer)]
    F[(Watcher)]
  end
  A -- SSE /sse --> B
  A -- SSE /sse --> C
  B -- HTTP 6333 --> D
  C -- HTTP 6333 --> D
  E -- HTTP 6333 --> D
  F -- HTTP 6333 --> D
  C -. HTTP 8080 .-> G
  classDef opt stroke-dasharray: 5 5
  class G opt
```
See docs/ARCHITECTURE.md for detailed system design.
License: MIT

