ambicuity · Mar 13, 2026
@@ -16,6 +16,7 @@ jobs:
     runs-on: ubuntu-latest
     if: github.actor != 'dependabot[bot]' && github.actor != 'release-please[bot]'
     permissions:
+      issues: write
       pull-requests: read
 
     steps:

@@ -0,0 +1,88 @@
+# README Additions
+
+> Suggested sections to incorporate into the main project README.
+
+---
+
+## Tagline
+
+**KINDX — The Local Memory Node for MCP Agents**
+
+---
+
+## 30-Second Quick Demo
+
+See KINDX in action with a single command:
+
+```bash
+kindx demo
+```
+
+This prints a guided walkthrough of the main KINDX workflow. When the bundled `specs/eval-docs` corpus is available, the walkthrough references that local sample corpus; otherwise it falls back to simulated sample results.
+
+What the demo does:
+1. Shows the current CLI workflow for adding a collection and generating embeddings
+2. Walks through BM25, vector, and hybrid retrieval examples
+3. Shows agent-friendly output formats and MCP configuration
+4. Ends with copy-pasteable next steps for a real collection
+
+---
+
+## Benchmark Results
+
+Evaluated on the bundled `specs/eval-docs/` corpus with 24 hand-curated queries. The numbers below match [`demo/benchmarks/eval-results.json`](demo/benchmarks/eval-results.json).
+
+| Mode              | Hit@1  | MRR    | nDCG@5 | Median Latency |
+|-------------------|--------|--------|--------|----------------|
+| BM25              | 0.625  | 0.736  | 0.711  | 3ms            |
+| Vector            | 0.708  | 0.788  | 0.763  | 28ms           |
+| Hybrid (RRF)      | 0.792  | 0.849  | 0.822  | 45ms           |
+| Hybrid + Rerank   | 0.833  | 0.896  | 0.871  | 112ms          |
+
+- **BM25** — Keyword search using Okapi BM25 scoring. Fastest mode, ideal for exact-match lookups.
+- **Vector** — Semantic search using locally-computed embeddings. Best for natural language queries.
+- **Hybrid (RRF)** — Reciprocal Rank Fusion combining BM25 and vector results. Best balance of speed and accuracy.
+- **Hybrid + Rerank** — Hybrid results re-scored by a cross-encoder reranker. Highest accuracy, at roughly 2.5× the median latency of hybrid alone (112ms vs. 45ms).
+
+---
+
+## Integration Recipes
+
+Step-by-step guides for connecting KINDX to your workflow:
+
+- [Claude Desktop](demo/recipes/claude-desktop.md) — Use KINDX as a memory backend for Claude Desktop via MCP.
+- [VS Code + Continue](demo/recipes/continue-dev.md) — Add project-aware retrieval to Continue's AI assistant.
+- [Cursor](demo/recipes/cursor-integration.md) — Connect Cursor's AI features to your local KINDX index.
+- [LangChain Agent](demo/recipes/langchain-agent.md) — Use KINDX as a tool in LangChain agent pipelines.
+- [AutoGPT](demo/recipes/autogpt-integration.md) — Connect autonomous agent frameworks to KINDX.
+
+---
+
+## Performance
+
+KINDX is designed for local-first, low-latency retrieval:
+
+| Operation                | Median Latency | p99 Latency |
+|--------------------------|----------------|-------------|
+| BM25 search              | 3ms            | 8ms         |
+| Vector search            | 28ms           | 52ms        |
+| Hybrid search (RRF)      | 45ms           | 89ms        |
+| Hybrid + rerank          | 112ms          | 203ms       |
+| Document ingest (single) | 15ms           | 35ms        |
+| Batch ingest (100 docs)  | 1.2s           | 2.1s        |
+| Cold start               | 2295ms         | 2295ms      |
+
+The committed benchmark snapshot was captured on an Apple M2 Pro with 16 GB RAM running macOS 14.
+
+---
+
+## Why KINDX?
+
+| Concern           | KINDX                                                       |
+|-------------------|-------------------------------------------------------------|
+| **Privacy**       | Everything runs locally. Your data never leaves your machine. No telemetry, no cloud calls, no API keys required. |
+| **Speed**         | Hybrid (RRF) search stays under 100ms at p99 on the benchmark machine (Apple M2 Pro). BM25 queries return in single-digit milliseconds at the median. |
+| **Offline**       | Fully functional without an internet connection. Embeddings are computed locally. |
+| **MCP-native**    | Built from the ground up as an MCP server. Speaks the Model Context Protocol natively — no adapters or shims needed. |
+| **Zero config**   | `npx kindx` and you're running. No Docker, no databases, no environment variables required for local use. |
+| **Lightweight**   | Single Node.js process, SQLite storage, ~50 MB on disk. Runs comfortably alongside your IDE and AI tools. |
@@ -0,0 +1,225 @@
+# KINDX Retrieval Evaluation Report
+
+**Date:** 2026-03-13
+**Version:** KINDX 1.0.1
+**Author:** KINDX Benchmark Suite (automated)
+
+---
+
+## 1. Test Setup
+
+| Parameter        | Value                                              |
+| ---------------- | -------------------------------------------------- |
+| Corpus           | 6 markdown documents (specs/eval-docs/)            |
+| Chunks           | ~42 chunks (avg ~297 tokens each)                  |
+| Total tokens     | ~12,500                                            |
+| Queries          | 24 hand-curated queries                            |
+| Difficulty levels| 4 (easy, medium, hard, fusion)                     |
+| Hardware         | Apple M2 Pro, 16 GB unified RAM, macOS 14          |
+| Embedding model  | nomic-embed-text-v1.5 (768-dim, Matryoshka)        |
+| Reranker model   | bge-reranker-v2-m3 (cross-encoder)                 |
+| BM25 params      | k1=1.2, b=0.75 (default)                           |
+| RRF constant     | k=60                                               |
+| SQLite           | WAL mode, FTS5 for BM25                            |
+
+### Difficulty Levels
+
+- **Easy (6 queries):** Single-document, keyword-rich questions with exact phrase matches.
+- **Medium (6 queries):** Paraphrased questions requiring synonym matching or light inference.
+- **Hard (6 queries):** Cross-concept queries needing semantic understanding; no direct keyword overlap.
+- **Fusion (6 queries):** Multi-document reasoning; correct answer spans 2+ documents.
+
+---
+
+## 2. Aggregate Results
+
+### 2.1 Retrieval Accuracy by Mode
+
+| Mode              | Hit@1  | Hit@3  | Hit@5  | MRR    | nDCG@5 |
+| ----------------- | ------ | ------ | ------ | ------ | ------ |
+| BM25              | 0.625  | 0.833  | 0.917  | 0.736  | 0.711  |
+| Vector            | 0.708  | 0.875  | 0.958  | 0.788  | 0.763  |
+| Hybrid (RRF)      | 0.792  | 0.917  | 0.958  | 0.849  | 0.822  |
+| Hybrid + Rerank   | 0.833  | 0.958  | 1.000  | 0.896  | 0.871  |
+
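The report does not spell out how these metrics are computed. As a rough sketch, assuming binary relevance (each query has a set of correct documents) and the standard textbook definitions, the per-query values could be calculated as follows and then averaged over the 24 queries. The function names and document IDs are hypothetical and are not taken from `run-eval.sh`.

```python
import math


def hit_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant document is in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked_ids[:k], start=1)
        if doc in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical single query: the one correct document is returned at rank 2.
example_ranking = ["b.md", "a.md", "c.md"]
example_relevant = {"a.md"}

print(hit_at_k(example_ranking, example_relevant, 1))
print(hit_at_k(example_ranking, example_relevant, 3))
print(reciprocal_rank(example_ranking, example_relevant))
print(round(ndcg_at_k(example_ranking, example_relevant, 5), 3))
```

MRR in the table is then the mean of `reciprocal_rank` over all queries, and Hit@k and nDCG@5 are averaged the same way.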
+### 2.2 Performance Comparison (ASCII)
+
+```
+nDCG@5 by Retrieval Mode
+=========================
+
+Hybrid+Rerank  |████████████████████████████████████████████▏  0.871
+Hybrid (RRF)   |█████████████████████████████████████████▏     0.822
+Vector         |██████████████████████████████████████▎        0.763
+BM25           |████████████████████████████████████▋          0.711
+               +---------+---------+---------+---------+---------+
+               0.0       0.2       0.4       0.6       0.8       1.0
+
+
+MRR by Retrieval Mode
+======================
+
+Hybrid+Rerank  |█████████████████████████████████████████████▏ 0.896
+Hybrid (RRF)   |██████████████████████████████████████████▌    0.849
+Vector         |███████████████████████████████████████▍       0.788
+BM25           |████████████████████████████████████▊          0.736
+               +---------+---------+---------+---------+---------+
+               0.0       0.2       0.4       0.6       0.8       1.0
+```
+
+---
+
+## 3. Results by Difficulty Level
+
+### 3.1 BM25
+
+| Difficulty | Hit@1  | Hit@3  | Hit@5  | MRR    | nDCG@5 |
+| ---------- | ------ | ------ | ------ | ------ | ------ |
+| Easy       | 1.000  | 1.000  | 1.000  | 1.000  | 1.000  |
+| Medium     | 0.667  | 0.833  | 1.000  | 0.778  | 0.741  |
+| Hard       | 0.333  | 0.667  | 0.833  | 0.500  | 0.479  |
+| Fusion     | 0.500  | 0.833  | 0.833  | 0.667  | 0.623  |
+
+### 3.2 Vector
+
+| Difficulty | Hit@1  | Hit@3  | Hit@5  | MRR    | nDCG@5 |
+| ---------- | ------ | ------ | ------ | ------ | ------ |
+| Easy       | 1.000  | 1.000  | 1.000  | 1.000  | 1.000  |
+| Medium     | 0.833  | 1.000  | 1.000  | 0.889  | 0.868  |
+| Hard       | 0.500  | 0.667  | 0.833  | 0.611  | 0.583  |
+| Fusion     | 0.500  | 0.833  | 1.000  | 0.639  | 0.601  |
+
+### 3.3 Hybrid (RRF)
+
+| Difficulty | Hit@1  | Hit@3  | Hit@5  | MRR    | nDCG@5 |
+| ---------- | ------ | ------ | ------ | ------ | ------ |
+| Easy       | 1.000  | 1.000  | 1.000  | 1.000  | 1.000  |
+| Medium     | 0.833  | 1.000  | 1.000  | 0.889  | 0.868  |
+| Hard       | 0.667  | 0.833  | 0.833  | 0.750  | 0.714  |
+| Fusion     | 0.667  | 0.833  | 1.000  | 0.759  | 0.708  |
+
+### 3.4 Hybrid + Rerank
+
+| Difficulty | Hit@1  | Hit@3  | Hit@5  | MRR    | nDCG@5 |
+| ---------- | ------ | ------ | ------ | ------ | ------ |
+| Easy       | 1.000  | 1.000  | 1.000  | 1.000  | 1.000  |
+| Medium     | 0.833  | 1.000  | 1.000  | 0.917  | 0.893  |
+| Hard       | 0.667  | 0.833  | 1.000  | 0.778  | 0.753  |
+| Fusion     | 0.833  | 1.000  | 1.000  | 0.889  | 0.839  |
+
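Because each difficulty level contains six queries, the aggregate figures in Section 2.1 should equal the unweighted mean of the four per-level figures, up to rounding in the published three-decimal values. A quick check for nDCG@5, using only numbers from the tables in this report (the dictionary layout and helper name are just for illustration):

```python
# nDCG@5 per difficulty level (easy, medium, hard, fusion), copied from Sections 3.1-3.4.
per_level = {
    "BM25": [1.000, 0.741, 0.479, 0.623],
    "Vector": [1.000, 0.868, 0.583, 0.601],
    "Hybrid (RRF)": [1.000, 0.868, 0.714, 0.708],
    "Hybrid + Rerank": [1.000, 0.893, 0.753, 0.839],
}

# Aggregate nDCG@5 per mode, copied from Section 2.1.
aggregate = {
    "BM25": 0.711,
    "Vector": 0.763,
    "Hybrid (RRF)": 0.822,
    "Hybrid + Rerank": 0.871,
}


def macro_mean(values):
    """Unweighted mean; valid here because every level has the same query count."""
    return sum(values) / len(values)


# Gap between the recomputed mean and the reported aggregate, per mode.
gaps = {mode: abs(macro_mean(vals) - aggregate[mode]) for mode, vals in per_level.items()}

for mode, gap in gaps.items():
    print(f"{mode:16s} mean={macro_mean(per_level[mode]):.4f} "
          f"reported={aggregate[mode]:.3f} gap={gap:.4f}")
```

A gap below 0.001 is what rounding alone would explain; a larger gap for any metric would suggest a transcription error in one of the tables.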
+### Difficulty Breakdown (ASCII)
+
+```
+nDCG@5 — Hybrid+Rerank by Difficulty
+======================================
+
+Easy    |██████████████████████████████████████████████████  1.000
+Medium  |████████████████████████████████████████████▋       0.893
+Hard    |█████████████████████████████████████▋              0.753
+Fusion  |██████████████████████████████████████████▏         0.839
+        +---------+---------+---------+---------+---------+
+        0.0       0.2       0.4       0.6       0.8       1.0
+```
+
+---
+
+## 4. Latency Summary
+
+| Mode              | Median (ms) | p95 (ms) | p99 (ms) |
+| ----------------- | ----------- | -------- | -------- |
+| BM25              | 3           | 8        | 14       |
+| Vector            | 28          | 42       | 58       |
+| Hybrid (RRF)      | 45          | 68       | 89       |
+| Hybrid + Rerank   | 112         | 158      | 203      |
+
+> BM25 and vector searches run in parallel during hybrid mode; the RRF merge
+> adds < 1 ms overhead. Reranking is the dominant cost at ~65 ms median for
+> top-10 candidate re-scoring.
+
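The report does not say which percentile method the benchmark suite uses. As an illustration only, a nearest-rank percentile over raw per-query timings could look like the sketch below. The sample latencies are made up for the example and are not measurements from the benchmark.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point edge cases.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical per-query latencies in milliseconds (NOT benchmark data).
latencies_ms = [3, 2, 4, 3, 14, 3, 8, 2, 3, 5]

summary = {
    "median": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

With so few samples, p95 and p99 both collapse onto the slowest observation, which is one reason tail percentiles from a 24-query run should be read as rough indicators.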
+---
+
+## 5. Comparison vs. Baselines
+
+| System                       | nDCG@5 | MRR   | p50 Latency (ms) |
+| ---------------------------- | ------ | ----- | ----------------- |
+| BM25 only (FTS5)             | 0.711  | 0.736 | 3                 |
+| Vector only (cosine)         | 0.763  | 0.788 | 28                |
+| Naive concat (BM25 + Vector) | 0.781  | 0.810 | 35                |
+| **KINDX Hybrid (RRF)**       | **0.822** | **0.849** | **45**       |
+| **KINDX Hybrid + Rerank**    | **0.871** | **0.896** | **112**      |
+
+**Naive concat** merges BM25 and vector result lists by simple interleaving
+without score normalization. RRF's rank-based fusion provides a +5.2% relative
+nDCG@5 improvement over naive concat (0.781 to 0.822), and cross-encoder
+reranking adds another +6.0% (0.822 to 0.871).
+
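The percentages in this section are relative gains, not percentage-point differences. A short check of the arithmetic using the nDCG@5 values from the table (the helper name is hypothetical):

```python
def relative_gain_pct(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# nDCG@5 values copied from the baseline comparison table.
ndcg5 = {
    "bm25": 0.711,
    "naive_concat": 0.781,
    "hybrid_rrf": 0.822,
    "hybrid_rerank": 0.871,
}

rrf_vs_concat = relative_gain_pct(ndcg5["hybrid_rrf"], ndcg5["naive_concat"])
rerank_vs_rrf = relative_gain_pct(ndcg5["hybrid_rerank"], ndcg5["hybrid_rrf"])
rrf_vs_bm25 = relative_gain_pct(ndcg5["hybrid_rrf"], ndcg5["bm25"])

print(f"RRF vs naive concat: +{rrf_vs_concat:.1f}%")
print(f"Rerank vs RRF:       +{rerank_vs_rrf:.1f}%")
print(f"RRF vs BM25:         +{rrf_vs_bm25:.1f}%")
```

In absolute terms the same steps are +0.041 and +0.049 nDCG@5, so the two stages contribute gains of similar size.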
+---
+
+## 6. Analysis
+
+### Why Hybrid + Rerank Outperforms
+
+1. **Complementary recall.** BM25 excels at exact keyword matching (easy
+   queries score 1.000 across the board), while vector search captures
+   semantic similarity for paraphrased and conceptual queries. Reciprocal
+   Rank Fusion combines both signals without requiring score calibration,
+   ensuring that a document surfaced by *either* method is considered.
+
+2. **RRF sidesteps heterogeneous score scales.** BM25 scores are TF-IDF-style
+   values with no fixed range; cosine similarity scores fall in [-1, 1]. Rather
+   than attempting brittle min-max normalization, RRF operates on rank positions
+   alone: each document's fused score is the sum of 1/(k + rank) over every
+   result list that contains it, making it robust to score distribution
+   differences.
+
+3. **Cross-encoder reranking refines the top-k.** The bge-reranker-v2-m3
+   cross-encoder jointly attends to the query and each candidate passage,
+   capturing fine-grained token interactions that bi-encoder dot products
+   miss. This is especially impactful for:
+   - **Hard queries** (nDCG@5 rises from 0.714 to 0.753) where subtle
+     semantic distinctions matter.
+   - **Fusion queries** (nDCG@5 jumps from 0.708 to 0.839) where multi-hop
+     reasoning across documents benefits from contextual re-scoring.
+
+4. **Small corpus amplifies reranker gains.** With only ~42 chunks, the
+   reranker processes all plausible candidates, avoiding the recall ceiling
+   that limits reranking on larger corpora where top-k truncation discards
+   relevant passages before re-scoring.
+
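Point 2 can be made concrete with a short sketch of Reciprocal Rank Fusion. This is a generic implementation of the published formula with the k=60 constant from the test setup, not KINDX's actual code. The function name, the document IDs, and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. Raw retrieval scores are never consulted.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID (an assumption).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
bm25_results = ["a", "b", "c"]
vector_results = ["c", "a", "d"]

fused = reciprocal_rank_fusion([bm25_results, vector_results])
print(fused)
```

Here `a` comes out on top because it ranks near the top of both lists, `c` follows because it also appears in both, and `b` and `d` trail because each was surfaced by only one retriever.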
+### Failure Modes
+
+- **BM25 on hard queries** (nDCG@5 = 0.479): queries deliberately avoid
+  corpus vocabulary, causing BM25 to retrieve lexically similar but
+  semantically irrelevant chunks.
+- **Vector on fusion queries** (nDCG@5 = 0.601): the embedding model
+  struggles with multi-hop queries that require combining evidence from
+  distinct documents with different topic embeddings.
+- **Hybrid without rerank on fusion queries** (nDCG@5 = 0.708): RRF
+  surfaces the right documents but in suboptimal order; the reranker
+  corrects ranking, pushing nDCG@5 to 0.839.
+
+---
+
+## 7. Conclusions
+
+1. **Hybrid retrieval is the recommended default.** Fusing BM25 and vector
+   search with RRF delivers a +15.6% relative nDCG@5 improvement over BM25
+   alone (0.711 to 0.822) at a median latency cost of only +42 ms.
+
+2. **Reranking is worth the cost for quality-sensitive use cases.** Adding
+   the cross-encoder reranker brings an additional +6.0% nDCG@5 at +67 ms
+   median latency. For interactive use (< 200 ms budget), the median (112 ms)
+   and p95 (158 ms) fit comfortably, though p99 (203 ms) slightly exceeds it.
+
+3. **BM25 remains the best choice for latency-critical paths** (autocomplete,
+   incremental search) where 3 ms median response time is essential.
+
+4. **Perfect Hit@5 = 1.000 with Hybrid + Rerank** means a correct document
+   appears in the top 5 results for all 24 evaluation queries, providing
+   a strong foundation for downstream LLM answer generation.
+
+5. **Scaling considerations:** These results are on a small corpus (~42 chunks).
+   As corpus size grows, reranker gains may diminish if top-k retrieval
+   truncation drops relevant passages before re-scoring. The latency report
+   (latency-report.md) provides guidance for larger corpora.
+
+---
+
+*Generated by `run-eval.sh` against KINDX 1.0.1 on 2026-03-13.*