diff --git a/README.md b/README.md
index a0b44ca..e872e3c 100644
--- a/README.md
+++ b/README.md
@@ -144,29 +144,94 @@ Cortex Memory includes a powerful web-based dashboard (`cortex-mem-insights`) th

Interactive Dashboard: Get an overview of memory usage, system health, and activity statistics

-
-
- Memory Management -

Memory Management: View and manage individual memory records

-
-
- Optimization Panel -

Optimization Tools: Analyze and optimize memory quality

-
+
+ + + + + + + + + + + + + +

Memory Management: View and manage individual memory records

+

Optimization Tools: Analyze and optimize memory quality

+
snapshot-1 | snapshot-2

System Monitor: Monitor memory performance and activity

+

Analytics Dashboard: Detailed insights and trends over time

+
snapshot-1 | snapshot-2
-
-
- System Monitor -

System Monitor: Monitor memory performance and activity

-
-
- Analytics -

Analytics Dashboard: Detailed insights and trends over time

-
+These visual tools help you understand how Cortex Memory is performing and how your AI agent's memory is evolving over time.
+
+# 🏆 Benchmark
+
+Cortex Memory was evaluated against LangMem on the **LOCOMO dataset** (50 conversations, 150 questions) using a standardized memory system evaluation framework. Cortex Memory scored higher than LangMem on every retrieval metric reported below.
+
+## Performance Comparison
+
+

+ Cortex Memory vs LangMem Benchmark +

+ +

+ Overall Performance: Cortex Memory significantly outperforms LangMem across all key metrics +

+
+### Key Metrics
+
+| Metric | Cortex Memory | LangMem | Improvement |
+|--------|---------------|---------|-------------|
+| **Recall@1** | 93.33% | 26.32% | **+67.02pp** |
+| **Recall@3** | 94.00% | 50.00% | +44.00pp |
+| **Recall@5** | 94.67% | 55.26% | +39.40pp |
+| **Recall@10** | 94.67% | 63.16% | +31.51pp |
+| **Precision@1** | 93.33% | 26.32% | +67.02pp |
+| **MRR** | 93.72% | 38.83% | **+54.90pp** |
+| **NDCG@5** | 80.73% | 18.72% | **+62.01pp** |
+| **NDCG@10** | 79.41% | 16.83% | **+62.58pp** |
+
+### Detailed Results
+
+
+ + + + + + + +

Cortex Memory Evaluation: Excellent retrieval performance with 93.33% Recall@1 and 93.72% MRR

+

LangMem Evaluation: Modest performance with 26.32% Recall@1 and 38.83% MRR

+
Cortex Memory Evaluation | LangMem Evaluation
-These visual tools help you understand how Cortex Memory is performing and how your AI agent's memory is evolving over time.
+### Key Findings
+
+1. **Significantly Improved Retrieval Accuracy**: Cortex Memory achieves **93.33% Recall@1**, a **67.02 percentage point improvement** over LangMem's 26.32%. In other words, Cortex Memory's top-ranked result is a relevant memory far more often.
+
+2. **Clear Ranking Quality Advantage**: Cortex Memory's **MRR of 93.72%** vs LangMem's **38.83%** shows that it not only retrieves relevant memories but also ranks them higher in the result list.
+
+3. **Comprehensive Performance Leadership**: Across all metrics, especially **NDCG@5 (80.73% vs 18.72%)**, Cortex Memory shows consistent, significant advantages in retrieval quality and ranking accuracy.
+
+4. **Technical Advantages**: Cortex Memory's performance is attributed to:
+   - Efficient **Rust-based implementation**
+   - Powerful retrieval capabilities of the **Qdrant vector database**
+   - Optimized memory management strategies
+
+### Evaluation Framework
+
+The benchmark uses a memory system evaluation framework located in `examples/lomoco-evaluation`, which includes:
+
+- **Professional Metrics**: Recall@K, Precision@K, MRR, NDCG, and answer quality metrics
+- **Enhanced Dataset**: 50 conversations with 150 questions covering various scenarios
+- **Statistical Analysis**: 95% confidence intervals, standard deviation, and category-based statistics
+- **Multi-System Support**: Compares Cortex Memory, LangMem, and Simple RAG baselines
+
+For more details on running the evaluation, see the [lomoco-evaluation README](examples/lomoco-evaluation/README.md).
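For readers unfamiliar with the metrics named above, the following is a minimal sketch of how Recall@K, Precision@K, reciprocal rank (averaged over queries to give MRR), and NDCG@K are commonly defined for binary relevance. It is not taken from `examples/lomoco-evaluation`; the function names and the `ranked`/`relevant` inputs are assumptions made for illustration, and the framework's actual implementation may differ.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots occupied by relevant items."""
    if k <= 0:
        return 0.0
    return len(set(ranked[:k]) & relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: the only relevant memory ("m1") is ranked second.
ranked_ids = ["m3", "m1", "m7"]
relevant_ids = {"m1"}

print(recall_at_k(ranked_ids, relevant_ids, 1))   # miss at rank 1
print(recall_at_k(ranked_ids, relevant_ids, 3))   # found within top 3
print(reciprocal_rank(ranked_ids, relevant_ids))  # first hit at rank 2
print(ndcg_at_k(ranked_ids, relevant_ids, 3))     # discounted for rank 2
```

Averaging each per-query value over all questions and multiplying by 100 gives percentages in the form reported in the Key Metrics table; the "pp" column is then the plain difference between two such percentages (for example, 94.00 - 50.00 = 44.00 for Recall@3).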
# 🧠 How It Works
diff --git a/assets/benchmark/cortex_mem_vs_langmem.png b/assets/benchmark/cortex_mem_vs_langmem.png
new file mode 100644
index 0000000..a9f86ef
Binary files /dev/null and b/assets/benchmark/cortex_mem_vs_langmem.png differ
diff --git a/assets/benchmark/evaluation_cortex_mem.webp b/assets/benchmark/evaluation_cortex_mem.webp
new file mode 100644
index 0000000..96692eb
Binary files /dev/null and b/assets/benchmark/evaluation_cortex_mem.webp differ
diff --git a/assets/benchmark/evaluation_langmem.webp b/assets/benchmark/evaluation_langmem.webp
new file mode 100644
index 0000000..891ff07
Binary files /dev/null and b/assets/benchmark/evaluation_langmem.webp differ