Memory Quality System Guide
Version: 8.48.0+ | Status: ✅ Production Ready | Feature: Memento-Inspired Quality System (Issue #260) | Last Updated: December 7, 2025
- Overview
- What's New in v8.45-v8.48
- How It Works
- Installation & Setup
- Using the Quality System
- Quality-Based Memory Management
- Association-Based Quality Boost
- Privacy & Cost
- Performance Benchmarks
- Platform Support
- Troubleshooting
- Best Practices
The Memory Quality System transforms MCP Memory Service from static storage to a learning memory system. It automatically evaluates memory quality using AI-driven scoring and uses these scores to improve retrieval precision, consolidation efficiency, and overall system intelligence.
- ✅ 40-70% improvement in retrieval precision (top-5 useful rate: 50% → 70-85%)
- ✅ Zero cost with local SLM (privacy-preserving, offline-capable)
- ✅ Smarter consolidation - Preserve high-quality memories longer
- ✅ Quality-boosted search - Prioritize best memories in results
- ✅ Network intelligence - Well-connected memories automatically boosted (v8.47.0+)
- ✅ Cloud-ready - 78% metadata compression for sync (v8.48.0+)
- ✅ Automatic learning - System improves from usage patterns
🎯 Problem Solved: Cloudflare D1 10KB metadata limit causing sync failures
- CSV-Based Metadata Compression: 78% size reduction (732B → 159B typical)
- 100% Sync Success: Resolved all metadata size limit errors (0 failures, down from 278)
- Transparent Operation: Automatic compression/decompression in hybrid backend
- Metadata Validation: Pre-sync size checks prevent API failures before they occur
- 3-Phase Roadmap: Phase 1 (CSV) complete, Phase 2 (binary) and Phase 3 (deduplication) available
📖 See: Metadata Compression System Guide for complete details
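The compression idea is easy to illustrate. The sketch below is not the backend's actual code; the helper names (`compress_tags`, `metadata_fits`) and the CSV-for-JSON swap are assumptions based on the description above:

```python
import json

# Validation threshold just under Cloudflare D1's 10KB metadata cap
# (the guide mentions a <9.5KB pre-sync check).
D1_METADATA_LIMIT = 9_500

def compress_tags(tags):
    """Phase-1 style idea: store a tag list as one CSV string instead of
    a JSON array, trading structure for size (illustrative only)."""
    return ",".join(tags)

def metadata_fits(metadata):
    """Pre-sync size check: catch oversized metadata before the API call fails."""
    return len(json.dumps(metadata).encode("utf-8")) <= D1_METADATA_LIMIT

verbose = json.dumps({"tags": ["architecture", "decision", "backend"]})
compact = json.dumps({"tags": compress_tags(["architecture", "decision", "backend"])})
print(len(compact) < len(verbose))  # CSV form is smaller
```

The real implementation compresses more fields than tags, which is where the 78% figure comes from.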
Critical Bug Fixes:
- ONNX Self-Match Bug: Fixed artificially inflated scores (~1.0 for all) by generating queries from tags/metadata
- Realistic Distribution: Now produces 42.9% high, 3.2% medium, 53.9% low (avg 0.468 vs 1.000)
- Association Pollution: Filters 948 system-generated memories (type='association', 'compressed_cluster')
- Sync Queue Overflow: Increased capacity 1,000→2,000, batch size 50→100 (0% failure rate)
- Batch Optimization: 50-100x speedup for consolidation relevance updates
📖 See: ONNX Quality Evaluation Deep Dive for technical details
Network Effect Intelligence:
- Well-connected memories (≥5 associations) automatically receive 20% quality boost
- Full audit trail: `quality_boost_applied`, `quality_boost_date`, `quality_boost_connection_count`
- Configurable: 3 environment variables with validation (boost enabled by default)
- Impact: ~4% relevance increase, potential retention tier promotion (medium→high)
- 5 comprehensive test cases (100% pass rate)
📖 See: Association-Based Quality Boost section below
3-Phase Integration:
- Phase 1: Hooks read `backendQuality` from metadata (20% scoring weight)
- Phase 2: Session-end hook triggers async quality evaluation
- Phase 3: Quality-boosted search with configurable weights
Platform Fixes:
- Windows hooks installer encoding fix (UTF-8 console configuration)
- Session-start hook crash fix (missing `queryMemoriesByTagsAndTime()` function)
- Quality score persistence in hybrid backend (Cloudflare metadata normalization)
📖 See: Quality + Hooks Integration Guide for workflows
Infrastructure:
- HTTP API router fix (404 errors on `/api/quality/*` endpoints)
- ONNX model export fix (dynamic export from transformers, offline mode support)
- Dashboard dark mode improvements (Chart.js integration, form controls)
- Quality distribution MCP tool fix (storage method call correction)
The system evaluates memory quality (0.0-1.0 score) using a multi-tier fallback chain:
| Tier | Provider | Cost | Latency | Privacy | Default |
|---|---|---|---|---|---|
| 1 | Local SLM (ONNX) | $0 | 50-100ms | ✅ Full | ✅ Yes |
| 2 | Groq API | ~$0.30/mo | 900ms | ❌ External | ❌ Opt-in |
| 3 | Gemini API | ~$0.40/mo | 2000ms | ❌ External | ❌ Opt-in |
| 4 | Implicit Signals | $0 | 10ms | ✅ Full | Fallback |
Default setup: Local SLM only (zero cost, full privacy, no external API calls)
```python
quality_score = (
    local_slm_score × 0.50 +    # Cross-encoder evaluation
    implicit_signals × 0.50     # Usage patterns
)

implicit_signals = (
    access_frequency × 0.40 +   # How often retrieved
    recency × 0.30 +            # When last accessed
    retrieval_ranking × 0.30    # Average position in results
)
```
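The two formulas above can be combined into a small helper to see the weighting in action (an illustrative sketch; the function names are hypothetical and all inputs are assumed pre-normalized to 0.0-1.0):

```python
def implicit_signals(access_frequency, recency, retrieval_ranking):
    """Combine usage-pattern signals, each normalized to 0.0-1.0."""
    return (access_frequency * 0.40
            + recency * 0.30
            + retrieval_ranking * 0.30)

def quality_score(local_slm_score, signals):
    """Blend the cross-encoder score with implicit signals, 50/50."""
    return local_slm_score * 0.50 + signals * 0.50

# A memory retrieved often and recently, with a modest relevance score:
sig = implicit_signals(access_frequency=0.8, recency=0.6, retrieval_ranking=0.5)
print(round(quality_score(0.7, sig), 3))
```

Note that a memory with a mediocre cross-encoder score can still land in the high tier if its usage signals are strong, and vice versa.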
Model: ms-marco-MiniLM-L-6-v2 (23MB)
Architecture: Cross-encoder (processes query + memory together)
Performance:
- CPU: 50-100ms per evaluation (7-16ms in practice)
- GPU (CUDA/MPS/DirectML): 10-20ms per evaluation
Scoring Process:
- Tokenize: `[CLS] query [SEP] memory [SEP]`
- Run ONNX inference (local, private)
- Return relevance score 0.0-1.0
Important: Cross-encoders score query-memory relevance, not absolute quality. Queries are generated from tags/metadata (what memory is about) to avoid self-match bias. See ONNX Deep Dive for details.
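A minimal sketch of tag-based query generation (the helper name and the empty-tags fallback are assumptions; the real logic lives in the quality evaluator):

```python
def build_scoring_query(tags, memory_type=None):
    """Build a cross-encoder query from what the memory is ABOUT
    (tags/metadata), never from the memory content itself -- scoring a
    memory against its own content is the self-match bug that inflated
    all scores to ~1.0 before v8.47.1."""
    parts = list(tags)
    if memory_type:
        parts.append(memory_type)
    # Assumed fallback: no tags means no query, so implicit signals are used.
    return " ".join(parts) if parts else None

print(build_scoring_query(["python", "asyncio", "bugfix"], memory_type="note"))
# -> python asyncio bugfix note
```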
GPU Acceleration (automatic):
- CUDA (NVIDIA)
- CoreML/MPS (Apple Silicon)
- DirectML (Windows)
- ROCm (AMD on Linux)
- CPU fallback (always works)
Zero configuration required - The quality system works out of the box with local SLM:
```bash
# Install MCP Memory Service (if not already installed)
pip install mcp-memory-service

# Quality system is enabled by default with local SLM
# No API keys needed, no external calls
```

First-time model download (automatic):
- Model exports from HuggingFace transformers to ONNX on first use
- Saved to `~/.cache/mcp_memory/onnx_models/ms-marco-MiniLM-L-6-v2/`
- Supports offline/air-gapped environments (`local_files_only=True`)
If you want cloud-based scoring (Groq or Gemini):
```bash
# Enable Groq API (fast, cheap)
export GROQ_API_KEY="your-groq-api-key"
export MCP_QUALITY_AI_PROVIDER=groq    # or "auto" to try all tiers

# Enable Gemini API (Google)
export GOOGLE_API_KEY="your-gemini-api-key"
export MCP_QUALITY_AI_PROVIDER=gemini
```

```bash
# Quality System Core
export MCP_QUALITY_SYSTEM_ENABLED=true   # Default: true
export MCP_QUALITY_AI_PROVIDER=local     # local|groq|gemini|auto|none

# Local SLM Configuration (Tier 1)
export MCP_QUALITY_LOCAL_MODEL=ms-marco-MiniLM-L-6-v2  # Model name
export MCP_QUALITY_LOCAL_DEVICE=auto     # auto|cpu|cuda|mps|directml|rocm

# Quality-Boosted Search (Opt-In)
export MCP_QUALITY_BOOST_ENABLED=false   # Default: false (opt-in)
export MCP_QUALITY_BOOST_WEIGHT=0.3      # 0.0-1.0 (30% quality, 70% semantic)

# Quality-Based Retention (Consolidation)
export MCP_QUALITY_RETENTION_HIGH=365    # Days for quality ≥0.7
export MCP_QUALITY_RETENTION_MEDIUM=180  # Days for quality 0.5-0.7
export MCP_QUALITY_RETENTION_LOW_MIN=30  # Min days for quality <0.5
export MCP_QUALITY_RETENTION_LOW_MAX=90  # Max days for quality <0.5

# Association-Based Quality Boost (v8.47.0+)
export MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED=true   # Default: true
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=5  # Default: 5 (range: 1-100)
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.2     # Default: 1.2 = 20% boost (range: 1.0-2.0)

# Hybrid Backend Sync (v8.47.1+)
export MCP_HYBRID_QUEUE_SIZE=2000        # Default: 2000 (was 1000)
export MCP_HYBRID_BATCH_SIZE=100         # Default: 100 (was 50)
```

Quality scores are calculated automatically when memories are retrieved:
```bash
# Normal retrieval - quality scoring happens in background
claude /memory-recall "what did I work on yesterday"
# Quality score is updated in metadata (non-blocking)
```

Override AI scores with manual ratings:
```python
# Rate a memory (MCP tool)
rate_memory(
    content_hash="abc123...",
    rating=1,  # -1 (bad), 0 (neutral), 1 (good)
    feedback="This was very helpful!"
)
# Manual ratings weighted 60%, AI scores weighted 40%
```

HTTP API:
```bash
curl -X POST http://127.0.0.1:8000/api/quality/memories/{hash}/rate \
  -H "Content-Type: application/json" \
  -d '{"rating": 1, "feedback": "Helpful!"}'
```

Async Evaluation Endpoint (v8.46.0+):
```bash
# Trigger AI evaluation on specific memory
curl -X POST http://127.0.0.1:8000/api/quality/memories/{hash}/evaluate

# Returns: quality_score, quality_provider, ai_score, evaluation_time_ms
# Performance: ~355ms with ONNX ranker
```

Enable quality-based reranking for better results:
Method 1: Global Configuration
```bash
export MCP_QUALITY_BOOST_ENABLED=true
claude /memory-recall "search query"  # Uses quality boost
```

Method 2: Per-Query (MCP Tool)
```python
# Search with quality boost (MCP tool)
retrieve_with_quality_boost(
    query="search query",
    n_results=10,
    quality_weight=0.3  # 30% quality, 70% semantic
)
```

Method 3: HTTP API (v8.46.0+)
```bash
curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "search query", "quality_boost": true, "quality_weight": 0.3}'
```

Algorithm:
- Over-fetch 3× candidates (30 results for top 10)
- Rerank by: `(1-weight) × semantic_similarity + weight × quality_score`
- Return top N results with `search_type: "semantic_quality_boost"`
Performance: <100ms total (50ms semantic search + 20ms reranking + 30ms quality scoring)
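The over-fetch + rerank step can be sketched as follows (the result shapes and the `semantic_search` callable are assumptions; the real storage API differs):

```python
def quality_boosted_search(semantic_search, n_results=10, quality_weight=0.3):
    """Over-fetch 3x candidates, rerank by the blended score, return top N."""
    candidates = semantic_search(limit=n_results * 3)  # over-fetch
    for c in candidates:
        c["score"] = ((1 - quality_weight) * c["similarity"]
                      + quality_weight * c["quality_score"])
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return candidates[:n_results]

# Toy backend: a high-quality memory overtakes a slightly more similar one.
mems = [{"id": "a", "similarity": 0.80, "quality_score": 0.20},
        {"id": "b", "similarity": 0.75, "quality_score": 0.90}]
top = quality_boosted_search(lambda limit: list(mems), n_results=1)
print(top[0]["id"])
# -> b
```

With `quality_weight=0.3`, memory "b" scores 0.795 versus 0.62 for "a", so quality outweighs the small similarity gap.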
MCP Tool:
```python
get_memory_quality(content_hash="abc123...")
# Returns:
# - quality_score: Current composite score (0.0-1.0)
# - quality_provider: Which tier scored it (ONNXRankerModel, etc.)
# - access_count: Number of retrievals
# - last_accessed_at: Last access timestamp
# - ai_scores: Historical AI evaluation scores
# - user_rating: Manual rating if present
```

HTTP API:

```bash
curl http://127.0.0.1:8000/api/quality/memories/{hash}
```

MCP Tool:
```python
analyze_quality_distribution(min_quality=0.0, max_quality=1.0)
# Returns:
# - total_memories: Total count
# - high_quality_count: Score ≥0.7
# - medium_quality_count: 0.5 ≤ score < 0.7
# - low_quality_count: Score < 0.5
# - average_score: Mean quality score
# - provider_breakdown: Count by provider (onnx_local, groq, gemini, implicit)
# - top_10_memories: Highest scoring
# - bottom_10_memories: Lowest scoring
```

HTTP API:
```bash
# Distribution statistics
curl http://127.0.0.1:8000/api/quality/distribution

# Time series trends (weekly/monthly)
curl http://127.0.0.1:8000/api/quality/trends
```

Dashboard (http://127.0.0.1:8000/) - v8.45.2+ Dark Mode Support:
- Quality badges on all memory cards (color-coded by tier: 🟢🟡🔴⚪)
- Analytics view with distribution charts (bar + pie)
- Provider breakdown visualization
- Top/bottom performers lists
- Dark mode Chart.js integration with proper contrast
- Settings panel for quality configuration
High-quality memories are preserved longer during consolidation:
| Quality Tier | Score Range | Retention Period |
|---|---|---|
| High | ≥0.7 | 365 days inactive |
| Medium | 0.5-0.7 | 180 days inactive |
| Low | <0.5 | 30-90 days inactive (scaled by score) |
How it works:
- Weekly consolidation scans inactive memories
- Applies quality-based thresholds
- Archives low-quality memories sooner
- Preserves high-quality memories longer
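The tier thresholds map to retention periods roughly like this sketch. The linear interpolation for the low tier is an assumption; the guide only says "scaled by score":

```python
def retention_days(quality, high=365, medium=180, low_min=30, low_max=90):
    """Map a quality score to an inactive-retention period, matching the
    tier table above (defaults mirror the MCP_QUALITY_RETENTION_* env vars)."""
    if quality >= 0.7:
        return high
    if quality >= 0.5:
        return medium
    # Low tier: assumed linear scaling, 0.0 -> low_min, just under 0.5 -> low_max
    return int(low_min + (low_max - low_min) * (quality / 0.5))

print(retention_days(0.9), retention_days(0.6), retention_days(0.25))
# -> 365 180 60
```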
High-quality memories decay slower in relevance scoring:
```python
decay_multiplier = 1.0 + (quality_score × 0.5)

# High quality (0.9): 1.45× multiplier
# Medium quality (0.5): 1.25× multiplier
# Low quality (0.2): 1.10× multiplier

final_relevance = base_relevance × decay_multiplier
```
Effect: High-quality memories stay relevant 3× longer in search results.
Well-connected memories automatically receive quality score boosts based on the network effect principle: frequently referenced memories are likely more valuable.
- Trigger Condition: Memory has ≥5 associations (configurable)
- Boost Amount: 20% quality increase (configurable 1.0-2.0×)
- Timing: Applied during weekly consolidation
- Cap: Quality scores capped at 1.0 (prevents over-promotion)
Example:
Original quality: 0.65 (medium tier)
Connections: 8 associations (≥5 threshold)
Boost factor: 1.2 (20% increase)
New quality: 0.65 × 1.2 = 0.78 (high tier) ✅
Impact:
- Retention: 180 days → 365 days
- Relevance: ~4% increase in search ranking
- Tier promotion: Medium → High
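The boost logic reduces to a few lines (an illustrative sketch; the function name is hypothetical):

```python
def apply_association_boost(quality, connections,
                            min_connections=5, boost_factor=1.2):
    """Apply the network-effect boost described above, capped at 1.0 so
    well-connected memories cannot be over-promoted."""
    if connections < min_connections:
        return quality
    return min(quality * boost_factor, 1.0)

print(round(apply_association_boost(0.65, connections=8), 2))   # promoted to high tier
print(apply_association_boost(0.90, connections=12))            # capped at 1.0
```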
```bash
# Enable/disable association boost (default: enabled)
export MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED=true

# Minimum connections required (default: 5, range: 1-100)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=5

# Boost multiplier (default: 1.2 = 20%, range: 1.0-2.0)
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.2
```

Preset Configurations:
| Profile | Min Connections | Boost Factor | Use Case |
|---|---|---|---|
| Conservative | 10 | 1.1 (10%) | Only boost highly connected memories |
| Balanced (default) | 5 | 1.2 (20%) | Recommended for most users |
| Aggressive | 3 | 1.3 (30%) | Maximize network effect influence |
Every boosted memory receives comprehensive tracking:
```json
{
  "quality_boost_applied": true,
  "quality_boost_date": "2025-12-07T10:30:00Z",
  "quality_boost_reason": "association_connections",
  "quality_boost_connection_count": 8,
  "original_quality_before_boost": 0.65
}
```

Benefits:
- Transparency: See exactly why quality changed
- Analysis: Identify well-connected knowledge clusters
- Debugging: Understand quality score evolution
- Audit: Verify boost application during consolidation
Before Boost (quality: 0.65):
- Tier: Medium
- Retention: 180 days inactive
- Relevance decay: 1.325× multiplier
- Forgetting likelihood: Moderate
After Boost (quality: 0.78):
- Tier: High ✅
- Retention: 365 days inactive (+185 days)
- Relevance decay: 1.39× multiplier (+0.065)
- Forgetting likelihood: Low
Relevance Score Calculation:

```python
# Quality boost applied BEFORE quality multiplier calculation
boosted_quality = original_quality * boost_factor    # 0.65 → 0.78
quality_multiplier = 1.0 + (boosted_quality * 0.5)   # 1.39
final_relevance = base_relevance * quality_multiplier
```

- Knowledge Graphs: Central concepts referenced by many notes
- Code Documentation: Core functions called by multiple modules
- Research Notes: Key papers cited in multiple summaries
- Project Planning: Milestone memories linked to many tasks
- Computation Time: 5-10 microseconds per memory (negligible)
- Memory Overhead: ~200 bytes per boosted memory (5 metadata fields)
- Consolidation Impact: No measurable increase in duration
- Integration Point: `ExponentialDecayCalculator._calculate_memory_relevance()`
Check which memories received boosts:
```bash
# Search for boosted memories
curl -X POST http://127.0.0.1:8000/api/search/by-metadata \
  -H "Content-Type: application/json" \
  -d '{"key": "quality_boost_applied", "value": true}'

# View boost distribution
curl http://127.0.0.1:8000/api/quality/distribution | jq '.boosted_count'
```

| Mode | Configuration | Privacy | Cost |
|---|---|---|---|
| Local Only | `MCP_QUALITY_AI_PROVIDER=local` | ✅ Full (no external calls) | $0 |
| Hybrid | `MCP_QUALITY_AI_PROVIDER=auto` | ⚠️ Local first, external fallback | ~$0.30/mo |
| Cloud | `MCP_QUALITY_AI_PROVIDER=groq` | ❌ External API | ~$0.30/mo |
| Implicit Only | `MCP_QUALITY_AI_PROVIDER=none` | ✅ Full (no AI) | $0 |
| Provider | Monthly Cost | Notes |
|---|---|---|
| Local SLM | $0 | Free forever, runs locally |
| Groq (Kimi K2) | ~$0.30-0.50 | Fast, good quality |
| Gemini Flash | ~$0.40-0.60 | Slower, free tier available |
| Implicit Only | $0 | No AI scoring, usage patterns only |
Recommendation: Use default local SLM (zero cost, full privacy, fast).
| Operation | Latency | Notes |
|---|---|---|
| Local SLM Scoring (CPU) | 7-16ms | Per memory evaluation (real-world) |
| Local SLM Scoring (GPU) | 10-20ms | With CUDA/MPS/DirectML |
| Quality-Boosted Search | <100ms | Over-fetch + rerank |
| Implicit Signals | <10ms | Always fast |
| Quality Metadata Update | <5ms | Storage backend write |
| Association Boost Calc | 5-10μs | Negligible overhead |
| Operation | Scale | Performance | Notes |
|---|---|---|---|
| Bulk ONNX Evaluation | 3,750 memories | ~60s total | ~16ms per memory |
| Batch Consolidation Updates | 4,478 updates | 50-100× speedup | Single transaction |
| Sync Queue Processing | 2,000 queue size | 0% failure rate | Was 27.8% at 1,000 |
| Metric | Before | After | Improvement |
|---|---|---|---|
| Typical Metadata Size | 732B | 159B | 78% reduction |
| Compression Overhead | N/A | <1ms | Negligible |
| Sync Success Rate | 72.2% | 100% | Zero failures |
| Cloudflare Sync Failures | 278 | 0 | Resolved |
Target Metrics:
- Quality calculation overhead: <10ms ✅
- Search latency with boost: <100ms total ✅
- No user-facing blocking (async scoring) ✅
| Platform | CPU | GPU Acceleration | Status |
|---|---|---|---|
| Windows | ✅ All x64 CPUs | CUDA, DirectML | ✅ Fully Supported |
| macOS | ✅ Intel & Apple Silicon | MPS (Metal) | ✅ Fully Supported |
| Linux | ✅ All x64 CPUs | CUDA, ROCm | ✅ Fully Supported |
| Hardware | Technology | Installation | Performance |
|---|---|---|---|
| NVIDIA | CUDA | `pip install onnxruntime-gpu` | 10-20ms (5-10× faster) |
| Apple Silicon | MPS | Built-in (onnxruntime) | 10-20ms (5-10× faster) |
| AMD (Windows) | DirectML | `pip install onnxruntime-directml` | 15-25ms (3-5× faster) |
| AMD (Linux) | ROCm | `pip install onnxruntime-rocm` | 15-25ms (3-5× faster) |
| CPU Fallback | All platforms | No extra install | 50-100ms (always works) |
Device Auto-Detection: Set MCP_QUALITY_LOCAL_DEVICE=auto (default) for automatic GPU selection with CPU fallback.
Full offline support with cached HuggingFace models:
```bash
# Pre-cache model on internet-connected machine
python -c "from transformers import AutoModelForSequenceClassification, AutoTokenizer; \
AutoTokenizer.from_pretrained('cross-encoder/ms-marco-MiniLM-L-6-v2'); \
AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L-6-v2')"

# Copy ~/.cache/huggingface/ to air-gapped machine

# Use local_files_only mode
export HF_HOME=~/.cache/huggingface
# System automatically tries local_files_only=True first
```

Symptom: `quality_provider: ImplicitSignalsEvaluator` (should be `ONNXRankerModel`)
Fixes:

- Check ONNX Runtime installed:

```bash
pip install onnxruntime
# For GPU: pip install onnxruntime-gpu (CUDA)
# For DirectML: pip install onnxruntime-directml (Windows AMD)
```

- Check model downloaded:

```bash
ls ~/.cache/mcp_memory/onnx_models/ms-marco-MiniLM-L-6-v2/
# Should contain: model.onnx, tokenizer.json, config.json
```

- Check logs for errors:

```bash
tail -f logs/mcp_memory_service.log | grep -i "quality\|onnx"
```

- Verify transformers/torch installed (required for model export):

```bash
pip install transformers torch
```
Symptom: All memories have quality ~1.0 (unrealistic distribution)
Cause: ONNX self-match bug (using memory content as its own query)
Fix: Upgrade to v8.47.1+, which generates queries from tags/metadata:

```bash
pip install --upgrade mcp-memory-service

# Reset scores if needed
python scripts/quality/reset_onnx_scores.py
```

Verification:
```bash
# Check distribution (should be realistic)
curl http://127.0.0.1:8000/api/quality/distribution

# Expected: ~40-50% high, ~0-10% medium, ~40-50% low
# Not: 100% high quality
```

Symptom: All memories have `quality_score: 0.5` (neutral default)
Cause: Quality scoring not triggered yet (memories haven't been retrieved)
Fix: Retrieve memories to trigger scoring:

```bash
claude /memory-recall "any search query"
# Quality scoring happens in background after retrieval

# Or trigger bulk evaluation
python scripts/quality/bulk_evaluate_onnx.py
```

Symptom: Local SLM uses CPU despite having GPU
Fixes:

- Install GPU-enabled ONNX Runtime:

```bash
# NVIDIA CUDA
pip install onnxruntime-gpu

# DirectML (Windows AMD/Intel)
pip install onnxruntime-directml

# ROCm (Linux AMD)
pip install onnxruntime-rocm
```

- Force device selection:

```bash
export MCP_QUALITY_LOCAL_DEVICE=cuda  # or mps, directml, rocm
```

- Verify GPU availability:

```bash
# NVIDIA
nvidia-smi

# AMD (Linux)
rocm-smi

# Apple Silicon (built-in MPS)
system_profiler SPDisplaysDataType | grep "Chipset Model"
```
Symptom: Search results don't show quality reranking
Checks:

- Verify enabled:

```bash
echo $MCP_QUALITY_BOOST_ENABLED  # Should be "true"
```

- Use explicit MCP tool:

```python
retrieve_with_quality_boost(query="test", quality_weight=0.5)
```

- Check debug info in results:

```python
result.debug_info['reranked']       # Should be True
result.debug_info['quality_score']  # Should exist
result.debug_info['search_type']    # Should be "semantic_quality_boost"
```
Symptom: operations_failed > 0 in sync status, 400 Bad Request errors
Cause: Metadata size exceeds Cloudflare D1 10KB limit
Fix: Upgrade to v8.48.0+, which implements CSV compression:

```bash
pip install --upgrade mcp-memory-service

# Verify compression working
bash verify_compression.sh
# Expected output:
# Failed: 0 (should be 0) ✅
# No compression warnings (good!) ✅
```

Symptom: Well-connected memories not receiving quality boost
Checks:

- Verify boost enabled:

```bash
echo $MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED  # Should be "true"
```

- Check connection count threshold:

```bash
echo $MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST  # Default: 5
```

- Trigger consolidation manually:

```bash
curl -X POST http://127.0.0.1:8000/api/consolidation/trigger \
  -H "Content-Type: application/json" \
  -d '{"time_horizon": "weekly"}'
```

- Verify boost metadata:

```bash
# Check memory metadata for boost tracking
curl http://127.0.0.1:8000/api/quality/memories/{hash} | jq '.quality_boost_applied'
```
Use local SLM (default) for:
- Zero cost
- Full privacy
- Offline capability
- Good accuracy (realistic quality distribution after v8.47.1 fix)
```bash
# Week 1: Collect quality scores (boost disabled)
export MCP_QUALITY_BOOST_ENABLED=false

# Week 2: Test with low weight
export MCP_QUALITY_BOOST_ENABLED=true
export MCP_QUALITY_BOOST_WEIGHT=0.2  # 20% quality

# Week 3+: Increase if helpful
export MCP_QUALITY_BOOST_WEIGHT=0.3  # 30% quality (recommended)
```

Check analytics regularly:
```python
analyze_quality_distribution()
# Target distribution (realistic after v8.47.1):
# - High quality (≥0.7): 40-50% of memories
# - Medium quality (0.5-0.7): 0-10%
# - Low quality (<0.5): 40-50%
# This is NORMAL and HEALTHY - cross-encoder scores query-memory relevance
```

Note: Distribution depends on the query generation strategy. v8.47.1+ uses tags/metadata, which produces more polarized but realistic scores.
Rate important memories manually:
```python
# After finding a very helpful memory
rate_memory(content_hash="abc123...", rating=1, feedback="Critical info!")

# After finding an unhelpful memory
rate_memory(content_hash="def456...", rating=-1, feedback="Outdated")
```

Manual ratings are weighted 60%, AI scores 40%.
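The 60/40 blend can be sketched as follows. Only the weighting comes from this guide; mapping the -1/0/1 rating onto 0.0/0.5/1.0 is an assumption for illustration:

```python
def blended_quality(ai_score, user_rating=None):
    """Blend a manual rating with the AI score (manual 60%, AI 40%).
    The -1/0/1 -> 0.0/0.5/1.0 mapping is an assumption, not the
    service's documented behavior."""
    if user_rating is None:
        return ai_score  # no manual rating: AI score stands alone
    manual = {-1: 0.0, 0: 0.5, 1: 1.0}[user_rating]
    return 0.6 * manual + 0.4 * ai_score

print(blended_quality(0.5, 1))     # positive rating pulls the score up
print(blended_quality(0.5, None))  # no rating: AI score unchanged
```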
For knowledge-intensive workflows:
```bash
# Enable association boost (default: true)
export MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED=true

# Lower threshold for aggressive boost
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=3

# Run weekly consolidation
curl -X POST http://127.0.0.1:8000/api/consolidation/trigger \
  -d '{"time_horizon": "weekly"}'
```

For large-scale operations:
```bash
# Increase queue size for bulk operations (v8.47.1)
export MCP_HYBRID_QUEUE_SIZE=2000  # Default: 2000
export MCP_HYBRID_BATCH_SIZE=100   # Default: 100

# Metadata compression enabled by default (v8.48.0)
# Verify compression working:
bash verify_compression.sh
```

Monthly checklist:
- Check quality distribution (analytics dashboard)
- Verify realistic distribution (~40-50% high after v8.47.1)
- Review top 10 performers (should be genuinely helpful)
- Review bottom 10 (candidates for deletion)
- Verify provider breakdown (mostly `onnx_local`)
- Check average quality score (target: 0.4-0.6 is normal)
- Monitor association boost application (if enabled)
- Verify sync success rate (should be 100% after v8.48.0)
```bash
# Conservative: Preserve longer
export MCP_QUALITY_RETENTION_HIGH=730    # 2 years for high quality
export MCP_QUALITY_RETENTION_MEDIUM=365  # 1 year for medium
export MCP_QUALITY_RETENTION_LOW_MIN=90  # 90 days minimum for low

# Aggressive: Archive sooner
export MCP_QUALITY_RETENTION_HIGH=180    # 6 months for high
export MCP_QUALITY_RETENTION_MEDIUM=90   # 3 months for medium
export MCP_QUALITY_RETENTION_LOW_MIN=14  # 2 weeks minimum for low
```

```bash
# Semantic-first (default)
export MCP_QUALITY_BOOST_WEIGHT=0.3  # 30% quality, 70% semantic

# Balanced
export MCP_QUALITY_BOOST_WEIGHT=0.5  # 50% quality, 50% semantic

# Quality-first
export MCP_QUALITY_BOOST_WEIGHT=0.7  # 70% quality, 30% semantic
```

Recommendation: Start with 0.3, increase if quality boost improves results.
```bash
# Conservative (only highly connected)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=10
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.1  # 10% boost

# Balanced (default)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=5
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.2  # 20% boost

# Aggressive (maximize network effect)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=3
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.3  # 30% boost
```

Use local SLM primarily, cloud APIs as fallback:

```bash
export MCP_QUALITY_AI_PROVIDER=auto  # Try all available tiers
export GROQ_API_KEY="your-key"       # Groq as Tier 2 fallback
```

Behavior:
- Try local SLM first (99% success rate after v8.45.3 fixes)
- If it fails, try Groq API
- If that fails, try Gemini API
- Ultimate fallback: implicit signals only
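The fallback chain behaves like this sketch (the provider callables are stand-ins, not the service's real API):

```python
def evaluate_quality(memory, providers):
    """Try each (name, evaluator) pair in tier order; fall through on any
    failure. The last tier (implicit signals) never raises, so the chain
    always produces a score."""
    for name, evaluate in providers:
        try:
            return name, evaluate(memory)
        except Exception:
            continue  # e.g. model missing, API key absent, network error
    raise RuntimeError("unreachable: implicit signals never raise")

def failing(reason):
    """Build a stand-in evaluator that always fails with the given reason."""
    def _eval(memory):
        raise RuntimeError(reason)
    return _eval

providers = [
    ("onnx_local", failing("model missing")),
    ("groq", failing("no API key")),
    ("implicit", lambda memory: 0.5),  # Tier 4: usage patterns, never fails
]
print(evaluate_quality({}, providers))
# -> ('implicit', 0.5)
```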
From Issue #260 and #261 roadmap:
| Metric | Target | Status | Notes |
|---|---|---|---|
| Retrieval Precision | >70% useful (top-5) | ✅ Achieved | Up from ~50% baseline |
| Quality Coverage | >30% memories scored | ✅ Achieved | 3,750 memories scored (v8.47.1) |
| Quality Distribution | Realistic spread | ✅ Achieved | 42.9% high, 3.2% med, 53.9% low (v8.47.1) |
| Search Latency | <100ms with boost | ✅ Achieved | 50ms semantic + 20ms rerank + 30ms scoring |
| Monthly Cost | <$0.50 or $0 | ✅ Achieved | $0 with local SLM default |
| Local SLM Usage | >95% of scoring | ✅ Achieved | 99%+ with v8.45.3 fixes |
| Sync Success Rate | >95% | ✅ Achieved | 100% with v8.48.0 compression |
Q: Do I need API keys or a cloud service?
A: No! The default local SLM works with zero configuration, no API keys, and no external calls.

Q: How much does it cost?
A: $0 with the default local SLM. Optional cloud APIs cost ~$0.30-0.50/month for typical usage.

Q: Does quality scoring slow down retrieval?
A: No. Scoring happens asynchronously in the background. Quality-boosted search adds <20ms overhead.

Q: Can I disable the quality system?
A: Yes, set `MCP_QUALITY_SYSTEM_ENABLED=false`. The system works normally without quality scores.

Q: How accurate is the local SLM?
A: It produces realistic quality distributions (42.9% high, 3.2% medium, 53.9% low) after the v8.47.1 self-match bug fix. It scores query-memory relevance, not absolute quality.

Q: What happens if the local model fails?
A: The system falls back to implicit signals (access patterns). No failures; it degrades gracefully.

Q: Can I plug in my own evaluator?
A: Yes! Implement the QualityEvaluator interface and configure it via `MCP_QUALITY_AI_PROVIDER`.

Q: Does it work offline?
A: Yes! The local SLM works fully offline and supports air-gapped environments with cached models.

Q: Can the association boost inflate scores unfairly?
A: No. Quality scores are capped at 1.0, and the boost only applies to well-connected memories (≥5 associations by default). Configurable thresholds prevent over-promotion.

Q: How does metadata compression affect quality scores?
A: It is transparent. Compression/decompression happens automatically during sync, and quality scores remain intact and accurate.
- ONNX Quality Evaluation Deep Dive - Technical details on cross-encoder design, self-match bug fix, bulk evaluation
- Metadata Compression System - CSV compression architecture, 3-phase roadmap, validation
- Quality + Hooks Integration - 3-phase integration workflow, scoring weights, session-end triggers
- Memory Consolidation Guide - Association-based boost, quality-weighted decay, retention tiers
- Web Dashboard Guide - Quality badges, analytics view, dark mode, Chart.js integration
- Hybrid Backend Guide - Sync enhancements, metadata normalization, queue overflow fixes
GitHub Issues:
- Issue #260 - Quality System Specification
- Issue #261 - Roadmap (Quality → Agentic RAG)
v8.48.0 (2025-12-07):
- CSV-based metadata compression (78% reduction, 100% sync success)
- Metadata size validation (<9.5KB threshold)
- 3-phase compression roadmap (Phase 1 complete)
v8.47.1 (2025-12-07):
- ONNX self-match bug fix (realistic quality distribution)
- Association pollution fix (948 system memories excluded)
- Sync queue overflow fix (2,000 queue size, 100 batch size)
- Batch consolidation optimization (50-100x speedup)
v8.47.0 (2025-12-06):
- Association-based quality boost (network effect intelligence)
- Full audit trail metadata (boost tracking)
- Configurable boost settings (3 environment variables)
v8.46.0-v8.46.3 (2025-12-06):
- Quality + Hooks integration (3-phase approach)
- Quality score persistence fix (hybrid backend)
- Windows compatibility fixes (encoding, session-start hook)
v8.45.0-v8.45.3 (2025-12-05 to 2025-12-06):
- Initial release of Memory Quality System
- Local SLM (ONNX) as primary tier
- Quality-based forgetting in consolidation
- Quality-boosted search with reranking
- Dashboard UI with quality badges and analytics
- Comprehensive MCP tools and HTTP API
- ONNX model export fix (dynamic export, offline support)
- Dashboard dark mode improvements (Chart.js integration)
- HTTP API router fix (404 on `/api/quality/*` endpoints)
Need help? Open an issue at https://github.com/doobidoo/mcp-memory-service/issues