
Henry edited this page Dec 7, 2025 · 1 revision

Memory Quality System Guide

Version: 8.48.0+ · Status: ✅ Production Ready · Feature: Memento-Inspired Quality System (Issue #260) · Last Updated: December 7, 2025


Overview

The Memory Quality System transforms MCP Memory Service from static storage to a learning memory system. It automatically evaluates memory quality using AI-driven scoring and uses these scores to improve retrieval precision, consolidation efficiency, and overall system intelligence.

Key Benefits

  • 40-70% improvement in retrieval precision (top-5 useful rate: 50% → 70-85%)
  • Zero cost with local SLM (privacy-preserving, offline-capable)
  • Smarter consolidation - Preserve high-quality memories longer
  • Quality-boosted search - Prioritize best memories in results
  • Network intelligence - Well-connected memories automatically boosted (v8.47.0+)
  • Cloud-ready - 78% metadata compression for sync (v8.48.0+)
  • Automatic learning - System improves from usage patterns

What's New in v8.45-v8.48

v8.48.0 (Dec 7, 2025) - Metadata Compression Breakthrough

🎯 Problem Solved: Cloudflare D1 10KB metadata limit causing sync failures

  • CSV-Based Metadata Compression: 78% size reduction (732B → 159B typical)
  • 100% Sync Success: Resolved all metadata size limit errors (0 failures, down from 278)
  • Transparent Operation: Automatic compression/decompression in hybrid backend
  • Metadata Validation: Pre-sync size checks prevent API failures before they occur
  • 3-Phase Roadmap: Phase 1 (CSV) complete, Phase 2 (binary) and Phase 3 (deduplication) available

📖 See: Metadata Compression System Guide for complete details
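The core idea of CSV-based compression — replacing a verbose JSON object with a fixed-order row of values — can be sketched in a few lines. The field list and function names below are illustrative assumptions, not the service's real schema:

```python
import csv
import io
import json

# Hypothetical fixed field order; the real schema is defined by the service.
FIELDS = ["quality_score", "quality_provider", "access_count", "last_accessed_at"]

def compress(meta: dict) -> str:
    """Serialize known fields as one CSV row instead of a keyed JSON object."""
    buf = io.StringIO()
    csv.writer(buf).writerow([meta.get(f, "") for f in FIELDS])
    return buf.getvalue().strip()

def decompress(row: str) -> dict:
    """Rebuild the dict from the row. Values come back as strings, so a
    real implementation must also restore types (float, int, ...)."""
    values = next(csv.reader(io.StringIO(row)))
    return dict(zip(FIELDS, values))

meta = {"quality_score": 0.78, "quality_provider": "onnx_local",
        "access_count": 12, "last_accessed_at": "2025-12-07T10:30:00Z"}
packed = compress(meta)
print(len(json.dumps(meta)), "->", len(packed), "bytes")
assert decompress(packed)["quality_provider"] == "onnx_local"
```

The savings come from dropping the repeated JSON key names, which is why the win grows with the number of metadata fields.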

v8.47.1 (Dec 7, 2025) - ONNX Quality Improvements

Critical Bug Fixes:

  • ONNX Self-Match Bug: Fixed artificially inflated scores (~1.0 for all) by generating queries from tags/metadata
  • Realistic Distribution: Now produces 42.9% high, 3.2% medium, 53.9% low (avg 0.468 vs 1.000)
  • Association Pollution: Filters 948 system-generated memories (type='association', 'compressed_cluster')
  • Sync Queue Overflow: Increased capacity 1,000→2,000, batch size 50→100 (0% failure rate)
  • Batch Optimization: 50-100x speedup for consolidation relevance updates

📖 See: ONNX Quality Evaluation Deep Dive for technical details

v8.47.0 (Dec 6, 2025) - Association-Based Quality Boost

Network Effect Intelligence:

  • Well-connected memories (≥5 associations) automatically receive 20% quality boost
  • Full audit trail: quality_boost_applied, quality_boost_date, quality_boost_connection_count
  • Configurable: 3 environment variables with validation (boost enabled by default)
  • Impact: ~4% relevance increase, potential retention tier promotion (medium→high)
  • 5 comprehensive test cases (100% pass rate)

📖 See: Association-Based Quality Boost section below

v8.46.0-v8.46.3 (Dec 6, 2025) - Quality + Hooks Integration

3-Phase Integration:

  • Phase 1: Hooks read backendQuality from metadata (20% scoring weight)
  • Phase 2: Session-end hook triggers async quality evaluation
  • Phase 3: Quality-boosted search with configurable weights

Platform Fixes:

  • Windows hooks installer encoding fix (UTF-8 console configuration)
  • Session-start hook crash fix (missing queryMemoriesByTagsAndTime() function)
  • Quality score persistence in hybrid backend (Cloudflare metadata normalization)

📖 See: Quality + Hooks Integration Guide for workflows

v8.45.1-v8.45.3 (Dec 5-6, 2025) - Foundation & Stabilization

Infrastructure:

  • HTTP API router fix (404 errors on /api/quality/* endpoints)
  • ONNX model export fix (dynamic export from transformers, offline mode support)
  • Dashboard dark mode improvements (Chart.js integration, form controls)
  • Quality distribution MCP tool fix (storage method call correction)

How It Works

Multi-Tier AI Scoring (Local-First)

The system evaluates memory quality (0.0-1.0 score) using a multi-tier fallback chain:

| Tier | Provider | Cost | Latency | Privacy | Default |
|------|----------|------|---------|---------|---------|
| 1 | Local SLM (ONNX) | $0 | 50-100ms | ✅ Full | ✅ Yes |
| 2 | Groq API | ~$0.30/mo | 900ms | ❌ External | ❌ Opt-in |
| 3 | Gemini API | ~$0.40/mo | 2000ms | ❌ External | ❌ Opt-in |
| 4 | Implicit Signals | $0 | 10ms | ✅ Full | Fallback |

Default setup: Local SLM only (zero cost, full privacy, no external API calls)

Quality Score Components

quality_score = (
    local_slm_score × 0.50 +      # Cross-encoder evaluation
    implicit_signals × 0.50        # Usage patterns
)

implicit_signals = (
    access_frequency × 0.40 +      # How often retrieved
    recency × 0.30 +              # When last accessed
    retrieval_ranking × 0.30      # Average position in results
)
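These weightings can be sketched as small Python helpers (the function names are illustrative, not the service's actual API; each input is assumed to be normalized to 0.0-1.0):

```python
def implicit_signals(access_frequency: float, recency: float,
                     retrieval_ranking: float) -> float:
    """Combine usage-pattern signals: 40% frequency, 30% recency, 30% ranking."""
    return (access_frequency * 0.40
            + recency * 0.30
            + retrieval_ranking * 0.30)

def quality_score(local_slm_score: float, signals: float) -> float:
    """Composite quality: 50% cross-encoder score, 50% implicit signals."""
    return local_slm_score * 0.50 + signals * 0.50

# A memory scored 0.8 by the SLM with strong usage signals:
sig = implicit_signals(0.9, 0.7, 0.6)       # ≈ 0.75
print(round(quality_score(0.8, sig), 3))    # 0.775
```

Because the two halves are weighted equally, a memory needs both a good cross-encoder score and healthy usage patterns to reach the high tier (≥0.7).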

Local SLM (Tier 1 - Primary)

Model: ms-marco-MiniLM-L-6-v2 (23MB)
Architecture: Cross-encoder (processes query + memory together)
Performance:

  • CPU: 50-100ms per evaluation (7-16ms in practice)
  • GPU (CUDA/MPS/DirectML): 10-20ms per evaluation

Scoring Process:

  1. Tokenize: [CLS] query [SEP] memory [SEP]
  2. Run ONNX inference (local, private)
  3. Return relevance score 0.0-1.0

Important: Cross-encoders score query-memory relevance, not absolute quality. Queries are generated from tags/metadata (what memory is about) to avoid self-match bias. See ONNX Deep Dive for details.

GPU Acceleration (automatic):

  • CUDA (NVIDIA)
  • CoreML/MPS (Apple Silicon)
  • DirectML (Windows)
  • ROCm (AMD on Linux)
  • CPU fallback (always works)

Installation & Setup

1. Basic Setup (Local SLM Only)

Zero configuration required - The quality system works out of the box with local SLM:

# Install MCP Memory Service (if not already installed)
pip install mcp-memory-service

# Quality system is enabled by default with local SLM
# No API keys needed, no external calls

First-time model download (automatic):

  • Model exports from HuggingFace transformers to ONNX on first use
  • Saved to ~/.cache/mcp_memory/onnx_models/ms-marco-MiniLM-L-6-v2/
  • Supports offline/air-gapped environments (local_files_only=True)

2. Optional: Cloud APIs (Opt-In)

If you want cloud-based scoring (Groq or Gemini):

# Enable Groq API (fast, cheap)
export GROQ_API_KEY="your-groq-api-key"
export MCP_QUALITY_AI_PROVIDER=groq  # or "auto" to try all tiers

# Enable Gemini API (Google)
export GOOGLE_API_KEY="your-gemini-api-key"
export MCP_QUALITY_AI_PROVIDER=gemini

3. Configuration Options

# Quality System Core
export MCP_QUALITY_SYSTEM_ENABLED=true         # Default: true
export MCP_QUALITY_AI_PROVIDER=local           # local|groq|gemini|auto|none

# Local SLM Configuration (Tier 1)
export MCP_QUALITY_LOCAL_MODEL=ms-marco-MiniLM-L-6-v2  # Model name
export MCP_QUALITY_LOCAL_DEVICE=auto           # auto|cpu|cuda|mps|directml|rocm

# Quality-Boosted Search (Opt-In)
export MCP_QUALITY_BOOST_ENABLED=false         # Default: false (opt-in)
export MCP_QUALITY_BOOST_WEIGHT=0.3            # 0.0-1.0 (30% quality, 70% semantic)

# Quality-Based Retention (Consolidation)
export MCP_QUALITY_RETENTION_HIGH=365          # Days for quality ≥0.7
export MCP_QUALITY_RETENTION_MEDIUM=180        # Days for quality 0.5-0.7
export MCP_QUALITY_RETENTION_LOW_MIN=30        # Min days for quality <0.5
export MCP_QUALITY_RETENTION_LOW_MAX=90        # Max days for quality <0.5

# Association-Based Quality Boost (v8.47.0+)
export MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED=true  # Default: true
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=5  # Default: 5 (range: 1-100)
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.2    # Default: 1.2 = 20% boost (range: 1.0-2.0)

# Hybrid Backend Sync (v8.47.1+)
export MCP_HYBRID_QUEUE_SIZE=2000              # Default: 2000 (was 1000)
export MCP_HYBRID_BATCH_SIZE=100               # Default: 100 (was 50)

Using the Quality System

1. Automatic Quality Scoring

Quality scores are calculated automatically when memories are retrieved:

# Normal retrieval - quality scoring happens in background
claude /memory-recall "what did I work on yesterday"

# Quality score is updated in metadata (non-blocking)

2. Manual Rating (Optional)

Override AI scores with manual ratings:

# Rate a memory (MCP tool)
rate_memory(
    content_hash="abc123...",
    rating=1,  # -1 (bad), 0 (neutral), 1 (good)
    feedback="This was very helpful!"
)

# Manual ratings weighted 60%, AI scores weighted 40%

HTTP API:

curl -X POST http://127.0.0.1:8000/api/quality/memories/{hash}/rate \
  -H "Content-Type: application/json" \
  -d '{"rating": 1, "feedback": "Helpful!"}'

Async Evaluation Endpoint (v8.46.0+):

# Trigger AI evaluation on specific memory
curl -X POST http://127.0.0.1:8000/api/quality/memories/{hash}/evaluate

# Returns: quality_score, quality_provider, ai_score, evaluation_time_ms
# Performance: ~355ms with ONNX ranker

3. Quality-Boosted Search

Enable quality-based reranking for better results:

Method 1: Global Configuration

export MCP_QUALITY_BOOST_ENABLED=true
claude /memory-recall "search query"  # Uses quality boost

Method 2: Per-Query (MCP Tool)

# Search with quality boost (MCP tool)
retrieve_with_quality_boost(
    query="search query",
    n_results=10,
    quality_weight=0.3  # 30% quality, 70% semantic
)

Method 3: HTTP API (v8.46.0+)

curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "search query", "quality_boost": true, "quality_weight": 0.3}'

Algorithm:

  1. Over-fetch 3× candidates (30 results for top 10)
  2. Rerank by: (1-weight) × semantic_similarity + weight × quality_score
  3. Return top N results with search_type: "semantic_quality_boost"

Performance: <100ms total (50ms semantic search + 20ms reranking + 30ms quality scoring)
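The over-fetch-and-rerank step above can be sketched as follows, assuming candidates arrive as `(memory, semantic_score, quality_score)` tuples (names and shape are illustrative, not the real internal types):

```python
def quality_boost_rerank(candidates, n_results=10, quality_weight=0.3):
    """Rerank over-fetched candidates by a blend of semantic and quality scores."""
    def blended(c):
        _, semantic, quality = c
        # (1 - weight) x semantic + weight x quality, as in the algorithm above
        return (1 - quality_weight) * semantic + quality_weight * quality
    return sorted(candidates, key=blended, reverse=True)[:n_results]

# A slightly weaker semantic match with high quality can outrank a
# stronger match with low quality:
cands = [("memA", 0.80, 0.20), ("memB", 0.75, 0.90)]
print(quality_boost_rerank(cands, n_results=2))  # memB first
```

With `quality_weight=0.0` the blend reduces to pure semantic ranking, which is why the boost is safe to enable gradually.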

4. View Quality Metrics

MCP Tool:

get_memory_quality(content_hash="abc123...")

# Returns:
# - quality_score: Current composite score (0.0-1.0)
# - quality_provider: Which tier scored it (ONNXRankerModel, etc.)
# - access_count: Number of retrievals
# - last_accessed_at: Last access timestamp
# - ai_scores: Historical AI evaluation scores
# - user_rating: Manual rating if present

HTTP API:

curl http://127.0.0.1:8000/api/quality/memories/{hash}

5. Quality Analytics

MCP Tool:

analyze_quality_distribution(min_quality=0.0, max_quality=1.0)

# Returns:
# - total_memories: Total count
# - high_quality_count: Score ≥0.7
# - medium_quality_count: 0.5 ≤ score < 0.7
# - low_quality_count: Score < 0.5
# - average_score: Mean quality score
# - provider_breakdown: Count by provider (onnx_local, groq, gemini, implicit)
# - top_10_memories: Highest scoring
# - bottom_10_memories: Lowest scoring

HTTP API:

# Distribution statistics
curl http://127.0.0.1:8000/api/quality/distribution

# Time series trends (weekly/monthly)
curl http://127.0.0.1:8000/api/quality/trends

Dashboard (http://127.0.0.1:8000/) - v8.45.2+ Dark Mode Support:

  • Quality badges on all memory cards (color-coded by tier: 🟢🟡🔴⚪)
  • Analytics view with distribution charts (bar + pie)
  • Provider breakdown visualization
  • Top/bottom performers lists
  • Dark mode Chart.js integration with proper contrast
  • Settings panel for quality configuration

Quality-Based Memory Management

1. Quality-Based Forgetting (Consolidation)

High-quality memories are preserved longer during consolidation:

| Quality Tier | Score Range | Retention Period |
|--------------|-------------|------------------|
| High | ≥0.7 | 365 days inactive |
| Medium | 0.5-0.7 | 180 days inactive |
| Low | <0.5 | 30-90 days inactive (scaled by score) |

How it works:

  • Weekly consolidation scans inactive memories
  • Applies quality-based thresholds
  • Archives low-quality memories sooner
  • Preserves high-quality memories longer
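The tier thresholds can be sketched as a small helper. The defaults mirror the `MCP_QUALITY_RETENTION_*` variables; the linear scaling of the low tier between its min and max is an assumption for illustration:

```python
def retention_days(quality: float, high=365, medium=180,
                   low_min=30, low_max=90) -> int:
    """Map a quality score (0.0-1.0) to a retention period in days."""
    if quality >= 0.7:
        return high
    if quality >= 0.5:
        return medium
    # Low tier: scale linearly between low_min and low_max by score (assumed)
    return int(low_min + (quality / 0.5) * (low_max - low_min))

print(retention_days(0.9))   # 365
print(retention_days(0.6))   # 180
print(retention_days(0.25))  # 60
```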

2. Quality-Weighted Decay

High-quality memories decay slower in relevance scoring:

decay_multiplier = 1.0 + (quality_score × 0.5)
# High quality (0.9): 1.45× multiplier
# Medium quality (0.5): 1.25× multiplier
# Low quality (0.2): 1.10× multiplier

final_relevance = base_relevance × decay_multiplier

Effect: High-quality memories stay relevant 3× longer in search results.

Association-Based Quality Boost (NEW in v8.47.0)

Overview

Well-connected memories automatically receive quality score boosts based on the network effect principle: frequently referenced memories are likely more valuable.

How It Works

  • Trigger Condition: Memory has ≥5 associations (configurable)
  • Boost Amount: 20% quality increase (configurable 1.0-2.0×)
  • Timing: Applied during weekly consolidation
  • Cap: Quality scores capped at 1.0 (prevents over-promotion)

Example:

Original quality: 0.65 (medium tier)
Connections: 8 associations (≥5 threshold)
Boost factor: 1.2 (20% increase)
New quality: 0.65 × 1.2 = 0.78 (high tier) ✅

Impact:
- Retention: 180 days → 365 days
- Relevance: ~4% increase in search ranking
- Tier promotion: Medium → High

Configuration

# Enable/disable association boost (default: enabled)
export MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED=true

# Minimum connections required (default: 5, range: 1-100)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=5

# Boost multiplier (default: 1.2 = 20%, range: 1.0-2.0)
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.2

Preset Configurations:

| Profile | Min Connections | Boost Factor | Use Case |
|---------|-----------------|--------------|----------|
| Conservative | 10 | 1.1 (10%) | Only boost highly connected memories |
| Balanced (default) | 5 | 1.2 (20%) | Recommended for most users |
| Aggressive | 3 | 1.3 (30%) | Maximize network effect influence |

Audit Trail Metadata

Every boosted memory receives comprehensive tracking:

{
  "quality_boost_applied": true,
  "quality_boost_date": "2025-12-07T10:30:00Z",
  "quality_boost_reason": "association_connections",
  "quality_boost_connection_count": 8,
  "original_quality_before_boost": 0.65
}

Benefits:

  • Transparency: See exactly why quality changed
  • Analysis: Identify well-connected knowledge clusters
  • Debugging: Understand quality score evolution
  • Audit: Verify boost application during consolidation

Impact on Memory Lifecycle

Before Boost (quality: 0.65):

  • Tier: Medium
  • Retention: 180 days inactive
  • Relevance decay: 1.325× multiplier
  • Forgetting likelihood: Moderate

After Boost (quality: 0.78):

  • Tier: High ✅
  • Retention: 365 days inactive (+185 days)
  • Relevance decay: 1.39× multiplier (+0.065)
  • Forgetting likelihood: Low

Relevance Score Calculation:

# Quality boost applied BEFORE quality multiplier calculation
boosted_quality = original_quality * boost_factor  # 0.65 → 0.78
quality_multiplier = 1.0 + (boosted_quality * 0.5)  # 1.39
final_relevance = base_relevance * quality_multiplier

Use Cases

  1. Knowledge Graphs: Central concepts referenced by many notes
  2. Code Documentation: Core functions called by multiple modules
  3. Research Notes: Key papers cited in multiple summaries
  4. Project Planning: Milestone memories linked to many tasks

Performance

  • Computation Time: 5-10 microseconds per memory (negligible)
  • Memory Overhead: ~200 bytes per boosted memory (5 metadata fields)
  • Consolidation Impact: No measurable increase in duration
  • Integration Point: ExponentialDecayCalculator._calculate_memory_relevance()

Monitoring

Check which memories received boosts:

# Search for boosted memories
curl -X POST http://127.0.0.1:8000/api/search/by-metadata \
  -H "Content-Type: application/json" \
  -d '{"key": "quality_boost_applied", "value": true}'

# View boost distribution
curl http://127.0.0.1:8000/api/quality/distribution | jq '.boosted_count'

Privacy & Cost

Privacy Modes

| Mode | Configuration | Privacy | Cost |
|------|---------------|---------|------|
| Local Only | MCP_QUALITY_AI_PROVIDER=local | ✅ Full (no external calls) | $0 |
| Hybrid | MCP_QUALITY_AI_PROVIDER=auto | ⚠️ Cloud fallback | ~$0.30/mo |
| Cloud | MCP_QUALITY_AI_PROVIDER=groq | ❌ External API | ~$0.30/mo |
| Implicit Only | MCP_QUALITY_AI_PROVIDER=none | ✅ Full (no AI) | $0 |

Cost Comparison (3750 memories, 100 retrievals/day)

| Provider | Monthly Cost | Notes |
|----------|--------------|-------|
| Local SLM | $0 | Free forever, runs locally |
| Groq (Kimi K2) | ~$0.30-0.50 | Fast, good quality |
| Gemini Flash | ~$0.40-0.60 | Slower, free tier available |
| Implicit Only | $0 | No AI scoring, usage patterns only |

Recommendation: Use default local SLM (zero cost, full privacy, fast).

Performance Benchmarks

Quality System Operations

| Operation | Latency | Notes |
|-----------|---------|-------|
| Local SLM Scoring (CPU) | 7-16ms | Per memory evaluation (real-world) |
| Local SLM Scoring (GPU) | 10-20ms | With CUDA/MPS/DirectML |
| Quality-Boosted Search | <100ms | Over-fetch + rerank |
| Implicit Signals | <10ms | Always fast |
| Quality Metadata Update | <5ms | Storage backend write |
| Association Boost Calc | 5-10μs | Negligible overhead |

Bulk Operations (v8.47.1)

| Operation | Scale | Performance | Notes |
|-----------|-------|-------------|-------|
| Bulk ONNX Evaluation | 3,750 memories | ~60s total | ~16ms per memory |
| Batch Consolidation Updates | 4,478 updates | 50-100× speedup | Single transaction |
| Sync Queue Processing | 2,000 queue size | 0% failure rate | Was 27.8% at 1,000 |

Metadata Compression (v8.48.0)

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Typical Metadata Size | 732B | 159B | 78% reduction |
| Compression Overhead | N/A | <1ms | Negligible |
| Sync Success Rate | 72.2% | 100% | Zero failures |
| Cloudflare Sync Failures | 278 | 0 | Resolved |

Target Metrics:

  • Quality calculation overhead: <10ms ✅
  • Search latency with boost: <100ms total ✅
  • No user-facing blocking (async scoring) ✅

Platform Support

Operating Systems

| Platform | CPU | GPU Acceleration | Status |
|----------|-----|------------------|--------|
| Windows | ✅ All x64 CPUs | CUDA, DirectML | ✅ Fully Supported |
| macOS | ✅ Intel & Apple Silicon | MPS (Metal) | ✅ Fully Supported |
| Linux | ✅ All x64 CPUs | CUDA, ROCm | ✅ Fully Supported |

GPU Acceleration Support

| Hardware | Technology | Installation | Performance |
|----------|------------|--------------|-------------|
| NVIDIA | CUDA | `pip install onnxruntime-gpu` | 10-20ms (5-10× faster) |
| Apple Silicon | MPS | Built-in (onnxruntime) | 10-20ms (5-10× faster) |
| AMD (Windows) | DirectML | `pip install onnxruntime-directml` | 15-25ms (3-5× faster) |
| AMD (Linux) | ROCm | `pip install onnxruntime-rocm` | 15-25ms (3-5× faster) |
| CPU Fallback | All platforms | No extra install | 50-100ms (always works) |

Device Auto-Detection: Set MCP_QUALITY_LOCAL_DEVICE=auto (default) for automatic GPU selection with CPU fallback.

Offline / Air-Gapped Environments

Full offline support with cached HuggingFace models:

# Pre-cache model on internet-connected machine
python -c "from transformers import AutoModelForSequenceClassification, AutoTokenizer; \
  AutoTokenizer.from_pretrained('cross-encoder/ms-marco-MiniLM-L-6-v2'); \
  AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L-6-v2')"

# Copy ~/.cache/huggingface/ to air-gapped machine

# Use local_files_only mode
export HF_HOME=~/.cache/huggingface
# System automatically tries local_files_only=True first

Troubleshooting

Local SLM Not Working

Symptom: quality_provider: ImplicitSignalsEvaluator (should be ONNXRankerModel)

Fixes:

  1. Check ONNX Runtime installed:

    pip install onnxruntime
    # For GPU: pip install onnxruntime-gpu (CUDA)
    # For DirectML: pip install onnxruntime-directml (Windows AMD)
  2. Check model downloaded:

    ls ~/.cache/mcp_memory/onnx_models/ms-marco-MiniLM-L-6-v2/
    # Should contain: model.onnx, tokenizer.json, config.json
  3. Check logs for errors:

    tail -f logs/mcp_memory_service.log | grep -i "quality\|onnx"
  4. Verify transformers/torch installed (required for model export):

    pip install transformers torch

Quality Scores Artificially High (v8.47.1 Fix)

Symptom: All memories have quality ~1.0 (unrealistic distribution)

Cause: ONNX self-match bug (using memory content as its own query)

Fix: Upgrade to v8.47.1+ which generates queries from tags/metadata:

pip install --upgrade mcp-memory-service

# Reset scores if needed
python scripts/quality/reset_onnx_scores.py

Verification:

# Check distribution (should be realistic)
curl http://127.0.0.1:8000/api/quality/distribution

# Expected: ~40-50% high, ~0-10% medium, ~40-50% low
# Not: 100% high quality

Quality Scores Always 0.5

Symptom: All memories have quality_score: 0.5 (neutral default)

Cause: Quality scoring not triggered yet (memories haven't been retrieved)

Fix: Retrieve memories to trigger scoring:

claude /memory-recall "any search query"
# Quality scoring happens in background after retrieval

# Or trigger bulk evaluation
python scripts/quality/bulk_evaluate_onnx.py

GPU Not Detected

Symptom: Local SLM uses CPU despite having GPU

Fixes:

  1. Install GPU-enabled ONNX Runtime:

    # NVIDIA CUDA
    pip install onnxruntime-gpu
    
    # DirectML (Windows AMD/Intel)
    pip install onnxruntime-directml
    
    # ROCm (Linux AMD)
    pip install onnxruntime-rocm
  2. Force device selection:

    export MCP_QUALITY_LOCAL_DEVICE=cuda  # or mps, directml, rocm
  3. Verify GPU availability:

    # NVIDIA
    nvidia-smi
    
    # AMD (Linux)
    rocm-smi
    
    # Apple Silicon (built-in MPS)
    system_profiler SPDisplaysDataType | grep "Chipset Model"

Quality Boost Not Working

Symptom: Search results don't show quality reranking

Checks:

  1. Verify enabled:

    echo $MCP_QUALITY_BOOST_ENABLED  # Should be "true"
  2. Use explicit MCP tool:

    retrieve_with_quality_boost(query="test", quality_weight=0.5)
  3. Check debug info in results:

    result.debug_info['reranked']  # Should be True
    result.debug_info['quality_score']  # Should exist
    result.debug_info['search_type']  # Should be "semantic_quality_boost"

Hybrid Backend Sync Failures (v8.48.0 Fix)

Symptom: operations_failed > 0 in sync status, 400 Bad Request errors

Cause: Metadata size exceeds Cloudflare D1 10KB limit

Fix: Upgrade to v8.48.0+ which implements CSV compression:

pip install --upgrade mcp-memory-service

# Verify compression working
bash verify_compression.sh

# Expected output:
# Failed: 0 (should be 0) ✅
# No compression warnings (good!) ✅

Association Boost Not Applied (v8.47.0)

Symptom: Well-connected memories not receiving quality boost

Checks:

  1. Verify boost enabled:

    echo $MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED  # Should be "true"
  2. Check connection count threshold:

    echo $MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST  # Default: 5
  3. Trigger consolidation manually:

    curl -X POST http://127.0.0.1:8000/api/consolidation/trigger \
      -H "Content-Type: application/json" \
      -d '{"time_horizon": "weekly"}'
  4. Verify boost metadata:

    # Check memory metadata for boost tracking
    curl http://127.0.0.1:8000/api/quality/memories/{hash} | jq '.quality_boost_applied'

Best Practices

1. Start with Defaults

Use local SLM (default) for:

  • Zero cost
  • Full privacy
  • Offline capability
  • Good accuracy (realistic quality distribution after v8.47.1 fix)

2. Enable Quality Boost Gradually

# Week 1: Collect quality scores (boost disabled)
export MCP_QUALITY_BOOST_ENABLED=false

# Week 2: Test with low weight
export MCP_QUALITY_BOOST_ENABLED=true
export MCP_QUALITY_BOOST_WEIGHT=0.2  # 20% quality

# Week 3+: Increase if helpful
export MCP_QUALITY_BOOST_WEIGHT=0.3  # 30% quality (recommended)

3. Monitor Quality Distribution

Check analytics regularly:

analyze_quality_distribution()

# Target distribution (realistic after v8.47.1):
# - High quality (≥0.7): 40-50% of memories
# - Medium quality (0.5-0.7): 0-10%
# - Low quality (<0.5): 40-50%

# This is NORMAL and HEALTHY - cross-encoder scores query-memory relevance

Note: Distribution depends on query generation strategy. v8.47.1+ uses tags/metadata which produces more polarized but realistic scores.

4. Manual Rating for Edge Cases

Rate important memories manually:

# After finding a very helpful memory
rate_memory(content_hash="abc123...", rating=1, feedback="Critical info!")

# After finding unhelpful memory
rate_memory(content_hash="def456...", rating=-1, feedback="Outdated")

Manual ratings weighted 60%, AI scores 40%.
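That 60/40 blend can be sketched as follows. The mapping of the {-1, 0, 1} rating onto the 0.0-1.0 score scale is an assumption for illustration; the function name is hypothetical:

```python
def combined_score(ai_score, user_rating=None):
    """Blend a manual rating (60%) with the AI score (40%) when one exists."""
    if user_rating is None:
        return ai_score
    rating_score = (user_rating + 1) / 2  # assumed mapping: -1/0/1 -> 0.0/0.5/1.0
    return rating_score * 0.60 + ai_score * 0.40

print(round(combined_score(0.5, 1), 2))   # 0.8 (good rating lifts a neutral AI score)
print(round(combined_score(0.5, -1), 2))  # 0.2
print(combined_score(0.5, None))          # 0.5 (no rating: AI score passes through)
```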

5. Leverage Association Boost

For knowledge-intensive workflows:

# Enable association boost (default: true)
export MCP_CONSOLIDATION_QUALITY_BOOST_ENABLED=true

# Lower threshold for aggressive boost
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=3

# Run weekly consolidation
curl -X POST http://127.0.0.1:8000/api/consolidation/trigger \
  -d '{"time_horizon": "weekly"}'

6. Optimize Hybrid Backend Sync (v8.47.1+, v8.48.0)

For large-scale operations:

# Increase queue size for bulk operations (v8.47.1)
export MCP_HYBRID_QUEUE_SIZE=2000      # Default: 2000
export MCP_HYBRID_BATCH_SIZE=100       # Default: 100

# Metadata compression enabled by default (v8.48.0)
# Verify compression working:
bash verify_compression.sh

7. Periodic Review

Monthly checklist:

  • Check quality distribution (analytics dashboard)
  • Verify realistic distribution (~40-50% high after v8.47.1)
  • Review top 10 performers (should be genuinely helpful)
  • Review bottom 10 (candidates for deletion)
  • Verify provider breakdown (mostly onnx_local)
  • Check average quality score (target: 0.4-0.6 is normal)
  • Monitor association boost application (if enabled)
  • Verify sync success rate (should be 100% after v8.48.0)

Advanced Configuration

Custom Retention Policy

# Conservative: Preserve longer
export MCP_QUALITY_RETENTION_HIGH=730       # 2 years for high quality
export MCP_QUALITY_RETENTION_MEDIUM=365     # 1 year for medium
export MCP_QUALITY_RETENTION_LOW_MIN=90     # 90 days minimum for low

# Aggressive: Archive sooner
export MCP_QUALITY_RETENTION_HIGH=180       # 6 months for high
export MCP_QUALITY_RETENTION_MEDIUM=90      # 3 months for medium
export MCP_QUALITY_RETENTION_LOW_MIN=14     # 2 weeks minimum for low

Custom Quality Boost Weight

# Semantic-first (default)
export MCP_QUALITY_BOOST_WEIGHT=0.3  # 30% quality, 70% semantic

# Balanced
export MCP_QUALITY_BOOST_WEIGHT=0.5  # 50% quality, 50% semantic

# Quality-first
export MCP_QUALITY_BOOST_WEIGHT=0.7  # 70% quality, 30% semantic

Recommendation: Start with 0.3, increase if quality boost improves results.

Custom Association Boost Profile

# Conservative (only highly connected)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=10
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.1  # 10% boost

# Balanced (default)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=5
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.2  # 20% boost

# Aggressive (maximize network effect)
export MCP_CONSOLIDATION_MIN_CONNECTIONS_FOR_BOOST=3
export MCP_CONSOLIDATION_QUALITY_BOOST_FACTOR=1.3  # 30% boost

Hybrid Cloud Strategy

Use local SLM primarily, cloud APIs as fallback:

export MCP_QUALITY_AI_PROVIDER=auto  # Try all available tiers
export GROQ_API_KEY="your-key"       # Groq as Tier 2 fallback

Behavior:

  1. Try local SLM (99% success rate after v8.45.3 fixes)
  2. If fails, try Groq API
  3. If fails, try Gemini API
  4. Ultimate fallback: Implicit signals only

Success Metrics (Phase 1 Achieved ✅)

From Issue #260 and #261 roadmap:

| Metric | Target | Status | Notes |
|--------|--------|--------|-------|
| Retrieval Precision | >70% useful (top-5) | ✅ Achieved | Up from ~50% baseline |
| Quality Coverage | >30% memories scored | ✅ Achieved | 3,750 memories scored (v8.47.1) |
| Quality Distribution | Realistic spread | ✅ Achieved | 42.9% high, 3.2% med, 53.9% low (v8.47.1) |
| Search Latency | <100ms with boost | ✅ Achieved | 50ms semantic + 20ms rerank + 30ms scoring |
| Monthly Cost | <$0.50 or $0 | ✅ Achieved | $0 with local SLM default |
| Local SLM Usage | >95% of scoring | ✅ Achieved | 99%+ with v8.45.3 fixes |
| Sync Success Rate | >95% | ✅ Achieved | 100% with v8.48.0 compression |

FAQ

Q: Do I need API keys for the quality system?

A: No! The default local SLM works with zero configuration, no API keys, and no external calls.

Q: How much does it cost?

A: $0 with the default local SLM. Optional cloud APIs cost ~$0.30-0.50/month for typical usage.

Q: Does quality scoring slow down searches?

A: No. Scoring happens asynchronously in the background. Quality-boosted search adds <20ms overhead.

Q: Can I disable the quality system?

A: Yes, set MCP_QUALITY_SYSTEM_ENABLED=false. System works normally without quality scores.

Q: How accurate is the local SLM?

A: Produces realistic quality distributions (42.9% high, 3.2% medium, 53.9% low) after v8.47.1 self-match bug fix. Scores query-memory relevance, not absolute quality.

Q: What if the local SLM fails to download?

A: The system falls back to implicit signals (access patterns). There is no hard failure; it degrades gracefully.

Q: Can I use my own quality scoring model?

A: Yes! Implement the QualityEvaluator interface and configure via MCP_QUALITY_AI_PROVIDER.

Q: Does this work offline?

A: Yes! Local SLM works fully offline. Supports air-gapped environments with cached models.

Q: Will association boost over-promote memories? (v8.47.0)

A: No. Quality scores are capped at 1.0, and boost only applies to well-connected memories (≥5 associations default). Configurable thresholds prevent over-promotion.

Q: How does metadata compression affect quality scores? (v8.48.0)

A: Transparent. Compression/decompression happens automatically during sync. Quality scores remain intact and accurate.


Changelog

v8.48.0 (2025-12-07):

  • CSV-based metadata compression (78% reduction, 100% sync success)
  • Metadata size validation (<9.5KB threshold)
  • 3-phase compression roadmap (Phase 1 complete)

v8.47.1 (2025-12-07):

  • ONNX self-match bug fix (realistic quality distribution)
  • Association pollution fix (948 system memories excluded)
  • Sync queue overflow fix (2,000 queue size, 100 batch size)
  • Batch consolidation optimization (50-100x speedup)

v8.47.0 (2025-12-06):

  • Association-based quality boost (network effect intelligence)
  • Full audit trail metadata (boost tracking)
  • Configurable boost settings (3 environment variables)

v8.46.0-v8.46.3 (2025-12-06):

  • Quality + Hooks integration (3-phase approach)
  • Quality score persistence fix (hybrid backend)
  • Windows compatibility fixes (encoding, session-start hook)

v8.45.0-v8.45.3 (2025-12-05 to 2025-12-06):

  • Initial release of Memory Quality System
  • Local SLM (ONNX) as primary tier
  • Quality-based forgetting in consolidation
  • Quality-boosted search with reranking
  • Dashboard UI with quality badges and analytics
  • Comprehensive MCP tools and HTTP API
  • ONNX model export fix (dynamic export, offline support)
  • Dashboard dark mode improvements (Chart.js integration)
  • HTTP API router fix (404 on /api/quality/* endpoints)

Need help? Open an issue at https://github.com/doobidoo/mcp-memory-service/issues
