Cortex Token Efficiency

Cortex V2 introduces a sophisticated token management system that reduces context retrieval costs by 5-15x while maintaining retrieval quality.

Overview

Traditional memory systems send full context every time. Cortex V2 uses:

Hierarchical compression — 4 levels of detail
Session deduplication — Never send the same memory twice
Smart budget management — Fit maximum value in minimum tokens

Compression Levels

Level	Name	Tokens	Use Case
0	IDs Only	~10	Reference tracking
1	One-liners	~50	Quick summaries
2	With Examples	~200	Working context
3	Full Detail	~500+	Deep dives

Level 0: IDs Only

{
  "id": "mem_abc123",
  "type": "tribal_knowledge"
}

Level 1: One-liners

{
  "id": "mem_abc123",
  "type": "tribal_knowledge",
  "summary": "Always use bcrypt for password hashing, never MD5"
}

Level 2: With Examples

{
  "id": "mem_abc123",
  "type": "tribal_knowledge",
  "summary": "Always use bcrypt for password hashing, never MD5",
  "example": "const hash = await bcrypt.hash(password, 10);",
  "context": "Security requirement from 2024 audit"
}

Level 3: Full Detail

{
  "id": "mem_abc123",
  "type": "tribal_knowledge",
  "summary": "Always use bcrypt for password hashing, never MD5",
  "content": "Full explanation with rationale...",
  "example": "const hash = await bcrypt.hash(password, 10);",
  "context": "Security requirement from 2024 audit",
  "causalChain": [...],
  "relatedMemories": [...],
  "confidence": 0.95,
  "usageCount": 47
}

Session Deduplication

The session context tracks what's been sent:

interface SessionContext {
  loadedMemories: Set<string>;  // Memory IDs already sent
  loadedPatterns: Set<string>;  // Pattern IDs already sent
  tokensSent: number;           // Running token count
  queriesMade: number;          // Query count
}

When retrieving memories:

Check if memory ID is in loadedMemories
If yes, skip or send Level 0 reference only
If no, send at requested compression level
Add to loadedMemories

Budget Management

Token Budget Allocation

const budget = {
  total: 4000,
  allocation: {
    patterns: 0.4,      // 1600 tokens
    tribal: 0.3,        // 1200 tokens
    constraints: 0.2,   // 800 tokens
    antipatterns: 0.1   // 400 tokens
  }
};

Dynamic Level Selection

The system automatically selects compression levels based on:

Available budget
Memory relevance score
Whether memory was previously sent

function selectLevel(memory: Memory, budget: number, session: SessionContext): CompressionLevel {
  if (session.loadedMemories.has(memory.id)) {
    return CompressionLevel.IdsOnly;  // Already sent
  }
  
  if (budget < 50) return CompressionLevel.IdsOnly;
  if (budget < 200) return CompressionLevel.OneLiners;
  if (budget < 500) return CompressionLevel.WithExamples;
  return CompressionLevel.FullDetail;
}

Usage Examples

Basic Context Retrieval

const context = await cortex.getContext('add_feature', 'authentication', {
  maxTokens: 2000,
  compressionLevel: 2,  // With examples
});

Budget-Aware Retrieval

const context = await cortex.getContext('fix_bug', 'payment', {
  maxTokens: 1000,
  compressionLevel: 'auto',  // System chooses
  prioritize: ['patterns', 'constraints'],
});

Session-Aware Retrieval

// First query - full context
const ctx1 = await cortex.getContext('add_feature', 'auth');
// ~2000 tokens

// Second query - deduplicated
const ctx2 = await cortex.getContext('add_feature', 'auth/login');
// ~500 tokens (shared memories skipped)

MCP Tool: `drift_memory_for_context`

{
  "intent": "add_feature",
  "focus": "authentication",
  "maxTokens": 2000,
  "compressionLevel": 2,
  "sessionId": "session_abc123"
}

Response includes token tracking:

{
  "memories": [...],
  "tokenUsage": {
    "used": 1847,
    "budget": 2000,
    "saved": 3200,
    "deduplicatedCount": 5
  }
}

Performance Benchmarks

Scenario	Without V2	With V2	Reduction
First query	8000 tokens	2000 tokens	4x
Follow-up query	8000 tokens	500 tokens	16x
Multi-file session	24000 tokens	3000 tokens	8x

Best Practices

Use session IDs — Enable deduplication across queries
Start with Level 2 — Good balance of detail and efficiency
Let the system choose — Use compressionLevel: 'auto'
Monitor token usage — Check tokenUsage in responses

🚀 Getting Started

🤖 MCP Server

🧠 Cortex Memory System

📖 Reference

🔬 Analysis

🛠️ AI Tools

🏗️ Advanced Features

🔧 CI/CD Integration

💬 Community

📊 Stats (v0.9.40)

10 Languages
21 Frameworks
16 ORMs
400+ Detectors
50+ MCP Tools
60+ CLI Commands
23 Memory Types

Cortex Token Efficiency

Cortex Token Efficiency

Overview

Compression Levels

Level 0: IDs Only

Level 1: One-liners

Level 2: With Examples

Level 3: Full Detail

Session Deduplication

Budget Management

Token Budget Allocation

Dynamic Level Selection

Usage Examples

Basic Context Retrieval

Budget-Aware Retrieval

Session-Aware Retrieval

MCP Tool: drift_memory_for_context

Performance Benchmarks

Best Practices

Related Documentation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

🚀 Getting Started

🤖 MCP Server

🧠 Cortex Memory System

MCP Tool: `drift_memory_for_context`