Cortex Token Efficiency
geoffrey fernald edited this page Feb 1, 2026 · 1 revision
Cortex V2 introduces a sophisticated token management system that reduces context retrieval costs by 5-15x while maintaining retrieval quality.
Traditional memory systems send full context every time. Cortex V2 uses:
- Hierarchical compression: 4 levels of detail
- Session deduplication: never send the same memory twice
- Smart budget management: fit maximum value in minimum tokens
| Level | Name | Tokens | Use Case |
|---|---|---|---|
| 0 | IDs Only | ~10 | Reference tracking |
| 1 | One-liners | ~50 | Quick summaries |
| 2 | With Examples | ~200 | Working context |
| 3 | Full Detail | ~500+ | Deep dives |
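
One way to read the table is that each level is a projection of the same memory record that keeps progressively more fields. The following is a minimal sketch under that assumption. The `Memory` shape is inferred from the JSON examples on this page (with `causalChain` and `relatedMemories` left out for brevity), and `compress` is a hypothetical helper, not a documented Cortex function.

```typescript
// Hypothetical types: field names follow the JSON examples on this page,
// but the real Cortex types may differ.
enum CompressionLevel {
  IdsOnly = 0,
  OneLiners = 1,
  WithExamples = 2,
  FullDetail = 3,
}

interface Memory {
  id: string;
  type: string;
  summary: string;
  content: string;
  example?: string;
  context?: string;
  confidence: number;
  usageCount: number;
}

// Project a full memory record down to the fields kept at the given level.
function compress(memory: Memory, level: CompressionLevel): Partial<Memory> {
  // Level 0: identifiers only.
  const view: Partial<Memory> = { id: memory.id, type: memory.type };

  // Level 1 adds the one-line summary.
  if (level >= CompressionLevel.OneLiners) {
    view.summary = memory.summary;
  }

  // Level 2 adds the example and its context, when present.
  if (level >= CompressionLevel.WithExamples) {
    if (memory.example !== undefined) view.example = memory.example;
    if (memory.context !== undefined) view.context = memory.context;
  }

  // Level 3 adds the full content and usage metadata.
  if (level >= CompressionLevel.FullDetail) {
    view.content = memory.content;
    view.confidence = memory.confidence;
    view.usageCount = memory.usageCount;
  }

  return view;
}
```

Because every level is a superset of the one below it, a caller can downgrade a memory to a cheaper level without losing its ID.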
Level 0 (IDs Only):

```json
{
  "id": "mem_abc123",
  "type": "tribal_knowledge"
}
```

Level 1 (One-liners):

```json
{
  "id": "mem_abc123",
  "type": "tribal_knowledge",
  "summary": "Always use bcrypt for password hashing, never MD5"
}
```

Level 2 (With Examples):

```json
{
  "id": "mem_abc123",
  "type": "tribal_knowledge",
  "summary": "Always use bcrypt for password hashing, never MD5",
  "example": "const hash = await bcrypt.hash(password, 10);",
  "context": "Security requirement from 2024 audit"
}
```

Level 3 (Full Detail):

```json
{
  "id": "mem_abc123",
  "type": "tribal_knowledge",
  "summary": "Always use bcrypt for password hashing, never MD5",
  "content": "Full explanation with rationale...",
  "example": "const hash = await bcrypt.hash(password, 10);",
  "context": "Security requirement from 2024 audit",
  "causalChain": [...],
  "relatedMemories": [...],
  "confidence": 0.95,
  "usageCount": 47
}
```

The session context tracks what's been sent:
```typescript
interface SessionContext {
  loadedMemories: Set<string>; // Memory IDs already sent
  loadedPatterns: Set<string>; // Pattern IDs already sent
  tokensSent: number;          // Running token count
  queriesMade: number;         // Query count
}
```

When retrieving memories:

- Check if the memory ID is in `loadedMemories`
- If yes, skip it or send a Level 0 reference only
- If no, send it at the requested compression level
- Add the ID to `loadedMemories`
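
The retrieval steps above can be sketched as a single loop over candidate memories. This is an illustration only: `StoredMemory`, `Delivery`, `REFERENCE_TOKENS`, and `deliver` are hypothetical names, the per-memory token cost is assumed to be known in advance, and the sketch always sends a Level 0 reference for repeats rather than skipping them.

```typescript
// Mirrors the SessionContext interface shown on this page.
interface SessionContext {
  loadedMemories: Set<string>;
  loadedPatterns: Set<string>;
  tokensSent: number;
  queriesMade: number;
}

// Hypothetical: a memory plus its assumed cost at the requested level.
interface StoredMemory {
  id: string;
  tokens: number;
}

// Hypothetical: what is actually sent for one memory.
interface Delivery {
  id: string;
  referenceOnly: boolean; // true when only a Level 0 reference is sent
  tokens: number;
}

// Approximate Level 0 cost, taken from the compression level table.
const REFERENCE_TOKENS = 10;

function deliver(memories: StoredMemory[], session: SessionContext): Delivery[] {
  const out: Delivery[] = [];
  for (const memory of memories) {
    // Step 1: has this memory already been sent in this session?
    const seen = session.loadedMemories.has(memory.id);
    // Steps 2 and 3: send a cheap reference for repeats, full cost otherwise.
    const tokens = seen ? REFERENCE_TOKENS : memory.tokens;
    out.push({ id: memory.id, referenceOnly: seen, tokens });
    // Step 4: record the ID so later queries treat it as a repeat.
    session.loadedMemories.add(memory.id);
    session.tokensSent += tokens;
  }
  session.queriesMade += 1;
  return out;
}
```

Because the ID is recorded inside the loop, a memory that appears twice in the same query is also sent in full only once.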
```typescript
const budget = {
  total: 4000,
  allocation: {
    patterns: 0.4,     // 1600 tokens
    tribal: 0.3,       // 1200 tokens
    constraints: 0.2,  // 800 tokens
    antipatterns: 0.1  // 400 tokens
  }
};
```

The system automatically selects compression levels based on:
- Available budget
- Memory relevance score
- Whether memory was previously sent
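
The "available budget" in the list above is per category. As a rough illustration of how the `budget` object could be turned into per-category token limits, here is a minimal sketch. The `Budget` type, the `Category` union, and the `allocate` function are assumptions made for this example; the page does not say how Cortex performs this step.

```typescript
// Hypothetical: the four categories used in the budget example on this page.
type Category = 'patterns' | 'tribal' | 'constraints' | 'antipatterns';

interface Budget {
  total: number;
  allocation: Record<Category, number>; // fractional shares, expected to sum to 1
}

// Convert fractional shares into whole-token limits per category.
function allocate(budget: Budget): Record<Category, number> {
  const categories: Category[] = ['patterns', 'tribal', 'constraints', 'antipatterns'];

  // Reject allocations whose shares do not add up to the whole budget.
  let shareSum = 0;
  for (const category of categories) {
    shareSum += budget.allocation[category];
  }
  if (Math.abs(shareSum - 1) > 1e-9) {
    throw new Error(`allocation shares sum to ${shareSum}, expected 1`);
  }

  const limits: Record<Category, number> = {
    patterns: 0,
    tribal: 0,
    constraints: 0,
    antipatterns: 0,
  };
  for (const category of categories) {
    // The small epsilon guards against floating-point products that land
    // just under an integer; flooring keeps the sum within the total.
    limits[category] = Math.floor(budget.total * budget.allocation[category] + 1e-9);
  }
  return limits;
}
```

With `total: 4000` and the shares shown earlier, this yields the 1600, 1200, 800, and 400 token limits given in the comments of the budget example.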
Level selection for a single memory:

```typescript
function selectLevel(memory: Memory, budget: number, session: SessionContext): CompressionLevel {
  if (session.loadedMemories.has(memory.id)) {
    return CompressionLevel.IdsOnly; // Already sent
  }
  if (budget < 50) return CompressionLevel.IdsOnly;
  if (budget < 200) return CompressionLevel.OneLiners;
  if (budget < 500) return CompressionLevel.WithExamples;
  return CompressionLevel.FullDetail;
}
```

Requesting a fixed compression level:

```typescript
const context = await cortex.getContext('add_feature', 'authentication', {
  maxTokens: 2000,
  compressionLevel: 2, // With examples
});
```

Letting the system choose the level:

```typescript
const context = await cortex.getContext('fix_bug', 'payment', {
  maxTokens: 1000,
  compressionLevel: 'auto', // System chooses
  prioritize: ['patterns', 'constraints'],
});
```

Deduplication across queries:

```typescript
// First query - full context
const ctx1 = await cortex.getContext('add_feature', 'auth');
// ~2000 tokens

// Second query - deduplicated
const ctx2 = await cortex.getContext('add_feature', 'auth/login');
// ~500 tokens (shared memories skipped)
```

Example request:

```json
{
  "intent": "add_feature",
  "focus": "authentication",
  "maxTokens": 2000,
  "compressionLevel": 2,
  "sessionId": "session_abc123"
}
```

Response includes token tracking:
```json
{
  "memories": [...],
  "tokenUsage": {
    "used": 1847,
    "budget": 2000,
    "saved": 3200,
    "deduplicatedCount": 5
  }
}
```

| Scenario | Without V2 | With V2 | Reduction |
|---|---|---|---|
| First query | 8000 tokens | 2000 tokens | 4x |
| Follow-up query | 8000 tokens | 500 tokens | 16x |
| Multi-file session | 24000 tokens | 3000 tokens | 8x |
- Use session IDs: enable deduplication across queries
- Start with Level 2: good balance of detail and efficiency
- Let the system choose: use `compressionLevel: 'auto'`
- Monitor token usage: check `tokenUsage` in responses
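
For the last point, a small helper can turn the `tokenUsage` object into figures that are easier to watch over time. This is a sketch, not part of the Cortex API: `summarizeUsage` and `UsageSummary` are hypothetical names, and it assumes `saved` counts tokens that would have been sent without compression and deduplication, which this page does not state explicitly.

```typescript
// Shape of the tokenUsage object shown in the example response.
interface TokenUsage {
  used: number;
  budget: number;
  saved: number;
  deduplicatedCount: number;
}

// Hypothetical summary derived from one response.
interface UsageSummary {
  utilization: number;     // fraction of the budget actually used
  overBudget: boolean;     // true if the response exceeded its budget
  reductionFactor: number; // assumed baseline tokens divided by tokens used
}

function summarizeUsage(usage: TokenUsage): UsageSummary {
  return {
    // Guard against a zero budget to avoid dividing by zero.
    utilization: usage.budget > 0 ? usage.used / usage.budget : 0,
    overBudget: usage.used > usage.budget,
    // Assumption: used + saved approximates the cost without V2.
    reductionFactor: usage.used > 0 ? (usage.used + usage.saved) / usage.used : 1,
  };
}
```

A utilization that stays close to 1 suggests the budget is the limiting factor, so raising `maxTokens` or lowering the compression level may be worth testing.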