"""
for metric_key, info in metrics_info.items():
if metric_key in overall:
html_content += f"""
-
-
{info["name"]}
-
-
类别: {info["category"]}
-
说明: {info["description"]}
+
+
{info["name"]}
+
+
类别: {info["category"]}
+
说明: {info["description"]}
+
-
"""
html_content += """
+
-
-
-
📊 评级说明
-
-
- 优秀 (≥ 0.90)
-
-
-
- 良好 (0.70 - 0.89)
-
-
-
- 需要改进 (< 0.70)
+
+
📊 评级说明
+
+
+ 优秀 (≥ 0.90)
+
+
+
+ 良好 (0.70 - 0.89)
+
+
+
+ 需要改进 (< 0.70)
+
-
"""
# Footer
diff --git a/litho.docs/1.Overview.md b/litho.docs/1.Overview.md
index dd84451..f5f5bf7 100644
--- a/litho.docs/1.Overview.md
+++ b/litho.docs/1.Overview.md
@@ -1,9 +1,7 @@
# System Context Overview
-**Document Generated on:** 2025-12-18 03:27:26 (UTC) (UTC)
-**Timestamp:** 1766028446
-**System:** `cortex-mem` – AI Agent Memory Management System
-**Architecture Level:** C4 Model – System Context (Level 1)
+**Generated on:** 2025-12-30 11:17:28 (UTC)
+**Timestamp:** 1767093448
---
@@ -13,181 +11,166 @@
**cortex-mem**
### Project Description
-`cortex-mem` is a full-stack memory management system designed specifically for AI agents. It enables persistent, semantically searchable, and optimizable storage of experiential knowledge (referred to as "memories") that AI agents accumulate during interactions. The system supports long-term context retention, enhancing agent consistency, reasoning, and intelligence across sessions.
+*cortex-mem* is a full-stack memory management system designed to provide AI agents with persistent, searchable, and optimizable memory storage. It enables intelligent agents to retain context across interactions by storing conversation history, procedural knowledge, and factual information in a structured and semantically accessible format.
-At its core, `cortex-mem` leverages vector embeddings and large language models (LLMs) to store, retrieve, and refine memories based on semantic meaning rather than exact keyword matches. It provides advanced capabilities such as deduplication, relevance scoring, memory optimization, and analytics through a modular architecture.
+The system supports multiple access interfaces, including CLI, HTTP API, MCP (Model Context Protocol), and a web-based dashboard, allowing seamless integration into diverse AI agent architectures and operational workflows. By leveraging vector embeddings and Large Language Models (LLMs), *cortex-mem* delivers advanced semantic search, intelligent memory classification, and automated optimization capabilities.
### Core Functionality
-- **Memory CRUD Operations**: Create, read, update, and delete memories with rich metadata.
-- **Semantic Search**: Retrieve relevant memories using natural language queries via vector similarity search.
-- **Memory Optimization**: Automatically detect and resolve issues like redundancy, low quality, or irrelevance using LLM-driven analysis.
-- **Analytics & Insights**: Visualize memory usage patterns, performance metrics, and optimization outcomes via a dashboard.
-- **Multi-Interface Access**: Support for REST API, CLI, and MCP protocol for integration flexibility.
-- **Evaluation Framework**: Built-in tools for benchmarking recall accuracy, search effectiveness, and system performance.
+- **Persistent Memory Storage**: Long-term retention of agent memories with metadata and semantic context.
+- **Semantic Search**: Retrieve relevant memories using natural language queries via vector similarity.
+- **Memory Optimization**: Automatically detect and resolve issues such as duplication, irrelevance, or redundancy.
+- **Multi-Interface Access**: Support for developers, agents, and operators through CLI, REST API, MCP, and web UI.
+- **Configurable & Extensible**: Centralized configuration enables consistent behavior across deployment environments.
### Business Value
-`cortex-mem` significantly enhances the cognitive capabilities of AI agents by providing reliable, scalable, and intelligent memory infrastructure. This enables:
-- Improved continuity and personalization in agent-user interactions.
-- Higher fidelity in decision-making through access to historical context.
-- Operational visibility into agent memory health and evolution.
-- Accelerated development and evaluation of memory-augmented AI systems.
+*cortex-mem* enhances the intelligence and continuity of AI agents by enabling them to:
+- Maintain contextual awareness across extended conversations and tasks.
+- Recall relevant past experiences efficiently.
+- Operate more autonomously through self-maintained memory quality.
+
+This results in improved user experience, higher task success rates, and reduced cognitive load on both agents and their human operators.
### Technical Characteristics
-- **Modular & Layered Architecture**: Clear separation between core logic, interfaces, and external dependencies.
-- **Vector-First Design**: Memories are stored and retrieved using high-dimensional embeddings in a dedicated vector database.
-- **LLM-Integrated Processing**: External LLM services are used for embedding generation and semantic understanding.
-- **Configurable & Extensible**: Supports TOML-based configuration and pluggable components for storage and LLM providers.
-- **Evaluation-Driven Development**: Includes tools for testing, benchmarking, and validating memory system behavior.
+- **Architecture Style**: Microservices-inspired modular monolith with domain-driven design principles.
+- **Primary Languages**: Rust (backend/core logic), TypeScript (frontend/dashboard).
+- **Integration Model**: API-first, service-oriented with external dependencies on LLMs and vector databases.
+- **Deployment Target**: Cloud-native, containerizable services with minimal infrastructure coupling.
---
## 2. Target Users
-| User Role | Description | Key Needs | Usage Scenarios |
-|---------|-------------|-----------|----------------|
-| **AI Agent Developers** | Software engineers building AI-powered applications requiring persistent memory. | Integrate memory into agents via APIs<br>Manage memories programmatically<br>Ensure high recall and low latency in retrieval | Building chatbots with long-term user context<br>Developing autonomous agents that learn from experience<br>Testing memory recall under various query conditions |
-| **System Administrators** | Operations personnel managing AI infrastructure and monitoring system health. | Monitor memory usage and system performance<br>Run optimization routines<br>Inspect memory trends and anomalies | Scheduling nightly memory deduplication<br>Reviewing dashboard insights for capacity planning<br>Responding to alerts about memory bloat or degradation |
-| **Research Scientists** | Academics and researchers studying AI memory systems and agent cognition. | Benchmark memory retrieval accuracy<br>Evaluate optimization efficacy<br>Generate synthetic datasets for validation | Comparing `cortex-mem` against other memory systems<br>Studying the impact of memory quality on agent performance<br>Designing experiments with controlled memory inputs |
+The *cortex-mem* system serves three primary user roles, each with distinct needs and interaction patterns:
+
+| User Role | Description | Key Needs |
+|---------|-------------|-----------|
+| **AI Agents** | Intelligent software agents requiring persistent memory to maintain context across interactions. | Store conversation history<br>Retrieve relevant memories based on current context<br>Optimize memory usage for performance and relevance |
+| **Developers** | Software engineers integrating *cortex-mem* into AI applications or agent frameworks. | Programmatic access via API or MCP<br>Flexible configuration options<br>Clear integration patterns with existing systems |
+| **System Administrators** | Operators responsible for monitoring, maintaining, and tuning the memory infrastructure. | Real-time monitoring via dashboard<br>Tools for memory optimization and maintenance<br>Logging, alerting, and configuration management |
-These user roles interact with the system through different entry points:
-- **Developers** use the REST API and CLI.
-- **Administrators** use the CLI and Insights Dashboard.
-- **Researchers** use the evaluation framework and API for data collection.
+### Usage Scenarios
+- An AI customer support agent recalls previous interactions with a user to provide personalized responses.
+- A developer integrates *cortex-mem* into an autonomous agent framework using the HTTP API for memory persistence.
+- An operator runs a nightly optimization job via CLI to deduplicate and clean memory entries.
+- A research team evaluates *cortex-mem* against alternative memory systems like *LangMem* for benchmarking purposes.
---
## 3. System Boundaries
### System Scope
-The `cortex-mem` system provides a complete, end-to-end solution for managing AI agent memories. It encompasses all aspects of memory lifecycle management, from ingestion and storage to search, optimization, and analysis.
+*cortex-mem* provides a dedicated memory management layer for AI agents. It handles the full lifecycle of memory data—including creation, retrieval, search, update, deletion, and optimization—through multiple standardized interfaces.
-> **Scope Statement**:
-> *The cortex-mem system provides a complete memory management solution for AI agents, including storage, retrieval, optimization, and analysis capabilities.*
+The system acts as a **middleware service**, decoupling memory logic from core agent intelligence, allowing it to be reused across different agent implementations.
### Included Components
-The following core components are within the system boundary:
+The following capabilities are within the scope of *cortex-mem*:
| Component | Description |
|--------|-------------|
-| **Memory CRUD Operations** | Full support for creating, reading, updating, and deleting memories with metadata tagging. |
-| **Semantic Search Engine** | Enables natural language search over stored memories using vector embeddings. |
-| **Memory Optimization Engine** | Detects duplicates, improves memory quality, and removes irrelevant entries using LLMs. |
-| **Insights & Analytics Dashboard** | Web-based UI for visualizing memory usage, trends, and optimization reports. |
-| **REST API Service** (`cortex-mem-service`) | HTTP interface for programmatic access to memory operations and system controls. |
-| **Command-Line Interface** (`cortex-mem-cli`) | Tool for local interaction, scripting, and administrative tasks. |
-| **MCP Protocol Adapter** (`cortex-mem-mcp`) | Enables integration with agent frameworks using the Memory Consistency Protocol. |
-| **Evaluation Framework** | Tools for testing recall, precision, latency, and optimization effectiveness. |
-| **Core Memory Engine** (`cortex-mem-core`) | Central module handling memory logic, vector integration, and LLM coordination. |
-| **Configuration Management** | Centralized configuration via TOML files supporting environment-specific settings. |
+| **Memory Storage & Retrieval** | CRUD operations for memory entries with rich metadata. |
+| **Semantic Search** | Vector-based similarity search using embeddings generated by LLMs. |
+| **Memory Optimization Engine** | Automated analysis and improvement of memory quality (e.g., deduplication, relevance filtering). |
+| **Multiple Access Interfaces** | CLI: For administrative and batch operations<br>HTTP API: For programmatic integration<br>MCP: For direct AI agent tool invocation<br>Web Dashboard: For monitoring and visualization |
+| **Web-Based Monitoring Dashboard** | Real-time insights into memory usage, performance metrics, and optimization results. |
+| **Configuration Management** | Centralized configuration system for all components (e.g., LLM endpoints, Qdrant settings). |
### Excluded Components
-The following are explicitly outside the scope of `cortex-mem`:
+The following are explicitly **outside** the scope of *cortex-mem*:
| Component | Reason for Exclusion |
|--------|------------------------|
-| **Core LLM Model Training** | The system consumes LLM APIs but does not train or fine-tune foundation models. |
-| **Vector Embedding Model Development** | Relies on external LLM services for embeddings; does not develop embedding models. |
-| **Operating System Resource Management** | Does not manage CPU, memory, or disk at the OS level. |
-| **Network Infrastructure Provisioning** | Assumes network connectivity is provided; does not handle networking setup or scaling. |
+| **Core AI Agent Logic** | *cortex-mem* does not implement agent reasoning, planning, or decision-making. It only provides memory services. |
+| **Application-Specific Business Rules** | Domain-specific logic (e.g., "remember user preferences only after consent") is handled by the agent, not *cortex-mem*. |
+| **User Interface Design for End Applications** | While a monitoring dashboard is provided, end-user UIs (e.g., chat interfaces) are outside scope. |
+| **Network Infrastructure Management** | Deployment, scaling, networking, and security of underlying infrastructure are managed externally. |
-This boundary ensures focus on memory-specific functionality while leveraging best-of-breed external services for AI processing and infrastructure.
+> ✅ **Boundary Principle**: *cortex-mem* focuses on **what** is remembered and **how** it is accessed, not **why** or **when** it should be used.
---
## 4. External System Interactions
-The `cortex-mem` system interacts with several external systems to deliver its functionality. These interactions are essential for storage, AI processing, and user access.
+*cortex-mem* integrates with several external systems to deliver its functionality. These interactions are essential for core operations such as embedding generation, vector storage, and benchmarking.
+
+### External Systems
-### External Systems List
+| System Name | Interaction Type | Description | Direction |
+|------------|------------------|-------------|----------|
+| **Qdrant** | Database Storage | Vector database used to store and retrieve memory embeddings. Enables high-performance semantic search through approximate nearest neighbor (ANN) indexing. | *cortex-mem* → Qdrant |
+| **OpenAI** | API Integration | LLM service used for:<br>Generating text embeddings<br>Extracting structured facts/entities<br>Analyzing memory content for optimization | *cortex-mem* → OpenAI |
+| **LangMem** | Benchmarking | Alternative memory system used for comparative evaluation and performance testing. Not integrated at runtime. | One-way analysis (no runtime dependency) |
-| External System | Interaction Type | Description | Direction |
-|----------------|------------------|-------------|----------|
-| **Qdrant** | Database Storage | Vector database used to store memory embeddings and enable semantic search. Memories are indexed by their vector representations for fast similarity lookup. | System → Qdrant (Write/Read) |
-| **LLM Services** (e.g., OpenAI, Anthropic, etc.) | API Integration | External language model APIs used to generate embeddings for memory content and queries. Also used during optimization to assess and rewrite memories. | System → LLM (Request/Response) |
-| **HTTP Clients** | User Interface | Web browsers and HTTP clients that access the Insights Dashboard and REST API. Includes developer tools, monitoring systems, and agent integrations. | Client → System |
-| **Command Line Interface (CLI)** | User Interface | Terminal applications used by developers and administrators to interact with the system via shell commands. | User → System |
+### Interaction Details
-### Dependency Analysis
+#### Qdrant Integration
+- **Purpose**: Persistent storage of vector embeddings and metadata.
+- **Protocol**: gRPC/HTTP (via Qdrant client SDK).
+- **Data Flow**:
+ - On memory creation: embedding vector + metadata → stored in Qdrant collection.
+ - On search: query embedding → similarity search → top-k results returned.
+- **Failure Impact**: Loss of semantic search capability; memory metadata may still be accessible via fallback mechanisms.
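+
+As a minimal sketch of the storage abstraction this data flow implies (the `VectorStore` trait name comes from these docs; the types and signatures here are illustrative assumptions, not the project's actual API):
+
+```rust
+use std::collections::HashMap;
+
+/// Illustrative memory point: an embedding vector plus its metadata payload.
+struct MemoryPoint {
+    id: String,
+    embedding: Vec<f32>,
+    metadata: HashMap<String, String>,
+}
+
+/// Hypothetical storage trait mirroring the two flows described above.
+trait VectorStore {
+    /// On memory creation: embedding vector + metadata stored in a collection.
+    fn upsert(&mut self, point: MemoryPoint) -> Result<(), String>;
+    /// On search: query embedding in, top-k most similar points out.
+    fn search(&self, query: &[f32], top_k: usize) -> Result<Vec<MemoryPoint>, String>;
+}
+```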
-| From → To | Type | Strength | Description |
-|-----------|------|----------|-------------|
-| `cortex-mem` → Qdrant | Service Dependency | 9.5 | Core persistence layer; all memory data is stored and retrieved via Qdrant’s vector search capabilities. |
-| `cortex-mem` → LLM Services | Service Call | 9.0 | Critical for embedding generation and memory optimization. System cannot function without access to LLM APIs. |
-| HTTP Clients → `cortex-mem` | User Interaction | 8.5 | Primary means for users to view analytics and trigger operations via dashboard or API. |
-| CLI Users → `cortex-mem` | User Interaction | 8.0 | Used for scripting, debugging, and administrative control. |
+#### OpenAI Integration
+- **Purpose**: Enable intelligent memory processing via LLM capabilities.
+- **Usage Patterns**:
+ - Embedding generation (`text-embedding-ada-002` or equivalent).
+ - Structured extraction (e.g., "extract entities from this conversation").
+ - Memory analysis (e.g., "are these two memories duplicates?").
+- **Protocol**: HTTPS REST API.
+- **Failure Impact**: Degraded functionality in memory creation, search, and optimization; system may fall back to cached embeddings or skip enrichment.
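+
+A rough illustration of the retry-with-exponential-backoff behavior described for LLM calls (the attempt count and delays here are assumptions, not the project's configured values):
+
+```rust
+use std::{thread, time::Duration};
+
+/// Retry a fallible call with exponential backoff before giving up.
+fn with_backoff<T, E>(mut call: impl FnMut() -> Result<T, E>) -> Result<T, E> {
+    let mut delay = Duration::from_millis(250);
+    for _ in 0..2 {
+        match call() {
+            Ok(value) => return Ok(value),
+            Err(_) => {
+                thread::sleep(delay);
+                delay *= 2; // double the wait after each failure
+            }
+        }
+    }
+    call() // final attempt; the error propagates if it still fails
+}
+```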
-> **Note**: All external dependencies are abstracted behind interfaces in `cortex-mem-core`, allowing for future pluggability (e.g., switching from Qdrant to Pinecone or Weaviate).
+#### LangMem (Benchmarking)
+- **Purpose**: Used in evaluation workflows to compare *cortex-mem*’s performance (latency, recall, precision) against alternative memory systems.
+- **Interaction**: Offline comparison scripts; no runtime coupling.
+- **Impact**: No operational dependency; used solely for R&D and validation.
---
## 5. System Context Diagram
-### C4 Model – System Context (Level 1)
+Below is a **C4 Model - Level 1: System Context** diagram representing *cortex-mem* and its relationships with users and external systems.
-Below is a Mermaid-formatted diagram representing the **System Context** view of `cortex-mem`. This diagram illustrates the system as a single container and its relationships with external entities.
+### Mermaid Diagram (System Context)
```mermaid
C4Context
title System Context Diagram for cortex-mem
- Person(developer, "AI Agent Developer", "Builds AI agents that require persistent memory. Uses API and CLI.")
- Person(admin, "System Administrator", "Manages system health, runs optimizations, monitors usage.")
- Person(researcher, "Research Scientist", "Evaluates memory performance, runs benchmarks, analyzes data.")
+ Person(ai_agent, "AI Agent", "Intelligent software agent requiring persistent memory")
+ Person(developer, "Developer", "Engineer integrating memory into AI applications")
+ Person(admin, "System Administrator", "Operator managing memory infrastructure")
- System(cortex_mem, "cortex-mem", "Memory management system for AI agents with semantic search, optimization, and analytics.")
+ System(cortex_mem, "cortex-mem", "Memory management system for AI agents\nProvides persistent, searchable, and optimizable memory storage")
- System_Ext(qdrant, "Qdrant", "Vector database for storing and retrieving memory embeddings.")
- System_Ext(llm_services, "LLM Services", "External APIs for generating embeddings and improving memory quality (e.g., OpenAI, Anthropic).")
- System_Boundary(http_clients, "HTTP Clients") {
- System_Ext(browser, "Web Browser", "Accesses insights dashboard and REST API.")
- }
- System_Boundary(cli, "CLI Environment") {
- System_Ext(terminal, "Terminal", "Runs cortex-mem-cli commands.")
- }
+ System_Ext(qdrant, "Qdrant", "Vector database for embedding storage and semantic search")
+ System_Ext(openai, "OpenAI", "LLM service for embeddings, text generation, and analysis")
+ System_Ext(langmem, "LangMem", "Alternative memory system used for benchmarking")
- Rel(developer, cortex_mem, "Uses API and CLI to manage agent memories")
- Rel(admin, cortex_mem, "Monitors via dashboard, runs optimization via CLI")
- Rel(researcher, cortex_mem, "Runs evaluations, extracts data via API")
+ Rel(ai_agent, cortex_mem, "Uses memory via MCP or API", "JSON/HTTP, MCP Protocol")
+ Rel(developer, cortex_mem, "Integrates via API/CLI", "REST API, CLI Commands")
+ Rel(admin, cortex_mem, "Monitors & manages via dashboard", "Web UI, CLI")
- Rel(cortex_mem, qdrant, "Stores and retrieves vector embeddings via gRPC/HTTP")
- Rel(cortex_mem, llm_services, "Sends text to generate embeddings and optimize memories")
- Rel(cortex_mem, browser, "Serves web UI and accepts API requests over HTTPS")
- Rel(cortex_mem, terminal, "Accepts command-line input and returns output")
-
- UpdateLayoutConfig($c4ShapeInRow="2", $c4BoundaryInRow="2")
+ Rel(cortex_mem, qdrant, "Stores/retrieves vectors", "gRPC/HTTP")
+ Rel(cortex_mem, openai, "Requests embeddings & analysis", "HTTPS API")
+ Rel_R(cortex_mem, langmem, "Benchmarked against", "Offline evaluation")
```
### Key Interaction Flows
1. **Memory Creation Flow**
- - Developer sends `POST /memories` via HTTP client.
- - `cortex-mem` uses LLM service to generate embedding.
- - Stores memory + embedding in Qdrant.
- - Returns memory ID.
-
-2. **Semantic Search Flow**
- - User submits natural language query via API or CLI.
- - Query is embedded using LLM service.
- - Vector search performed in Qdrant.
- - Results filtered and ranked by `cortex-mem-core`, returned to user.
-
-3. **Optimization Flow**
- - Admin triggers `/optimization/start`.
- - System analyzes memories for duplicates and quality issues.
- - LLM rewrites or merges low-quality entries.
- - Updated memories stored back in Qdrant.
- - Report generated and accessible via dashboard.
-
-4. **Insights Access Flow**
- - Browser requests dashboard page.
- - `cortex-mem-insights` fetches memory statistics and optimization history.
- - Renders charts and tables showing memory growth, search success rate, etc.
-
-### Architecture Decisions Reflected
-- **Single System Boundary**: All components (`core`, `service`, `cli`, `insights`) are treated as one cohesive system.
-- **Externalized AI & Storage**: LLMs and vector DBs are external systems, emphasizing cloud-native integration.
-- **Multi-Channel Access**: Supports both machine (API) and human (CLI, UI) interaction patterns.
-- **No Direct User-to-Database Access**: All data flows through `cortex-mem`, ensuring consistency and security.
+ - User/AI agent → CLI/API/MCP → *cortex-mem* → OpenAI (embedding) → Qdrant (storage)
+ - Metadata and extracted facts are stored alongside the vector.
+
+2. **Memory Retrieval Flow**
+ - Query → *cortex-mem* → OpenAI (query embedding) → Qdrant (similarity search) → Ranked results → User/Agent
+
+3. **Memory Optimization Flow**
+ - Admin triggers optimization → *cortex-mem* analyzes memory collection → Uses OpenAI to assess similarity → Executes merge/delete → Reports via dashboard
+
+4. **Monitoring & Configuration**
+ - Admin uses web dashboard to view memory stats, run optimizations, and adjust configuration.
+ - All components read from centralized config (e.g., `config.toml`).
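+
+A minimal sketch of reading such a centralized `config.toml` with `serde` and the `toml` crate; the section and field names are assumptions, not the project's actual schema:
+
+```rust
+use serde::Deserialize;
+
+#[derive(Deserialize)]
+struct Config {
+    llm: LlmConfig,
+    qdrant: QdrantConfig,
+}
+
+#[derive(Deserialize)]
+struct LlmConfig {
+    api_base: String,
+    embedding_model: String,
+}
+
+#[derive(Deserialize)]
+struct QdrantConfig {
+    url: String,
+    collection: String,
+}
+
+fn load_config(path: &str) -> Result<Config, Box<dyn std::error::Error>> {
+    let raw = std::fs::read_to_string(path)?;
+    Ok(toml::from_str(&raw)?) // parse the TOML into typed config
+}
+```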
---
@@ -195,60 +178,69 @@ C4Context
### Main Technology Stack
-| Layer | Technology | Purpose |
-|------|-----------|--------|
-| **Core Engine** | Rust (`cortex-mem-core`) | High-performance, memory-safe implementation of memory logic. |
-| **API Service** | Rust + Axum (`cortex-mem-service`) | RESTful HTTP server with async support. |
-| **CLI Tool** | Rust + Clap (`cortex-mem-cli`) | Cross-platform command-line interface. |
-| **Vector Database** | Qdrant | Persistent, scalable vector storage with semantic search. |
-| **LLM Integration** | OpenAI API, Anthropic, or compatible | Embedding generation and memory refinement. |
-| **Frontend (Insights)** | TypeScript + React | Interactive dashboard for analytics and reporting. |
-| **Configuration** | TOML | Human-readable, version-controllable config files. |
-| **Protocol Support** | MCP (Memory Consistency Protocol) | Standardized interface for agent memory synchronization. |
+| Layer | Technology | Rationale |
+|------|-----------|---------|
+| **Backend Core** | Rust | High performance, memory safety, ideal for systems-level logic and LLM integration. |
+| **Frontend Dashboard** | TypeScript (SvelteKit) | Reactive UI framework for real-time monitoring and interaction. |
+| **Vector Database** | Qdrant | Efficient ANN search, strong filtering support, gRPC/HTTP API. |
+| **LLM Provider** | OpenAI | Industry-standard embeddings and structured output capabilities. |
+| **Configuration** | TOML + Environment Variables | Human-readable, hierarchical, and deployable across environments. |
+| **Interfaces** | CLI: Rust (Clap)<br>HTTP API: Axum (Rust)<br>MCP: Custom protocol over JSON-RPC<br>Web: Svelte + REST | Multiple entry points for different user types and integration needs. |
### Architecture Patterns
-| Pattern | Application in `cortex-mem` |
-|--------|----------------------------|
-| **Layered Architecture** | Clear separation: Core Logic → Service Layer → Interface Layer (API/CLI/Dashboard). |
-| **Plugin Abstraction** | LLM clients and vector stores implement traits/interfaces for easy swapping. |
-| **Event-Driven Orchestration** | Optimization workflows are triggered and progress asynchronously. |
-| **Configuration-Driven Behavior** | System behavior (e.g., embedding model, thresholds) is defined in TOML. |
-| **Evaluation-First Design** | Built-in tools allow continuous validation of memory recall and quality. |
-
-### Key Design Decisions
+#### Modular Microservices Architecture
+- **Separation of Concerns**: Each domain (e.g., Memory Management, LLM Integration) is encapsulated with clear responsibilities.
+- **Shared Core Library**: `cortex-mem-core` contains business logic reused across CLI, API, MCP, and dashboard services.
+- **Frontend-Backend Decoupling**: Web dashboard communicates via RESTful APIs to backend services.
-1. **Rust as Primary Language**
- - Chosen for memory safety, performance, and concurrency—critical for systems handling large-scale data and real-time queries.
+#### Domain-Driven Design (DDD)
+The system is organized into well-defined domains:
-2. **Vector Database Abstraction**
- - The `vector_store` module abstracts Qdrant behind a trait, enabling future support for alternative backends (e.g., Chroma, FAISS).
+| Domain | Type | Responsibility |
+|-------|------|----------------|
+| **Memory Management Domain** | Core Business | Memory lifecycle, search, classification |
+| **Memory Optimization Domain** | Core Business | Quality analysis, deduplication, plan execution |
+| **LLM Integration Domain** | Core Business | Embedding generation, content extraction, intelligence |
+| **Storage Integration Domain** | Infrastructure | Qdrant interaction, vector persistence |
+| **Access Interface Domain** | Tool Support | CLI, API, MCP, Web UI |
+| **Configuration Management Domain** | Infrastructure | Centralized settings for all components |
-3. **LLM as a Service**
- - Avoids coupling to any single provider; supports multiple LLM APIs through a unified client interface.
+#### Key Design Decisions
-4. **Unified Core Module**
- - `cortex-mem-core` encapsulates all business logic, ensuring consistent behavior across API, CLI, and dashboard.
+| Decision | Rationale | Impact |
+|--------|---------|--------|
+| **Rust for Core Logic** | Ensures performance, safety, and concurrency for high-throughput memory operations. | Enables low-latency responses even under heavy load. |
+| **Multiple Access Interfaces** | Supports diverse use cases: agents (MCP), developers (API), operators (CLI/Web). | Increases adoption and integration flexibility. |
+| **Centralized Configuration** | Single source of truth for LLM keys, Qdrant URLs, optimization thresholds. | Simplifies deployment and reduces misconfiguration risk. |
+| **Vector + Metadata Hybrid Storage** | Combines semantic search (vector) with precise filtering (metadata). | Delivers accurate and contextually relevant results. |
+| **LLM-Driven Intelligence** | Leverages LLMs not just for embeddings but also for fact extraction and decision logic. | Enables smarter memory management beyond keyword matching. |
-5. **Separation of Concerns**
- - Each domain (Memory Management, Storage, Optimization, Insights) has well-defined responsibilities and minimal overlap.
+### Cross-Domain Dependencies
-6. **Extensible Evaluation Framework**
- - Allows developers and researchers to define custom test suites for memory recall, search accuracy, and optimization impact.
+```mermaid
+graph TD
+ A[Access Interface Domain] -->|Calls| B(Memory Management Domain)
+ B -->|Uses| C[LLM Integration Domain]
+ B -->|Stores to| D[Storage Integration Domain]
+ C -->|Config from| E[Configuration Management Domain]
+ D -->|Config from| E
+ F[Memory Optimization Domain] -->|Analyzes & Updates| B
+ F -->|Config from| E
+ A -->|Config from| E
+```
-7. **TOML-Based Configuration**
- - Enables declarative setup of system parameters (e.g., embedding dimensions, optimization thresholds, API keys).
+> 🔗 **Dependency Strength Summary**:
+> - Strongest: Access Interface → Memory Management (9.5)
+> - Critical: Memory Management → Storage Integration (9.0)
+> - Important: All domains → Configuration Management (7.0–8.5)
---
## Conclusion
-The `cortex-mem` system represents a robust, production-ready memory management platform tailored for AI agents. Its **System Context** clearly defines its role within the broader ecosystem: a centralized, intelligent memory store that integrates with vector databases and LLMs to provide persistent, searchable, and self-improving knowledge retention.
-
-By adhering to the C4 model’s System Context level, this document establishes a shared understanding across stakeholders—developers, operators, and researchers—of what the system does, who uses it, how it interacts with the outside world, and what lies within its architectural boundaries.
-
-This foundation enables informed decision-making for future development, integration, and operational planning.
+The *cortex-mem* system establishes a robust, extensible foundation for AI agent memory management. By clearly defining its boundaries, user roles, and external integrations, it serves as a reliable middleware component in modern AI architectures.
----
+Its modular design, multi-interface support, and intelligent use of LLMs and vector databases make it suitable for both research and production environments. The system enables AI agents to evolve from stateless responders to context-aware, learning entities—paving the way for more autonomous and effective artificial intelligence.
-**End of Document**
\ No newline at end of file
+This C4 System Context document provides a comprehensive view of *cortex-mem*’s role in the ecosystem, serving as a reference for architects, developers, and operators involved in its deployment and extension.
\ No newline at end of file
diff --git a/litho.docs/2.Architecture.md b/litho.docs/2.Architecture.md
index bb089c2..f942bcd 100644
--- a/litho.docs/2.Architecture.md
+++ b/litho.docs/2.Architecture.md
@@ -1,7 +1,7 @@
# System Architecture Documentation
**Project:** `cortex-mem`
-**Generated at:** 2025-12-18 03:29:11 (UTC) (UTC)
-**Timestamp:** 1766028551
+**Generated on:** 2025-12-30 11:19:34 (UTC)
+**Timestamp:** 1767093574
---
@@ -9,44 +9,31 @@
### Architecture Design Philosophy
-The **cortex-mem** system is designed around the principles of **modularity**, **extensibility**, and **intelligent automation**, with a strong emphasis on enabling **AI agents to maintain persistent, searchable, and optimizable knowledge** across interactions. The architecture follows a **layered, domain-driven design (DDD)** approach, where core business logic is encapsulated in a central engine (`cortex-mem-core`), while multiple interface layers provide flexible access for developers, operators, and researchers.
+The `cortex-mem` system is designed around the principle of **persistent, intelligent memory management for AI agents**, enabling them to retain context across interactions and improve decision-making over time. The architecture emphasizes **modularity, extensibility, and interoperability**, ensuring that diverse integration scenarios—from CLI tools to agent frameworks—can coexist seamlessly.
-The system embraces **separation of concerns** through well-defined boundaries between:
-- **Core memory processing**
-- **Storage and retrieval**
-- **User-facing interfaces**
-- **Analytics and evaluation**
+At its core, the system follows a **microservices-inspired modular monolith pattern**, where core business logic is centralized in a shared Rust-based library (`cortex-mem-core`), while multiple access interfaces (CLI, HTTP API, MCP, Web UI) operate as independent but tightly integrated components. This design balances performance and maintainability with flexibility and scalability.
-This modular structure ensures that each component can evolve independently, supporting both research experimentation and production deployment.
-
----
+The system leverages **external AI and vector storage services** (OpenAI and Qdrant) to offload computationally intensive tasks such as embedding generation and semantic search, allowing the core engine to focus on orchestration, optimization, and lifecycle management.
### Core Architecture Patterns
-| Pattern | Application | Benefit |
-|-------|------------|--------|
-| **Modular Monolith / Plugin Architecture** | Core engine exposes pluggable traits (`VectorStore`, `LLMClient`, etc.) | Enables extensibility without tight coupling |
-| **Strategy Pattern** | Used for detectors, evaluators, and optimizers | Allows runtime selection of algorithms based on configuration |
-| **Shared State Pattern** | `Arc` shared across API handlers | Thread-safe access to core engine in async environment |
-| **Command-Query Responsibility Segregation (CQRS)** | Separation of write (CRUD) and read (search/analysis) paths | Optimized data flow for different use cases |
-| **Event-Driven Orchestration (Implicit)** | Optimization workflows trigger analysis → planning → execution | Supports complex, multi-stage processes |
-
----
+- **Layered Architecture**: Clear separation into access, business logic, infrastructure, and support layers.
+- **Modular Monolith with Shared Core**: All interfaces share a common core engine, reducing duplication and ensuring consistency.
+- **Event-Driven Orchestration**: Workflows are initiated by user or agent requests and orchestrated through a centralized memory manager.
+- **Configuration-Driven Behavior**: A centralized configuration system (`cortex-mem-config`) ensures uniform behavior across all components.
+- **LLM-Augmented Intelligence**: LLMs are used not just for embeddings, but also for content analysis, fact extraction, and optimization decisions.
### Technology Stack Overview
-| Layer | Technology | Rationale |
-|------|-----------|---------|
-| **Core Engine** | Rust | Memory safety, performance, async support, trait-based abstraction |
-| **REST API** | Axum (Rust) | Lightweight, type-safe, integrates seamlessly with Tokio |
-| **CLI Interface** | Clap (Rust) | Declarative argument parsing, rich CLI experience |
-| **Web Dashboard** | SvelteKit + Elysia.js | Fast frontend, minimal boilerplate, TypeScript safety |
-| **Vector Database** | Qdrant | High-performance semantic search, metadata filtering, cloud-native |
-| **LLM Integration** | OpenAI-compatible APIs via RIG | Flexibility to use various LLM providers (OpenAI, Anthropic, local models) |
-| **Configuration** | TOML | Human-readable, hierarchical, widely supported |
-| **Runtime** | Tokio | Asynchronous execution model for high concurrency |
-
-> ✅ **Key Insight**: The choice of **Rust** as the primary language reflects a focus on **performance, reliability, and safety**—critical for systems handling persistent agent memory where data integrity and low-latency operations are essential.
+| Layer | Technology | Purpose |
+|------|-----------|--------|
+| **Core Engine** | Rust | High-performance, memory-safe backend logic |
+| **Interfaces** | Rust (CLI, API, MCP), TypeScript/Svelte (Web UI) | Multi-channel access |
+| **Storage** | Qdrant (Vector DB) | Semantic search via embeddings |
+| **AI Services** | OpenAI API | Embedding generation, content analysis, structured extraction |
+| **Configuration** | TOML | Human-readable, hierarchical configuration |
+| **Observability** | Tracing (OpenTelemetry), Logging | Debugging and monitoring |
+| **Build & Packaging** | Cargo (Rust), npm/pnpm (Web) | Dependency and build management |
---
@@ -54,65 +41,78 @@ This modular structure ensures that each component can evolve independently, sup
### System Positioning and Value
-**cortex-mem** is a **full-stack memory management platform** designed specifically for **AI agents**. It enables agents to:
-- Retain context across conversations and tasks
-- Recall relevant past experiences via semantic search
-- Optimize memory quality through deduplication and enhancement
-- Visualize and analyze memory patterns
+`cortex-mem` is a **full-stack memory management system** that provides AI agents with persistent, searchable, and optimizable memory. It enables agents to:
-This capability significantly improves agent **consistency**, **effectiveness**, and **long-term intelligence**, making it a foundational component in advanced AI agent architectures.
+- Retain conversation history and learned knowledge
+- Retrieve relevant memories using semantic search
+- Optimize memory usage by removing duplicates and low-quality entries
+- Integrate seamlessly into agent workflows via multiple protocols
----
+This significantly enhances **agent intelligence, continuity, and user experience** by preventing context loss and enabling long-term learning.
### User Roles and Scenarios
-| Role | Description | Key Use Cases |
-|------|-----------|-------------|
-| **AI Agent Developers** | Engineers integrating memory into AI applications | - Store agent interactions - Retrieve context before responses - Monitor memory usage via API |
-| **System Administrators** | Ops teams managing agent infrastructure | - Run memory optimization jobs - Monitor system health - Analyze memory trends and costs |
-| **Research Scientists** | Researchers evaluating agent memory systems | - Benchmark recall accuracy - Test optimization strategies - Generate synthetic datasets |
-
----
+| Role | Needs | Interaction Scenarios |
+|------|-------|------------------------|
+| **AI Agents** | Store context, retrieve relevant memories, optimize memory usage | Use MCP or API to log interactions and query past experiences |
+| **Developers** | Integrate memory into AI apps, configure behavior, debug | Use CLI, API, or SDKs to build agent systems |
+| **System Administrators** | Monitor health, optimize performance, manage configurations | Use web dashboard and CLI for maintenance and tuning |
### External System Interactions
```mermaid
graph TD
- A[cortex-mem] -->|API Calls| B[LLM Services]
- A -->|gRPC/HTTP| C[Qdrant Vector Database]
- D[AI Agent / App] -->|HTTP Requests| A
- E[Admin User] -->|CLI Commands| A
- F[Researcher] -->|Dashboard Access| A
- G[External Agent Framework] -->|MCP Tool Calls| A
-
- style A fill:#2196F3,stroke:#1976D2,color:white
- style B fill:#FFC107,stroke:#FFA000,color:black
- style C fill:#4CAF50,stroke:#388E3C,color:white
- style D fill:#9E9E9E,stroke:#616161,color:white
- style E fill:#9E9E9E,stroke:#616161,color:white
- style F fill:#9E9E9E,stroke:#616161,color:white
- style G fill:#9E9E9E,stroke:#616161,color:white
-
- classDef user fill:#9E9E9E,stroke:#616161,color:white;
- class D,E,F,G user;
-```
+ subgraph ExternalSystems [External Systems]
+ Qdrant[(Qdrant Vector Database)]
+ OpenAI[(OpenAI LLM Service)]
+ LangMem[(LangMem - Benchmark)]
+ end
-#### Interaction Types:
-- **LLM Services**: Used for embedding generation, classification, keyword extraction, and optimization guidance.
-- **Qdrant**: Primary vector database for storing memory embeddings and metadata.
-- **HTTP Clients**: Access REST API and web dashboard.
-- **CLI Tools**: Direct terminal interaction for scripting and debugging.
-- **MCP Clients**: Integration with agent frameworks using Model Context Protocol.
+ subgraph cortex-mem [cortex-mem System]
+ CLI[cortex-mem-cli]
+ HTTPAPI[cortex-mem-service]
+ MCP[cortex-mem-mcp]
+ WebUI[cortex-mem-insights]
+ Core[cortex-mem-core]
+ end
----
+ CLI --> Core
+ HTTPAPI --> Core
+ MCP --> Core
+ WebUI --> HTTPAPI
+ WebUI --> Core
+
+ Core --> Qdrant
+ Core --> OpenAI
+ HTTPAPI --> OpenAI
+ MCP --> OpenAI
+
+ WebUI -.-> LangMem
+ HTTPAPI -.-> LangMem
+
+ style cortex-mem fill:#2196F3,stroke:#1976D2,color:white
+ style ExternalSystems fill:#9C27B0,stroke:#7B1FA2,color:white
+```
+
+- **Qdrant**: Used for storing and retrieving memory embeddings via vector similarity search.
+- **OpenAI**: Provides text embeddings, content analysis, and structured extraction (facts, entities).
+- **LangMem**: Used for benchmarking and comparative evaluation of memory systems.
### System Boundary Definition
-| Included Components | Excluded Components |
-|---------------------|---------------------|
-| - Memory CRUD operations - Semantic search with vector embeddings - Deduplication and optimization - Analytics dashboard - REST/CLI/MCP interfaces - Evaluation framework | - Training of LLMs or embedding models - Development of vector database engine - OS-level resource management - Network infrastructure provisioning |
+#### Included Components
+- Memory storage and retrieval
+- Semantic search capabilities
+- Memory optimization engine
+- Multiple access interfaces (CLI, API, MCP, Web Dashboard)
+- Configuration management
+- Integration with LLM and vector database services
-> 🔒 **Boundary Clarity**: The system **integrates** with external AI and storage services but does **not own** their underlying models or infrastructure.
+#### Excluded Components
+- Core AI agent logic (e.g., reasoning, planning)
+- Application-specific business rules
+- End-user UI design for agent applications
+- Network infrastructure management (e.g., load balancing, firewalls)
---
@@ -120,113 +120,79 @@ graph TD
### Domain Module Division
-The system is structured into **five primary domains**, each representing a logical grouping of functionality:
+The system is divided into six primary domain modules, each responsible for a distinct aspect of functionality:
| Domain | Type | Responsibility |
|-------|------|----------------|
-| **Memory Management Domain** | Core Business | Full lifecycle management of memories (CRUD, retrieval, metadata) |
-| **Memory Storage Domain** | Technical | Persistent storage using vector database (Qdrant) |
-| **Optimization Domain** | Core Business | Improve memory quality via deduplication, relevance tuning, enhancement |
-| **Insights & Analytics Domain** | Support | Visualization, monitoring, and reporting |
-| **Evaluation & Tools Domain** | Support | Benchmarking, testing, and example agents |
-
----
+| **Memory Management Domain** | Core Business | CRUD, classification, search |
+| **Memory Optimization Domain** | Core Business | Deduplication, quality analysis, merging |
+| **LLM Integration Domain** | Core Business | Embedding, extraction, intelligence |
+| **Storage Integration Domain** | Infrastructure | Qdrant interaction, vector persistence |
+| **Access Interface Domain** | Tool Support | CLI, API, MCP, Web UI |
+| **Configuration Management Domain** | Infrastructure | Centralized settings |
### Domain Module Architecture
```mermaid
graph TD
- subgraph "External Systems"
- LLM[LLM Services]
- QDRANT[Qdrant DB]
- HTTP[HTTP Clients]
- CLI[CLI Users]
- MCP[Agent Frameworks]
- end
-
- subgraph "cortex-mem System"
- CORE[cortex-mem-core\n(Memory Engine)]
-
- subgraph Interfaces
- SERVICE[cortex-mem-service\n(Rest API)]
- CLI_INTERFACE[cortex-mem-cli\n(Command Line)]
- MCP_ADAPTER[cortex-mem-mcp\n(MCP Adapter)]
- INSIGHTS[cortex-mem-insights\n(Dashboard UI)]
- end
-
- subgraph Support
- EVAL[cortex-mem-evaluation\n(Benchmarking)]
- TARS[cortex-mem-tars\n(Terminal Agent)]
- end
+ subgraph Domains [Domain Modules]
+ MMD[Memory Management Domain]
+ MOD[Memory Optimization Domain]
+ LLM[LLM Integration Domain]
+ SID[Storage Integration Domain]
+ AID[Access Interface Domain]
+ CMD[Configuration Management Domain]
end
- SERVICE --> CORE
- CLI_INTERFACE --> CORE
- MCP_ADAPTER --> CORE
- INSIGHTS --> SERVICE
- EVAL --> CORE
- TARS --> CORE
-
- CORE --> LLM
- CORE --> QDRANT
-
- HTTP --> SERVICE
- CLI --> CLI_INTERFACE
- MCP --> MCP_ADAPTER
-
- style CORE fill:#4CAF50,stroke:#388E3C,color:white
- style SERVICE fill:#2196F3,stroke:#1976D2,color:white
- style CLI_INTERFACE fill:#9C27B0,stroke:#7B1FA2,color:white
- style MCP_ADAPTER fill:#FF9800,stroke:#F57C00,color:white
- style INSIGHTS fill:#00BCD4,stroke:#00ACC1,color:white
- style EVAL fill:#795548,stroke:#5D4037,color:white
- style TARS fill:#607D8B,stroke:#455A64,color:white
+ AID --> MMD
+ MMD --> LLM
+ MMD --> SID
+ MOD --> MMD
+ LLM --> CMD
+ SID --> CMD
+ MOD --> CMD
+ AID --> CMD
+
+ style MMD fill:#4CAF50,stroke:#388E3C,color:white
+ style MOD fill:#4CAF50,stroke:#388E3C,color:white
+ style LLM fill:#4CAF50,stroke:#388E3C,color:white
+ style SID fill:#2196F3,stroke:#1976D2,color:white
+ style CMD fill:#2196F3,stroke:#1976D2,color:white
+ style AID fill:#607D8B,stroke:#455A64,color:white
```
----
-
### Storage Design
-#### Data Model (Qdrant Collection Schema)
-
-```json
-{
- "id": "uuid",
- "vector": "float[1536]", // Embedding dimension (auto-detected)
- "payload": {
+- **Primary Storage**: Qdrant vector database
+ - Stores memory vectors (embeddings), metadata, and structured content
+ - Supports similarity search via cosine distance
+ - Collections are auto-created based on agent or session context
+- **Metadata Schema**:
+ ```json
+ {
+ "id": "uuid",
"content": "string",
- "metadata": {
- "source": "string",
- "timestamp": "datetime",
- "type": "string",
- "importance": "float",
- "version": "int"
- },
+ "embedding": "float[]",
+ "type": "conversational|factual|procedural",
+ "timestamp": "datetime",
+ "source": "agent|user",
+ "quality_score": "float",
"keywords": ["string"],
- "embedding_model": "string"
+ "entities": ["string"]
}
-}
-```
-
-#### Key Features:
-- **Semantic Search**: Enabled via vector similarity (cosine distance)
-- **Metadata Filtering**: Supports filtering by `type`, `source`, `timestamp`, etc.
-- **Batch Operations**: Efficient bulk insert/update/delete
-- **Auto-detection**: Dynamically detects embedding dimensions from LLM
-
----
+ ```
+- **Indexing Strategy**: HNSW index for fast approximate nearest neighbor search
+- **Persistence**: All writes are synchronous; reads are cached via in-memory LRU where applicable
### Inter-Domain Module Communication
-| From → To | Communication Mechanism | Protocol/Data Format |
-|---------|--------------------------|-----------------------|
-| Memory Management → Storage | Function call (`upsert_point`, `search`) | Rust trait interface (`VectorStore`) |
-| Memory Management → LLM | HTTP API call | JSON over HTTPS |
-| Optimization → Memory Management | Direct method invocation | In-process Rust calls |
-| Insights → Memory Management | REST API call | JSON over HTTP |
-| CLI/API → Core | Shared state (`Arc`) | In-memory Rust objects |
-
-> 🔄 **Communication Pattern**: Most inter-module communication occurs **in-process** via function calls or shared state, ensuring low latency. External access (dashboard, evaluation) uses **REST API** as a secure boundary.
+| From → To | Communication Type | Mechanism |
+|---------|---------------------|----------|
+| Access Interface → Memory Management | Synchronous Request | Function calls (Rust) / HTTP (Web) |
+| Memory Management → LLM Integration | Synchronous API Call | REST to OpenAI |
+| Memory Management → Storage Integration | Synchronous DB Call | gRPC to Qdrant |
+| Memory Optimization → Memory Management | Synchronous Service Call | In-process method invocation |
+| All Domains → Configuration Management | Configuration Pull | TOML file parsing at startup |
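+
+The in-process rows above rely on sharing one core engine instance across concurrent handlers. A minimal sketch of that shared-state pattern, where `MemoryManager` is a stand-in struct rather than the real type:
+
+```rust
+use std::sync::Arc;
+use std::thread;
+
+struct MemoryManager; // stand-in for the real core engine
+
+impl MemoryManager {
+    fn status(&self) -> &'static str {
+        "ok"
+    }
+}
+
+fn main() {
+    let core = Arc::new(MemoryManager);
+    let handles: Vec<_> = (0..4)
+        .map(|_| {
+            let core = Arc::clone(&core); // cheap pointer clone per handler
+            thread::spawn(move || println!("{}", core.status()))
+        })
+        .collect();
+    for handle in handles {
+        handle.join().unwrap();
+    }
+}
+```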
---
@@ -234,208 +200,167 @@ graph TD
### Core Functional Components
-#### `cortex-mem-core` – Memory Engine
-
-**Responsibilities**:
-- Orchestrate memory lifecycle
-- Generate embeddings and metadata
-- Execute optimization plans
-- Manage vector store interactions
-
-**Internal Structure**:
-```rust
-pub struct MemoryManager {
-    vector_store: Box<dyn VectorStore>,
-    llm_client: Box<dyn LLMClient>,
-    fact_extractor: Box<dyn FactExtractor>,
-    memory_updater: Box<dyn MemoryUpdater>,
-    importance_evaluator: Box<dyn ImportanceEvaluator>,
-    duplicate_detector: Box<dyn DuplicateDetector>,
-    memory_classifier: Box<dyn MemoryClassifier>,
-}
-```
-
-**Key Traits**:
-| Trait | Purpose | Implementations |
-|------|--------|----------------|
-| `VectorStore` | Abstracts vector DB operations | `QdrantStore`, `ChromaStore` (planned) |
-| `LLMClient` | Standardizes LLM interactions | `OpenAIClient`, `AnthropicClient`, `LocalLlama` |
-| `DuplicateDetector` | Detects redundant memories | Rule-based, LLM-based, hybrid |
-
----
-
-#### `cortex-mem-service` – REST API
-
-- Built with **Axum**
-- Endpoints:
- - `POST /memories` – Create memory
- - `GET /memories/{id}` – Retrieve by ID
- - `POST /memories/search` – Semantic search
- - `POST /optimization/start` – Trigger optimization
- - `GET /status` – Health check
-- Uses `Arc` for thread-safe access
-- JSON request/response format
-
----
-
-#### `cortex-mem-cli` – Command Line Interface
-
-- Built with **Clap**
-- Commands:
- - `add "content" --type=event`
- - `search "what did I do yesterday?"`
- - `optimize --strategy=dedupe`
- - `list --filter type=belief`
-- Reads config from `config.toml`
-- Asynchronous execution via Tokio
-
----
-
-#### `cortex-mem-mcp` – MCP Protocol Adapter
-
-- Implements **Model Context Protocol (MCP)**
-- Exposes tools:
- - `store_memory(content, metadata)`
- - `query_memories(query, filters)`
- - `list_memories(filter)`
- - `get_memory(id)`
-- Enables integration with agent frameworks like LangChain, AutoGPT
-
----
-
-#### `cortex-mem-insights` – Dashboard UI
-
-- **Frontend**: SvelteKit (TypeScript)
-- **Backend**: Elysia.js (Bun runtime)
-- Features:
- - Memory distribution charts
- - Optimization history timeline
- - Real-time memory stream
- - Search playground
-- Pulls data from `cortex-mem-service` API
-
----
+| Component | Responsibility | Key Functions |
+|---------|----------------|-------------|
+| **MemoryManager** | Central orchestrator | Create, retrieve, update, delete memories; coordinate optimization |
+| **LLMClient** | LLM interaction | Generate embeddings, extract facts, analyze content |
+| **VectorStore (Qdrant)** | Vector persistence | Store/retrieve embeddings, execute similarity search |
+| **OptimizationEngine** | Memory quality control | Detect duplicates, suggest merges, execute cleanup |
+| **MemoryClassifier** | Type inference | Classify memory as conversational, factual, procedural |
+| **MemoryExtractor** | Structured data extraction | Extract keywords, entities, facts using LLM prompts |
### Technical Support Components
-| Component | Purpose | Technology |
-|--------|--------|-----------|
-| `cortex-mem-evaluation` | Benchmark memory recall and effectiveness | Synthetic dataset generation, accuracy scoring |
-| `cortex-mem-tars` | Example terminal agent demonstrating usage | Interactive CLI agent with memory loop |
-
----
+| Component | Responsibility | Key Functions |
+|---------|----------------|-------------|
+| **ConfigManager** | Configuration loading | Parse `config.toml`, validate, provide defaults |
+| **TracingSystem** | Observability | Log spans, metrics, errors via OpenTelemetry |
+| **CLI Interface** | Command-line access | Parse commands, execute workflows, display results |
+| **HTTP API Service** | RESTful access | Handle JSON requests, validate input, return responses |
+| **MCP Server** | Agent protocol interface | Handle MCP tool calls, route to memory functions |
+| **Web Dashboard** | Monitoring UI | Visualize memory stats, trigger optimization, view logs |
### Component Responsibility Division
```mermaid
graph TD
- CLI_INTERFACE -->|Calls| CORE
- SERVICE -->|Exposes| CORE
- MCP_ADAPTER -->|Translates| CORE
- INSIGHTS -->|Fetches| SERVICE
- EVAL -->|Tests| CORE
- TARS -->|Uses| CORE
-
- CORE -->|Uses| LLM_CLIENT[cortex-mem-core::llm::client]
- CORE -->|Uses| VEC_STORE[cortex-mem-core::vector_store::qdrant]
- CORE -->|Uses| MEM_MANAGER[cortex-mem-core::memory::manager]
- CORE -->|Uses| OPTIMIZER[cortex-mem-core::memory::optimizer]
-
- style CORE fill:#4CAF50,stroke:#388E3C,color:white
- style LLM_CLIENT fill:#FFEB3B,stroke:#FDD835,color:black
- style VEC_STORE fill:#FF9800,stroke:#FB8C00,color:black
- style MEM_MANAGER fill:#03A9F4,stroke:#0288D1,color:white
- style OPTIMIZER fill:#8BC34A,stroke:#7CB342,color:white
+ subgraph Components [Component Responsibilities]
+ MM[MemoryManager]
+ LC[LLMClient]
+ VS[VectorStore]
+ OE[OptimizationEngine]
+ MC[MemoryClassifier]
+ ME[MemoryExtractor]
+ CM[ConfigManager]
+ CLI[CLI Interface]
+ API[HTTP API Service]
+ MCP[MCP Server]
+ WD[Web Dashboard]
+ end
+
+ CLI --> MM
+ API --> MM
+ MCP --> MM
+ WD --> API
+ WD --> MM
+
+ MM --> LC
+ MM --> VS
+ MM --> OE
+ MM --> MC
+ MM --> ME
+
+ LC --> OpenAI
+ VS --> Qdrant
+
+ MM --> CM
+ CLI --> CM
+ API --> CM
+ MCP --> CM
+ WD --> CM
+
+ style MM fill:#4CAF50,stroke:#388E3C,color:white
+ style LC fill:#FF9800,stroke:#F57C00,color:white
+ style VS fill:#9C27B0,stroke:#7B1FA2,color:white
+ style OE fill:#FF5722,stroke:#E64A19,color:white
```
+### Component Interaction Relationships
+
+- **Request Flow**:
+ 1. Interface receives request (e.g., `add memory`)
+ 2. ConfigManager provides settings
+ 3. MemoryManager orchestrates:
+ - Classify content
+ - Generate embedding via LLMClient
+ - Extract facts/entities
+ - Store in VectorStore
+ 4. Response returned via interface
+
+- **Optimization Flow**:
+ 1. User triggers optimization
+ 2. OptimizationEngine analyzes memory collection
+ 3. Uses LLMClient to assess similarity
+ 4. Generates plan (merge/delete)
+ 5. Executes via MemoryManager
+ 6. Reports results via Web Dashboard or CLI
+
---
## 5. Key Processes
### Core Functional Processes
-#### Memory Management Process
+#### Memory Creation Process
```mermaid
graph TD
- A[User/Application] --> B{Initiate Memory Operation}
- B --> C[HTTP API / CLI Command]
- C --> D[Process Request]
- D --> E[Generate Embeddings via LLM Client]
- E --> F[Store in Qdrant Vector Database]
- F --> G[Return Success Response with Memory ID]
- G --> A
+ A[User/Agent: Add Memory] --> B[CLI/API/MCP Interface]
+ B --> C[MemoryManager: Classify Type]
+ C --> D[LLMClient: Generate Embedding]
+ D --> E[MemoryExtractor: Extract Facts/Entities]
+ E --> F[MemoryManager: Build Memory Object]
+ F --> G[VectorStore: Persist in Qdrant]
+ G --> H[Return Success]
```
-**Sequence Flow**:
-1. Client sends `POST /memories` with content
-2. Handler calls `MemoryManager::create_memory()`
-3. LLM generates embedding and extracts keywords/classification
-4. Memory stored in Qdrant with full payload
-5. Response returns `201 Created` with memory ID
+#### Memory Retrieval Process
----
+```mermaid
+graph TD
+ A[User/Agent: Search Query] --> B[API/CLI Interface]
+ B --> C[LLMClient: Generate Query Embedding]
+ C --> D[VectorStore: Semantic Search]
+ D --> E[MemoryManager: Apply Metadata Filters]
+ E --> F[Rank & Return Results]
+```
-#### Memory Search Process
+#### Memory Optimization Process
```mermaid
graph TD
- A[User/Application] --> B{Initiate Search}
- B --> C[Send Search Request with Query]
- C --> D[Generate Query Embedding via LLM]
- D --> E[Semantic Search in Qdrant DB]
- E --> F[Apply Metadata Filters]
- F --> G[Return Ranked Matching Memories]
- G --> A
+ A[User: Initiate Optimization] --> B[OptimizationEngine: Analyze Collection]
+ B --> C[LLMClient: Assess Similarity]
+ C --> D[Generate Plan: Merge/Delete]
+ D --> E[Preview & Confirm]
+ E --> F{Execute?}
+ F -->|Yes| G[MemoryManager: Apply Changes]
+ F -->|No| H[Cancel]
+ G --> I[Log Actions]
+ I --> J[Report Results]
```
-**Sequence Flow**:
-1. Client sends `POST /memories/search` with query
-2. Query embedded using same LLM model
-3. Qdrant performs approximate nearest neighbor (ANN) search
-4. Results filtered by metadata (e.g., `type=belief`, `timestamp > 2024`)
-5. Ranked list returned with relevance scores
-
----
+### Technical Processing Workflows
-#### Memory Optimization Process
+#### System Initialization Workflow
```mermaid
-flowchart TD
- Start[Start Optimization] --> Detector[Detect Issues]
- Detector -->|Duplicates| Deduplicator[DeduplicationDetector]
- Detector -->|Low Quality| QualityChecker[QualityAssessment]
- Detector -->|Outdated| TimeAnalyzer[TimeDecayAnalysis]
- Detector -->|Classification| Classifier[ClassificationValidator]
-
- Detector --> Planner[Generate Optimization Plan]
- Planner --> Analyzer[Analyze Impact & Risk]
- Analyzer --> User[Review Recommendations?]
- User -->|Approve| Executor[Execute Actions]
- Executor --> Reporter[Generate Results Report]
- Reporter --> End[Return Summary]
+graph TD
+ A[Load config.toml] --> B[Initialize Tracing]
+ B --> C[Auto-detect LLM Client]
+ C --> D[Auto-detect Vector Store]
+ D --> E[Determine Embedding Dim]
+ E --> F[Create MemoryManager]
+ F --> G[Start CLI]
+ F --> H[Start HTTP API]
+ F --> I[Start MCP Server]
+ F --> J[Start Web Dashboard]
```
-**Key Stages**:
-1. **Detection**: Scan memories for duplicates, low importance, outdated content
-2. **Planning**: Propose merge/delete/enhance actions
-3. **Approval**: Optional human-in-the-loop review
-4. **Execution**: Apply changes via `MemoryUpdater`
-5. **Reporting**: Return summary of actions taken
+### Data Flow Paths
----
+- **Ingress**: Raw text from user/agent → Interface → Core
+- **Processing**: Text → Embedding → Structured data → Metadata enrichment
+- **Egress**: Search results, optimization reports, logs
+- **Storage**: Vector + metadata → Qdrant → Indexed for search
### Exception Handling Mechanisms
-| Failure Type | Handling Strategy |
-|-------------|-------------------|
-| **LLM Timeout** | Retry with exponential backoff (3 attempts) |
-| **Qdrant Unavailable** | Return 503, log error, retry on next operation |
-| **Invalid Request** | Return 400 with validation details |
-| **Duplicate Memory** | Auto-merge or return conflict (configurable) |
-| **Optimization Risk High** | Block execution unless override flag set |
-
-> ⚠️ **Improvement Suggestion**: Add circuit breaker pattern for external dependencies and implement distributed tracing.
+- **LLM Timeout**: Retry with exponential backoff; fallback to cached embeddings if available
+- **Qdrant Unavailable**: Queue operations (in-memory buffer); retry on reconnect
+- **Invalid Configuration**: Fail fast at startup; provide detailed error messages
+- **Memory Corruption**: Validate checksums; isolate and flag corrupted entries
+- **Optimization Safety**: Preview mode required; all destructive actions require confirmation
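+
+The retry-with-backoff strategy above might look like this minimal sketch (tokio assumed; the attempt count and base delay are illustrative, not shipped defaults):
+
+```rust
+use std::time::Duration;
+
+/// Retry an async operation, doubling the wait after each failure.
+async fn retry_with_backoff<T, E, F, Fut>(mut op: F, max_attempts: u32) -> Result<T, E>
+where
+    F: FnMut() -> Fut,
+    Fut: std::future::Future<Output = Result<T, E>>,
+{
+    let mut delay = Duration::from_millis(500);
+    let mut attempt = 1;
+    loop {
+        match op().await {
+            Ok(v) => return Ok(v),
+            Err(e) if attempt >= max_attempts => return Err(e),
+            Err(_) => {
+                tokio::time::sleep(delay).await;
+                delay = delay.saturating_mul(2); // exponential backoff
+                attempt += 1;
+            }
+        }
+    }
+}
+```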
---
@@ -443,113 +368,75 @@ flowchart TD
### Core Module Implementation
-#### `memory/manager.rs` – Orchestration Hub
+#### MemoryManager (Rust)
+- **Location**: `cortex-mem-core/src/memory/manager.rs`
+- **Key Traits**:
+ ```rust
+ trait MemoryStorage {
+  fn add(&self, memory: Memory) -> Result<Id>;
+  fn search(&self, query: &str, filters: Filters) -> Result<Vec<Memory>>;
+ fn update(&self, id: Id, patch: MemoryPatch) -> Result<()>;
+ }
+ ```
-- Single entry point for all memory operations
-- Composes extractors, evaluators, and updaters
-- Thread-safe via `Arc<Mutex<_>>` or async equivalents
+#### LLMClient (Rust)
+- Uses OpenAI embeddings API (`text-embedding-3-small`)
+- Implements structured extraction via JSON-mode prompting
+- Caches embeddings to reduce API cost
-```rust
-impl MemoryManager {
-    pub async fn create_memory(&self, content: String, metadata: Metadata) -> Result<Memory> {
- let embedding = self.llm_client.embed(&content).await?;
- let keywords = self.llm_client.extract_keywords(&content).await?;
- let memory_type = self.memory_classifier.classify(&content).await?;
- let importance = self.importance_evaluator.score(&content).await?;
-
- let memory = Memory::new(content, embedding, metadata)
- .with_keywords(keywords)
- .with_type(memory_type)
- .with_importance(importance);
-
- self.vector_store.upsert(&memory).await?;
- Ok(memory)
- }
-}
-```
-
----
+#### OptimizationEngine
+- **Duplicate Detection**: Uses cosine similarity + LLM semantic comparison
+- **Quality Scoring**: Based on length, coherence, relevance, and recency
+- **Merge Strategy**: Combines overlapping memories using LLM summarization
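+
+One plausible blend of those quality inputs, shown as a sketch (the weights are assumptions; the shipped formula is not documented here):
+
+```rust
+/// Blend inputs (each normalized to [0, 1]) into a single quality score.
+fn quality_score(length: f32, coherence: f32, relevance: f32, recency: f32) -> f32 {
+    0.2 * length + 0.3 * coherence + 0.3 * relevance + 0.2 * recency
+}
+```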
### Key Algorithm Design
-#### Hybrid Deduplication Algorithm
-
-```rust
-fn detect_duplicate(&self, new_memory: &Memory, existing: &[Memory]) -> Option<Uuid> {
- // Step 1: Fast rule-based filter (same source + similar timestamp)
- let candidates = rule_filter(new_memory, existing);
-
- // Step 2: Semantic similarity (vector distance < threshold)
- let similar = semantic_filter(new_memory.embedding, &candidates);
-
- // Step 3: LLM validation (are they truly duplicates?)
- if let Some(primary) = similar.first() {
- if self.llm_client.confirm_duplicate(new_memory, primary).await? {
- return Some(primary.id);
- }
- }
- None
-}
+#### Semantic Search Algorithm
+1. Normalize query text
+2. Generate embedding via LLM
+3. Query Qdrant with `with_payload=true`, `with_vectors=false`
+4. Apply metadata filters (date, type, source)
+5. Re-rank using hybrid scoring (similarity + quality score)
+
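+A hedged sketch of step 5's hybrid re-ranking (the `Hit` shape and weighting are assumptions):
+
+```rust
+struct Hit {
+    similarity: f32,    // cosine similarity from the vector search
+    quality_score: f32, // quality metadata stored with the memory
+}
+
+/// Sort hits by a weighted blend of similarity and stored quality.
+fn hybrid_rank(mut hits: Vec<Hit>, sim_weight: f32) -> Vec<Hit> {
+    let score = |h: &Hit| sim_weight * h.similarity + (1.0 - sim_weight) * h.quality_score;
+    hits.sort_by(|a, b| score(b).partial_cmp(&score(a)).unwrap_or(std::cmp::Ordering::Equal));
+    hits
+}
+```
+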
+#### Optimization Plan Generation
+```python
+def generate_optimization_plan(memories):
+    # cluster_by_similarity, select_highest_quality, and MergeAction are
+    # assumed helpers; 0.92 matches the deduplication threshold used elsewhere.
+    clusters = cluster_by_similarity(memories, threshold=0.92)
+    plan = []
+    for cluster in clusters:
+        if len(cluster) > 1:
+            # keep the highest-quality memory; merge the rest into it
+            primary = select_highest_quality(cluster)
+            for mem in cluster:
+                if mem != primary:
+                    plan.append(MergeAction(primary.id, mem.id))
+    return plan
```
-> ✅ **Accuracy**: Combines speed (rules) with precision (LLM)
-
----
-
### Data Structure Design
-#### Memory Object (Rust)
-
```rust
#[derive(Serialize, Deserialize)]
pub struct Memory {
pub id: Uuid,
pub content: String,
pub embedding: Vec<f32>,
- pub metadata: HashMap<String, String>,
+ pub memory_type: MemoryType, // Conversational, Factual, Procedural
+ pub timestamp: DateTime<Utc>,
+ pub source: String, // Agent ID or User ID
+ pub quality_score: f32,
+ pub facts: Vec<String>,
+ pub entities: Vec<String>,
pub keywords: Vec<String>,
- pub memory_type: MemoryType,
- pub importance: f32,
- pub created_at: DateTime<Utc>,
- pub updated_at: DateTime<Utc>,
- pub version: u32,
}
```
-#### Configuration Model (`config.toml`)
-
-```toml
-[server]
-host = "127.0.0.1"
-port = 8080
-
-[qdrant]
-url = "http://localhost:6334"
-collection = "memories"
-
-[llm]
-provider = "openai"
-model = "text-embedding-3-small"
-api_key = "sk-..."
-
-[optimization]
-deduplication_threshold = 0.92
-importance_decay_rate = 0.01
-```
-
----
-
### Performance Optimization Strategies
-| Strategy | Implementation |
-|--------|----------------|
-| **Embedding Caching** | Cache recent embeddings by content hash (planned) |
-| **Batch Processing** | Support bulk insert/search operations |
-| **Async I/O** | All LLM and DB calls are non-blocking |
-| **Indexing** | Qdrant uses HNSW index for fast ANN search |
-| **Connection Pooling** | Reuse HTTP connections to LLM and Qdrant |
-
-> 🚀 **Future Enhancement**: Add Redis cache layer for frequent queries.
+- **Embedding Caching**: Cache embeddings by content hash to avoid redundant LLM calls
+- **Batch Processing**: Support bulk insert/search operations
+- **Index Partitioning**: Split collections by agent or time window
+- **Asynchronous Logging**: Non-blocking trace and metric emission
+- **Connection Pooling**: Reuse HTTP and gRPC connections to Qdrant/OpenAI
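+
+The first strategy, caching by content hash, might be sketched like this (types and policy are assumptions; std-only, no eviction):
+
+```rust
+use std::collections::hash_map::DefaultHasher;
+use std::collections::HashMap;
+use std::hash::{Hash, Hasher};
+
+struct EmbeddingCache {
+    entries: HashMap<u64, Vec<f32>>, // content hash -> embedding
+}
+
+impl EmbeddingCache {
+    fn key(content: &str) -> u64 {
+        let mut h = DefaultHasher::new();
+        content.hash(&mut h);
+        h.finish()
+    }
+
+    /// Return a cached embedding, or compute one via `embed` and cache it.
+    async fn get_or_embed<F, Fut>(&mut self, content: &str, embed: F) -> Vec<f32>
+    where
+        F: FnOnce(String) -> Fut,
+        Fut: std::future::Future<Output = Vec<f32>>,
+    {
+        let k = Self::key(content);
+        if let Some(v) = self.entries.get(&k) {
+            return v.clone(); // cache hit: skip the LLM call
+        }
+        let v = embed(content.to_string()).await;
+        self.entries.insert(k, v.clone());
+        v
+    }
+}
+```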
---
@@ -557,101 +444,145 @@ importance_decay_rate = 0.01
### Runtime Environment Requirements
-| Component | CPU | Memory | Disk | Network |
-|--------|------|--------|------|---------|
-| `cortex-mem-core` | 2 vCPU | 4 GB | 1 GB (temp) | Outbound to LLM/Qdrant |
-| `cortex-mem-service` | 1 vCPU | 1 GB | - | Inbound HTTP |
-| `cortex-mem-insights` | 1 vCPU | 2 GB | - | Inbound HTTP |
-| Qdrant | 4 vCPU | 8 GB | SSD | Internal gRPC |
-| LLM (external) | N/A | N/A | N/A | Public API |
-
----
+| Component | Language | Runtime | Memory | CPU |
+|--------|---------|--------|-------|-----|
+| Core Engine | Rust | Native binary | 512MB+ | 1 vCPU |
+| HTTP API | Rust | Native binary | 256MB+ | 0.5 vCPU |
+| Web Dashboard | Node.js + Svelte | Node 18+ | 256MB | 0.5 vCPU |
+| Qdrant | Rust | Docker | 2GB+ | 2 vCPU |
+| OpenAI | Cloud | N/A | N/A | N/A |
### Deployment Topology Structure
```mermaid
-graph LR
- subgraph "Cloud / On-Prem"
- LB[Load Balancer]
- subgraph "Application Layer"
- API[cortex-mem-service\nReplica 1-3]
- CLI[CLI Tools]
- end
- subgraph "Core Layer"
- CORE[cortex-mem-core\nShared Engine]
- end
- subgraph "Data Layer"
- QDRANT[Qdrant Cluster]
- CACHE[(Redis Cache)\nOptional]
- end
- subgraph "External"
- LLM[LLM API\n(OpenAI, etc.)]
- BROWSER[Web Browser]
- AGENT[AI Agent]
+graph TD
+ subgraph Cloud [Cloud Environment]
+ subgraph VPC [VPC: cortex-mem-vpc]
+ subgraph Compute [Compute Layer]
+ CLI[CLI - Local or CI]
+ API[cortex-mem-service]
+ MCP[cortex-mem-mcp]
+ WD[cortex-mem-insights]
+ end
+
+ subgraph Data [Data Layer]
+ Qdrant[Qdrant Cluster]
+ PG[PostgreSQL - Optional Metadata]
+ end
+
+    end
+
+    subgraph External [External Services]
+      OpenAI[OpenAI API]
+      LangMem[LangMem Benchmark]
end
end
- BROWSER --> LB --> API --> CORE
- CLI --> CORE
- AGENT --> API
- CORE --> QDRANT
- CORE --> CACHE
- CORE --> LLM
-
- style CORE fill:#4CAF50,stroke:#388E3C,color:white
- style API fill:#2196F3,stroke:#1976D2,color:white
- style QDRANT fill:#4CAF50,stroke:#388E3C,color:white
- style LLM fill:#FFC107,stroke:#FFA000,color:black
- style CACHE fill:#9C27B0,stroke:#7B1FA2,color:white
+ CLI --> API
+ API --> Qdrant
+ API --> OpenAI
+ MCP --> API
+ WD --> API
+ WD --> Qdrant
+
+ style VPC fill:#E3F2FD,stroke:#2196F3
+ style Compute fill:#BBDEFB,stroke:#1976D2
+ style Data fill:#E1BEE7,stroke:#9C27B0
```
+### Scalability Design
+
+- **Horizontal Scaling**: HTTP API and MCP services can be deployed in multiple instances behind a load balancer
+- **Vertical Scaling**: Qdrant can scale memory and CPU for larger embedding dimensions or collections
+- **Caching Layer**: Optional Redis cache for frequent queries and embeddings
+- **Queue-Based Processing**: Long-running optimization jobs can be offloaded to a message queue (e.g., RabbitMQ)
+
+### Monitoring and Operations
+
+- **Metrics**: Prometheus for latency, request rate, error rate
+- **Tracing**: OpenTelemetry + Jaeger for request flow visibility
+- **Logging**: Structured JSON logs with severity levels
+- **Alerting**: Grafana alerts on high error rates or slow LLM responses
+- **Health Checks**: `/health` endpoint for liveness and readiness
+- **Backup**: Regular snapshots of Qdrant collections
+
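+A minimal Axum sketch of such a `/health` route (the response shape is an assumption, not the service's actual payload):
+
+```rust
+use axum::{routing::get, Json, Router};
+use serde_json::json;
+
+/// Router exposing a liveness/readiness probe.
+fn health_router() -> Router {
+    Router::new().route("/health", get(|| async { Json(json!({ "status": "ok" })) }))
+}
+```
+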
---
+## Architecture Insights
+
### Extensibility Design
-| Aspect | Strategy |
-|------|----------|
-| **Horizontal Scaling** | Multiple `cortex-mem-service` instances behind load balancer |
-| **Vertical Scaling** | Scale Qdrant and core engine with memory/CPU |
-| **Data Partitioning** | Future support for sharded collections by agent ID |
-| **Streaming Support** | Planned for large search results and long-running optimizations |
+- **Extension Points**:
+ - New interfaces (e.g., WebSocket, gRPC) can be added without modifying core
+ - Additional vector databases (Pinecone, Weaviate) can be integrated via adapter pattern
+ - Custom optimization strategies can be plugged in via trait implementation
----
+- **Future-Proofing**:
+ - Support for multiple LLM providers (Anthropic, Cohere) via abstraction layer
+ - Federated memory across agents using sharding
-### Monitoring and Operations
+### Performance Considerations
+
+- **Bottlenecks**:
+ - LLM API latency (mitigated by caching and batching)
+ - Vector search performance at scale (mitigated by HNSW indexing)
+ - Memory classification overhead (mitigated by lightweight models)
-#### Current Capabilities
-- Health checks (`GET /status`)
-- Logging via `tracing` crate
-- Configuration reload (TOML)
-
-#### Recommended Enhancements
-| Tool | Purpose |
-|------|--------|
-| **Prometheus + Grafana** | Metrics: request rate, latency, memory count |
-| **OpenTelemetry** | Distributed tracing across components |
-| **Sentry** | Error tracking and alerting |
-| **Log Aggregation** | ELK or Loki for centralized logs |
-
-> 🔐 **Security Recommendations**:
-> - Add JWT authentication for API
-> - Rate limiting on public endpoints
-> - Encrypt sensitive config (e.g., API keys)
-> - Role-based access control (RBAC) for dashboard
+- **Optimization Roadmap**:
+ - On-premise embedding models (e.g., BERT, Sentence Transformers)
+ - Incremental optimization (vs. full scans)
+ - In-memory vector cache for hot queries
+
+### Security Design
+
+- **Authentication**: API keys for HTTP/MCP interfaces
+- **Authorization**: Role-based access (read/write/optimization)
+- **Data Encryption**: TLS for all external communications
+- **Secrets Management**: API keys stored in environment variables or secret manager
+- **Input Sanitization**: All user inputs validated and sanitized
---
-## Conclusion
+## Development & Operations Guidance
+
+### Development Guidance
+
+- **Contribution Workflow**:
+ 1. Modify `cortex-mem-core` for business logic
+ 2. Update interfaces as needed
+ 3. Add tests in `tests/` directory
+ 4. Run `cargo test --all-features`
-The **cortex-mem** architecture presents a **robust, modular, and intelligent** solution for AI agent memory management. Its **Rust-based core** ensures performance and safety, while **multiple access interfaces** (REST, CLI, MCP, Dashboard) provide flexibility for diverse users.
+- **Best Practices**:
+ - Keep core logic in `cortex-mem-core`
+ - Use configuration for environment-specific settings
+ - Log all LLM and DB interactions for debugging
-The system effectively combines **vector storage (Qdrant)** with **LLM-powered intelligence** to enable semantic search, automatic metadata enrichment, and proactive optimization. The clear separation of domains and components makes it **easy to extend, maintain, and scale**.
+### Operations Guidance
-### Final Recommendations
-1. **Add Caching Layer** (Redis) to reduce LLM/vector DB load
-2. **Implement Authentication** for production use
-3. **Enhance Observability** with metrics and tracing
-4. **Expand Testing** with unit/integration tests
-5. **Support More Vector Stores** (Chroma, Weaviate) via trait implementation
+- **Deployment**:
+ - Use Docker Compose for local development
+ - Use Kubernetes for production (with HPA and monitoring)
+
+- **Maintenance**:
+ - Schedule nightly optimization jobs
+ - Monitor embedding API usage and costs
+ - Backup Qdrant collections weekly
+
+### Decision Support
+
+- **Why Rust?** Performance, safety, and concurrency for core logic
+- **Why Qdrant?** Open-source, high-performance vector DB with rich filtering
+- **Why Multiple Interfaces?** Supports diverse integration scenarios
+- **Why Centralized Core?** Ensures consistency and reduces duplication
+
+### Knowledge Transfer
+
+New team members should:
+1. Read `config.toml` to understand system settings
+2. Study `cortex-mem-core/src/memory/manager.rs` for core logic
+3. Run CLI commands to see end-to-end flow
+4. Explore web dashboard for monitoring capabilities
+
+---
-This architecture is well-positioned to serve as a **foundational memory layer** in next-generation AI agent systems.
\ No newline at end of file
+**End of Document**
\ No newline at end of file
diff --git a/litho.docs/3.Workflow.md b/litho.docs/3.Workflow.md
index c0337c1..5ddaa69 100644
--- a/litho.docs/3.Workflow.md
+++ b/litho.docs/3.Workflow.md
@@ -1,226 +1,189 @@
# Core Workflows
-**Document Generated On:** 2025-12-18 11:29:14 UTC
-**System Name:** `cortex-mem` – A Comprehensive Memory Management System for AI Agents
-
----
-
## 1. Workflow Overview
-The **cortex-mem** system is a full-stack memory management platform designed to enable AI agents with persistent, searchable, and optimizable long-term memory. It supports multiple interaction modes—via HTTP API, CLI, MCP protocol, and interactive TUI applications—and provides advanced capabilities such as semantic search, deduplication, quality optimization, and analytics.
+The **Cortex-Mem** system is a comprehensive memory management platform designed to enable AI agents and human operators to store, retrieve, analyze, and optimize persistent memories across multiple interaction modalities. The system supports diverse access patterns through CLI, HTTP API, MCP (Memory Control Protocol), and a web-based dashboard, all unified under a shared core logic layer.
-### Main Workflows
-The system revolves around three core workflows:
-1. **Memory Management Process (CRUD)** – Creation, retrieval, update, and deletion of memories.
-2. **Memory Search Process** – Semantic and metadata-based querying of stored memories.
-3. **Memory Optimization Process** – Quality improvement through deduplication, relevance tuning, and structural refinement.
-
-These workflows are orchestrated across modular domains including **Memory Management**, **Memory Storage**, **Optimization**, and **Insights & Analytics**, all coordinated via shared state, configuration, and asynchronous execution.
+### System Main Workflows
+- **Memory Management Workflow**: End-to-end lifecycle of creating, storing, retrieving, and optimizing memories.
+- **System Initialization Workflow**: Bootstrapping process that loads configuration, initializes services, and exposes interfaces.
+- **Optimization Execution Workflow**: Intelligent analysis and improvement of memory collections for quality, relevance, and efficiency.
### Core Execution Paths
-| Workflow | Entry Point(s) | Key Output |
-|--------|----------------|-----------|
-| Memory Management | `/memories` (API), `add`, `delete` (CLI) | Memory ID or operation status |
-| Memory Search | `/search`, `list`, `search` commands | Ranked list of matching memories |
-| Memory Optimization | `/optimization/start`, `optimize` command | Optimization report with impact metrics |
+1. **User-initiated Memory Operations** (via CLI or Web UI)
+2. **Agent-initiated Memory Access** (via MCP or API)
+3. **Automated Optimization & Maintenance**
+4. **Monitoring & Diagnostic Queries**
### Key Process Nodes
-- **Request Initiation**: User or agent triggers an action via CLI, API, or UI.
-- **Embedding Generation**: LLM client generates vector embeddings from text content.
-- **Vector Storage/Retrieval**: Qdrant database stores and retrieves embeddings using approximate nearest neighbor (ANN) search.
-- **Filtering & Scoring**: Metadata filters applied post-semantic search; results ranked by relevance score.
-- **Optimization Orchestration**: Detection of duplicates, low-quality entries, and relevance issues followed by LLM-driven cleanup.
-- **Response Delivery**: Structured response returned to client with success/failure status and data.
+| Node | Description |
+|------|-------------|
+| `Configuration Load` | Parses TOML config with fallback paths; sets up logging, LLM, vector DB, and service parameters |
+| `MemoryManager Init` | Central orchestrator that binds storage, LLM, and optimization logic |
+| `Content Analysis` | Uses LLM to extract facts, entities, keywords, and classify memory type |
+| `Embedding Generation` | Converts text into vector representations using OpenAI or compatible models |
+| `Vector Storage` | Persists embeddings in Qdrant with metadata indexing |
+| `Semantic Search` | Matches queries via cosine similarity on vectors |
+| `Optimization Engine` | Detects duplicates, low-quality entries, and suggests merging strategies |
+| `State Reporting` | Aggregates health metrics from backend services for monitoring |
### Process Coordination Mechanisms
-- **Asynchronous Communication**: All operations use async/await patterns via Tokio runtime for non-blocking I/O.
-- **Shared State via Arc/Mutex**: In CLI and TUI apps, `Arc<Mutex<MemoryManager>>` enables thread-safe access.
-- **Configuration-Driven Initialization**: TOML-based config (`config.toml`) controls subsystem behavior (LLM, Qdrant, logging).
-- **Event Loop Architecture**: Interactive tools like `cortex-mem-tars` use message-passing loops for input handling and rendering.
-
-```mermaid
-graph TD
- subgraph "User Interfaces"
- A[HTTP Client] --> C[cortex-mem-service]
- B[Terminal User] --> D[cortex-mem-cli]
- E[AI Agent] --> F[cortex-mem-mcp]
- G[TUI Application] --> H[cortex-mem-tars]
- end
-
- C --> I[cortex-mem-core]
- D --> I
- F --> I
- H --> I
-
- subgraph "Core Engine"
- I --> J[LLM Client]
- I --> K[Qdrant Vector Store]
- end
-
- I --> L[cortex-mem-insights]
- L --> M[Dashboard UI]
-
- style I fill:#4a90e2,color:white
- style J fill:#50c878,color:white
- style K fill:#d64161,color:white
-```
-
-> **Figure 1: High-Level System Architecture and Workflow Coordination**
+- **Shared State via `Arc<MemoryManager>`**: Ensures thread-safe access across async handlers in Rust components.
+- **Event-Driven Communication**: Message passing between UI and background tasks using channels (`mpsc::UnboundedSender`) in TUI applications.
+- **API Abstraction Layer**: TypeScript client encapsulates REST interactions, enabling consistent frontend-backend communication.
+- **Reactive Stores (Svelte)**: Frontend state managed via writable/derived stores for real-time updates.
+- **Tracing Integration**: Unified logging across modules using `tracing` crate for observability.
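+
+A self-contained sketch of that channel pattern, decoupling the UI from background persistence (the worker body is a stand-in for the real memory-save logic):
+
+```rust
+use tokio::sync::mpsc;
+
+#[tokio::main]
+async fn main() {
+    let (tx, mut rx) = mpsc::unbounded_channel::<String>();
+    // Background task drains messages and persists them.
+    let worker = tokio::spawn(async move {
+        while let Some(content) = rx.recv().await {
+            println!("persisting memory: {content}"); // stand-in for MemoryManager
+        }
+    });
+    tx.send("User: hello".to_string()).unwrap();
+    drop(tx); // closing the sender lets the worker drain and exit cleanly
+    worker.await.unwrap();
+}
+```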
---
## 2. Main Workflows
-### 2.1 Memory Management Process
+### 2.1 Memory Management Workflow
-This is the foundational CRUD workflow enabling creation, reading, updating, and deletion of memory records.
+This is the primary business workflow, governing how memories are created, stored, retrieved, and optimized.
-#### Execution Path
```mermaid
-sequenceDiagram
- participant U as User/Application
- participant CLI as cortex-mem-cli
- participant API as cortex-mem-service
- participant MM as MemoryManager
- participant LLM as LLM Client
- participant VS as Qdrant Vector Store
-
- U->>CLI: add --content "Hello world"
- CLI->>MM: create_memory(content)
- MM->>LLM: generate_embedding(content)
- LLM-->>MM: embedding vector
- MM->>VS: upsert(memory + embedding)
- VS-->>MM: success
- MM-->>CLI: memory_id
- CLI-->>U: Created memory ID: abc123
+graph TD
+ A[User/Agent Request] --> B{Operation Type}
+ B -->|Create| C[Content Analysis & Classification]
+ B -->|Search| D[Generate Query Embedding]
+ B -->|Optimize| E[Analyze Memory Collection]
+
+ C --> F[Extract Facts, Entities, Keywords]
+ F --> G[Generate Memory Embedding]
+ G --> H[Store in Qdrant DB]
+
+ D --> I[Vector Similarity Search]
+ I --> J[Apply Metadata Filters]
+ J --> K[Return Ranked Results]
+
+ E --> L[Detect Duplicates & Issues]
+ L --> M[Generate Optimization Plan]
+ M --> N[Execute Merge/Delete Operations]
+
+ H --> P((Persistent Storage))
+ K --> Q[Display Results]
+ N --> R[Update Memory State]
+
+ style A fill:#4CAF50,stroke:#388E3C
+ style Q fill:#2196F3,stroke:#1976D2
+ style P fill:#FF9800,stroke:#F57C00
```
-#### Detailed Steps
-| Step | Component | Operation | Input | Output |
-|------|---------|----------|-------|--------|
-| 1 | `cortex-mem-cli/src/main.rs` | Parse CLI args, dispatch to `add` command | Command-line arguments | `AddCommand` struct |
-| 2 | `cortex-mem-cli/src/commands/add.rs` | Extract content, detect conversation format | Raw string input | Parsed `MemoryContent` |
-| 3 | `cortex-mem-core/src/memory/manager.rs` | Call `create_memory()` | Content, user/agent IDs | Memory object |
-| 4 | `cortex-mem-core/src/llm/client.rs` | Generate embedding via LLM API | Text content | `[f32; 384]` vector |
-| 5 | `cortex-mem-core/src/vector_store/qdrant.rs` | Upsert into Qdrant collection | Memory + embedding | Success/Failure |
-| 6 | `cortex-mem-cli/src/commands/add.rs` | Print result to console | Memory ID | Human-readable output |
-
-#### Business Value
-- Enables AI agents to persist context across sessions.
-- Supports structured metadata tagging (topics, keywords, type).
-- Provides reliable storage with unique identifiers for future reference.
-
-#### Technical Characteristics
-- **Idempotent Operations**: Duplicate inserts return existing ID if detected.
-- **Batch Support**: Bulk operations supported via `/batch` endpoints.
-- **Async I/O**: All steps executed asynchronously using Tokio.
+#### Execution Order and Dependencies
+1. **Input Reception** → Command parsing (CLI) or route dispatch (HTTP/MCP)
+2. **Preprocessing** → Content classification, role detection (e.g., "User:", "Assistant:")
+3. **LLM Interaction** → Fact extraction, embedding generation
+4. **Storage Operation** → Vector write to Qdrant + metadata persistence
+5. **Indexing & Retrieval** → Semantic search using vector similarity
+6. **Post-processing** → Filtering by user_id, agent_id, topics, etc.
+7. **Output Formatting** → Structured console output or JSON response
+
+#### Input/Output Data Flows
+| Step | Input | Output | Source Module | Target Module |
+|------|-------|--------|---------------|----------------|
+| Create | Raw text/conversation | Classified content | CLI/API | Memory Types System |
+| Extract | Text content | Facts, entities, keywords | Information Extraction | Memory CRUD |
+| Embed | Text | Vector embedding | LLM Client | Vector Store |
+| Store | Memory object | Stored ID | Memory Manager | Qdrant |
+| Search | Query string/filters | List of scored memories | Semantic Search | Access Interface |
+| Optimize | Strategy + filters | Optimization plan | Optimization Engine | Memory Manager |
+
+> ✅ **Business Value**: Enables contextual continuity in AI agents by preserving conversational history and derived insights.
---
-### 2.2 Memory Search Process
+### 2.2 System Initialization Workflow
-Enables users and agents to retrieve relevant memories based on semantic similarity or metadata filtering.
+Handles bootstrapping of the entire system across all entry points.
-#### Execution Path
```mermaid
graph TD
- A[User/App] --> B{Initiate Search}
- B --> C[Send Query + Filters]
- C --> D[Generate Embedding via LLM]
- D --> E[Semantic Search in Qdrant]
- E --> F[Apply Metadata Filters]
- F --> G[Rank & Return Results]
- G --> A
+ A[Load Configuration] --> B[Initialize Tracing & Logging]
+ B --> C{Auto-detect Components}
+ C --> D[Detect Vector Store]
+ C --> E[Detect LLM Client]
+ C --> F[Determine Embedding Dimension]
+
+ D --> G[Create MemoryManager]
+ E --> G
+ F --> G
+
+ G --> H[Expose Interfaces]
+ H --> I[CLI Interface]
+ H --> J[HTTP API Service]
+ H --> K[MCP Server]
+ H --> L[Web Dashboard]
+
+ style A fill:#4CAF50,stroke:#388E3C
+ style I fill:#2196F3,stroke:#1976D2
+ style J fill:#2196F3,stroke:#1976D2
+ style K fill:#2196F3,stroke:#1976D2
+ style L fill:#2196F3,stroke:#1976D2
```
-#### Detailed Steps
-| Step | Module | Function | Description |
-|------|--------|--------|-------------|
-| 1 | `handlers.rs::search_memories` | Receive `/search` request | Accepts query string and optional filters (user_id, agent_id, topics) |
-| 2 | `llm/client.rs::generate_embedding` | Encode query into vector | Uses configured LLM (e.g., OpenAI, Ollama) to embed query |
-| 3 | `qdrant.rs::semantic_search` | Perform ANN search | Retrieve top-K nearest neighbors above threshold |
-| 4 | `memory/manager.rs::apply_filters` | Filter by metadata | Apply user, agent, topic, keyword constraints |
-| 5 | `memory/manager.rs::rank_results` | Re-rank by composite score | Combine semantic score with importance, recency, relevance |
-| 6 | `handlers.rs` | Return JSON response | Paginated list with scores, metadata, and snippets |
-
-#### Supported Query Modes
-- **Semantic Only**: Free-text search ("What did I say about planning?")
-- **Metadata Filtered**: List all memories for a given agent/topic
-- **Hybrid**: "Find conversations about goals" → semantic + filter on `type=conversation`
-
-#### Performance Notes
-- Default limit: 20 results
-- Configurable scoring weights: `relevance_weight`, `importance_weight`, `recency_decay`
-- Caching layer planned but not yet implemented
+#### Technical Details
+- **Config Loading Path Resolution**:
+ - Current directory → Home directory → System-wide locations
+ - Supports default values via `Default` trait (e.g., log level = INFO)
+- **Auto-Detection Logic**:
+ - If embedding dimension not specified, infers from LLM model or tests dynamically
+ - Falls back to common dimensions (e.g., 1536 for OpenAI)
+- **Service Binding**:
+  - All interfaces share the same `MemoryManager` instance wrapped in `Arc<RwLock<MemoryManager>>`
+ - Tokio runtime manages concurrent access
+
+> ⚙️ **Operational Insight**: This modular initialization enables deployment flexibility—developers can run only needed interfaces (e.g., just MCP server).
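+
+A sketch of that path-resolution order (the candidate locations are illustrative assumptions):
+
+```rust
+use std::path::PathBuf;
+
+/// Return the first config file found: cwd, then home, then system-wide.
+fn resolve_config_path() -> Option<PathBuf> {
+    let mut candidates = vec![PathBuf::from("config.toml")];
+    if let Some(home) = std::env::var_os("HOME") {
+        candidates.push(PathBuf::from(home).join(".cortex-mem/config.toml"));
+    }
+    candidates.push(PathBuf::from("/etc/cortex-mem/config.toml"));
+    candidates.into_iter().find(|p| p.exists())
+}
+```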
---
-### 2.3 Memory Optimization Process
+### 2.3 Optimization Execution Workflow
-A high-value workflow that improves memory quality by detecting and resolving issues such as duplication, redundancy, and poor structure.
+Enables intelligent cleanup and enhancement of memory collections.
-#### Execution Path
```mermaid
-sequenceDiagram
- participant U as User
- participant O as Optimizer CLI
- participant OD as OptimizationDetector
- participant OP as OptimizerEngine
- participant MM as MemoryManager
- participant LLM as LLM Service
-
- U->>O: optimize --strategy deduplicate --preview
- O->>OD: detect_issues(filters)
- OD->>MM: list_all_memories()
- MM-->>OD: memory batch
- OD->>LLM: analyze_for_duplicates(batch)
- LLM-->>OD: issue report
- OD-->>O: preview plan
- U->>O: confirm execution
- O->>OP: execute_plan(plan)
- OP->>LLM: rewrite/refactor memories
- OP->>MM: update/delete affected memories
- OP-->>O: optimization_report
- O-->>U: Display summary and impact
+graph TD
+ A[User Initiates Optimization] --> B[Select Strategy & Filters]
+ B --> C[Analyze Memory Collection]
+ C --> D[Identify Duplicates & Issues]
+ D --> E[Use LLM for Similarity Assessment]
+ E --> F[Generate Optimization Plan]
+ F --> G[Preview Changes & Confirm]
+ G --> H{Execute?}
+ H -->|Yes| I[Apply Changes via MemoryManager]
+ H -->|No| J[Cancel]
+
+ I --> K[Merge Related Memories]
+ I --> L[Delete Redundant Entries]
+ I --> M[Update Memory Metadata]
+
+ K --> N[Log Actions & Metrics]
+ L --> N
+ M --> N
+
+ N --> O[Report Results & Statistics]
+ O --> P[Display Optimization Summary]
+
+ style A fill:#4CAF50,stroke:#388E3C
+ style P fill:#2196F3,stroke:#1976D2
```
-#### Key Subprocesses
-1. **Issue Detection**
- - Scan memories for:
- - Exact or near-duplicate content
- - Low-importance scores (< 0.3)
- - Poorly structured entries
- - Redundant information across related memories
- - Implemented in `optimization_detector.rs`
-
-2. **Plan Generation**
- - Build safe transformation plan:
- - Merge duplicates
- - Rewrite unclear entries
- - Archive obsolete memories
- - Preview mode shows proposed changes before execution
-
-3. **Execution Engine**
- - Apply transformations using LLM rewriting prompts
- - Batch updates to minimize API calls
- - Track version history for rollback capability
-
-4. **Reporting**
- - Generate JSON and Markdown reports
- - Include statistics: # resolved, # deleted, # updated, estimated quality gain
-
-#### Configuration Options
-| Parameter | Default | Purpose |
-|---------|--------|--------|
-| `--strategy` | `full` | One of: `deduplicate`, `quality`, `relevance`, `full` |
-| `--filter-user` | all | Restrict scope to specific user |
-| `--dry-run` | false | Show what would be done without applying changes |
-| `--aggressive` | false | Enable deeper restructuring (riskier) |
-
-#### Business Value
-- Prevents memory bloat and degradation over time.
-- Maintains high signal-to-noise ratio in agent knowledge base.
-- Enables trustworthiness of retrieved context.
+#### Supported Strategies
+| Strategy | Purpose |
+|--------|---------|
+| Full | Comprehensive scan and optimization |
+| Incremental | Only recent memories |
+| Deduplication | Find and merge near-duplicate entries |
+| Quality | Remove low-importance or irrelevant memories |
+| Relevance | Filter based on topic alignment |
+| Space | Maximize storage efficiency |
+
+#### Preview Mode Safety
+- Shows estimated impact before execution
+- Requires explicit confirmation
+- Supports dry-run analysis without modification
+
+> 🔐 **Security Note**: Interactive confirmation prevents accidental data loss during bulk operations.
---
@@ -228,107 +191,82 @@ sequenceDiagram
### 3.1 Multi-Module Coordination Mechanisms
-The system uses a layered architecture where higher-level interfaces delegate to a unified core engine.
+| Relationship | Mechanism | Example |
+|------------|-----------|--------|
+| **Access Interface → Memory Management** | Direct function call via `MemoryManager` | CLI calls `.add_memory()` |
+| **Memory Management → LLM Integration** | Async method invocation | `llm_client.generate_embedding()` |
+| **Memory Management → Storage Integration** | Vector store abstraction | `QdrantStore.store_embedding()` |
+| **Frontend → Backend** | RESTful API over HTTP | `/api/memory/search` |
+| **Agent → System** | MCP protocol over stdio | `store_memory`, `query_memory` tools |
-#### Inter-Module Dependencies
+#### Inter-Process Communication Diagram
```mermaid
-graph LR
- CLI[cortex-mem-cli] --> CORE[cortex-mem-core]
- API[cortex-mem-service] --> CORE
- MCP[cortex-mem-mcp] --> CORE
- EVAL[cortex-mem-evaluation] --> CORE
- INSIGHTS[cortex-mem-insights] --> API
- CORE --> LLM[LLM Client]
- CORE --> VEC[Qdrant]
-
- classDef domain fill:#e0f7fa,stroke:#0097a7;
- class CLI,API,MCP,EVAL,INSIGHTS domain;
-```
-
-> **Figure 2: Module Interaction Diagram**
-
-Each interface module acts as a facade:
-- **CLI**: Terminal-first UX with rich formatting and interactivity
-- **API**: RESTful interface for integration with external systems
-- **MCP**: Protocol adapter for AI tool calling (e.g., LangChain, AutoGPT)
-- **Insights**: Visualization dashboard built on top of API
+sequenceDiagram
+ participant User
+ participant CLI as cortex-mem-cli
+ participant API as cortex-mem-service
+ participant MCP as cortex-mem-mcp
+ participant MM as MemoryManager
+ participant LLM as LLM Client
+ participant VS as Qdrant Vector Store
-All route through `MemoryManager`, ensuring consistent business logic enforcement.
+ User->>CLI: add --content "Hello"
+ CLI->>MM: add_memory(content)
+ MM->>LLM: generate_embedding(text)
+ LLM-->>MM: embedding vector
+ MM->>VS: insert(embedding, metadata)
+ VS-->>MM: success
+ MM-->>CLI: memory ID
+ CLI-->>User: "Memory added: id=abc123"
+
+ User->>API: POST /api/memory/search {query:"greeting"}
+ API->>MM: search(query)
+ MM->>LLM: generate_embedding(query)
+ MM->>VS: query_by_vector(embedding)
+ VS-->>MM: list of IDs
+ MM->>VS: get_metadata_batch(IDs)
+ MM-->>API: ranked results
+ API-->>User: JSON response
+```
### 3.2 State Management and Synchronization
-#### Shared State Patterns
-| Context | Mechanism | Example |
-|-------|----------|--------|
-| CLI App | `Arc<MemoryManager>` | Thread-safe sharing between event loop and background tasks |
-| TUI App | Message Passing (`mpsc::UnboundedSender`) | Decouple UI rendering from memory persistence |
-| Web Server | `AppState` with `Arc<RwLock<MemoryManager>>` | Share manager across Axum handlers |
-| Frontend | Svelte Stores (`writable`, `derived`) | Reactive state for dashboards |
-
-#### Lifecycle Coordination
-In `cortex-mem-tars`, graceful shutdown ensures:
-1. Exit TUI immediately on `/quit`
-2. Continue processing pending memory saves in background
-3. Log final session summary after persistence completes
-
-Implemented via Tokio task spawning and join handles.
+#### Backend (Rust)
+- **Thread Safety**: Shared state protected via `Arc<Mutex<T>>` or `Arc<RwLock<T>>`
+- **Async Runtime**: Tokio handles concurrency with non-blocking I/O
+- **Global Config**: Immutable after load, cloned where needed
+
+#### Frontend (Svelte)
+- **Reactive Stores**:
+ - `memoryStore`: Holds list of memories with filters
+ - `optimizationStore`: Tracks job status and history
+ - `systemStore`: Monitors backend health
+- **Derived Stores**:
+ ```ts
+ export const optimizationStatus = derived(
+ optimizationStore,
+ ($store) => $store.job?.status || 'idle'
+ );
+ ```
### 3.3 Data Passing and Sharing
-#### Internal Data Flow
-```mermaid
-flowchart LR
- Input --> Normalization --> Embedding --> Storage --> Indexing --> Queryable
-```
-
-Data transformations include:
-- UTF-8 normalization and truncation (max 16k tokens)
-- Automatic topic extraction from content
-- Importance scoring via LLM classification
-- Vector indexing within Qdrant
-
-#### Cross-Layer Contracts
-Defined via:
-- Rust structs (`MemoryRecord`, `Filters`, `OptimizationPlan`)
-- TypeScript interfaces (`ApiMemory`, `OptimizationJobStatus`)
-- Serde serialization for inter-process communication
-
-Ensures consistency between CLI, API, and frontend layers.
+| Layer | Mechanism | Format |
+|------|----------|--------|
+| Internal (Rust) | Function arguments, structs | Native types (`Memory`, `Filters`) |
+| External (API) | JSON over HTTP | Defined in `types.ts` |
+| CLI ↔ Core | In-process calls | Strongly-typed Rust structs |
+| Agent ↔ MCP | JSON-RPC over stdio | MCP-compliant tool schema |
### 3.4 Execution Control and Scheduling
-#### Asynchronous Processing Model
-All heavy operations run asynchronously:
-- Embedding generation
-- Vector search
-- Optimization jobs
-- Batch imports
-
-Using **Tokio** runtime with:
-- Task spawning for long-running operations
-- Timeouts enforced per operation (configurable)
-- Backpressure via bounded channels
-
-#### Job Tracking
-Optimization jobs tracked in-memory (future: persistent store):
-- Job ID generation
-- Status polling endpoint (`GET /optimization/status/:id`)
-- Cancellation support via cancellation tokens
-
-Example job lifecycle:
-```mermaid
-stateDiagram-v2
- [*] --> Idle
- Idle --> Analyzing: start_optimization()
- Analyzing --> Executing: issues_detected
- Executing --> Completed: all_applied
- Executing --> Failed: error_during_apply
- Analyzing --> Cancelled: cancel_request
- Executing --> Cancelled: cancel_request
- Failed --> [*]
- Completed --> [*]
- Cancelled --> [*]
-```
+- **Manual Triggering**: User commands via CLI or Web UI
+- **Scheduled Jobs**: Planned via external schedulers (not built-in yet)
+- **Concurrent Processing**:
+ - Multiple CLI/API requests handled concurrently via Tokio
+ - Batch operations use rate limiting (1s delay between batches)
+
+> 🔄 **Performance Tip**: Use batch operations for large-scale imports to reduce overhead.
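+
+A generic sketch of that batch pacing (the `store` callback stands in for the real insert API):
+
+```rust
+use std::time::Duration;
+
+/// Insert items in batches, pausing one second between batches.
+async fn add_in_batches<F, Fut>(items: Vec<String>, batch_size: usize, mut store: F)
+where
+    F: FnMut(String) -> Fut,
+    Fut: std::future::Future<Output = ()>,
+{
+    for batch in items.chunks(batch_size) {
+        for content in batch {
+            store(content.clone()).await;
+        }
+        tokio::time::sleep(Duration::from_secs(1)).await; // rate limit between batches
+    }
+}
+```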
---
@@ -336,66 +274,54 @@ stateDiagram-v2
### 4.1 Error Detection and Handling
-#### Common Failure Points
-| Layer | Potential Errors | Handling Strategy |
-|------|------------------|-------------------|
-| CLI Parsing | Invalid args | Clap validation + help display |
-| LLM Call | Rate limiting, timeout, auth failure | Retry with exponential backoff; fallback to cached embedding if available |
-| Qdrant DB | Connection loss, index corruption | Graceful degradation; log warning, retry on next attempt |
-| Memory Write | Duplicate key, schema mismatch | Idempotent insert; deduplicate before write |
-| Optimization | Invalid plan, partial failure | Atomic transaction simulation; roll back failed batches |
-
-#### Centralized Logging
-Uses `tracing_subscriber` with structured logs:
-```rust
-tracing::error!(error = ?e, memory_id, "Failed to delete memory");
-```
-Log levels: INFO (default), DEBUG (verbose), ERROR (critical)
+| Component | Strategy |
+|---------|----------|
+| **CLI** | Structured tracing logs + user-friendly messages |
+| **HTTP API** | Centralized error handler returns `{success: false, error: {...}}` |
+| **MCP Server** | Translates domain errors to `ErrorData` per MCP spec |
+| **Frontend** | Try-catch blocks with fallback rendering |
+
+#### Common Errors and Responses
+| Error Type | Handling |
+|----------|---------|
+| `Memory Not Found` | Return 404; suggest alternatives if similar exist |
+| `LLM Rate Limit` | Retry with exponential backoff; fall back to cached embeddings |
+| `Qdrant Unavailable` | Log error; return cached results if possible |
+| `Invalid Configuration` | Fail fast at startup with descriptive message |
+| `JSON Parse Failure` | Graceful degradation; use defaults or skip item |
### 4.2 Exception Recovery Mechanisms
-#### Resilience Features
-- **Retry Logic**: For transient failures (network, rate limits)
-- **Fallback Embeddings**: Use last-known-good embedding during outages
-- **Safe Mode**: Disable optimization when LLM unavailable
-- **Data Validation**: Validate memory integrity before and after operations
-
-#### Rollback Capabilities
-- Memory updates preserve previous versions (soft-delete model)
-- Optimization plans can be reverted using audit trail
-- Backup export via `cortex-mem-cli list --format=json > backup.json`
+- **Graceful Degradation**:
+ - When LLM fails, use basic keyword matching instead of semantic search
+ - On API failure, show last known good state
+- **Retry Logic**:
+ - Python evaluation scripts include retry loops
+ - HTTP clients implement automatic retries (configurable)
+- **Fallback Modes**:
+ - CLI optimization has “preview” mode to avoid unintended changes
+ - Web UI shows mock data when backend is unreachable
### 4.3 Fault Tolerance Strategy Design
-| Risk | Mitigation |
+| Aspect | Strategy |
|------|----------|
-| Single Point of Failure | Stateless services; Qdrant supports clustering |
-| Data Loss | Regular backups encouraged; WAL enabled in Qdrant |
-| LLM Outage | Cache recent embeddings; degrade to keyword-only search |
-| High Load | Request throttling; queue large jobs |
-| Misconfiguration | Schema validation on config load; defaults provided |
+| **Data Loss Prevention** | Final shutdown phase ensures pending writes complete |
+| **Service Isolation** | Each interface runs independently; one crash doesn’t affect others |
+| **Idempotent Operations** | Add operations check for duplicates before insertion |
+| **Backup Support** | Optimization includes optional backup creation before major changes |
### 4.4 Failure Retry and Degradation
-#### Retry Policy
-Configurable in `config.toml`:
-```toml
-[llm.retry]
-max_attempts = 3
-initial_backoff_ms = 500
-max_backoff_ms = 5000
-jitter_factor = 0.1
-```
-
-Applied automatically in `client.rs`.
-
-#### Degradation Modes
-When LLM service is unreachable:
-- **Search**: Fall back to metadata-only filtering
-- **Add**: Skip embedding, mark as "pending embedding"
-- **Optimize**: Disable LLM-dependent strategies, allow manual cleanup only
+- **Rate Limit Handling**:
+ - Detected via HTTP 429 or LLM client feedback
+ - Automatically switches to lower-frequency mode
+ - Delays next request using jittered exponential backoff
+- **Degraded Mode Features**:
+ - Disable advanced features (e.g., deduplication) when LLM unavailable
+ - Fall back to exact match search when embedding generation fails
-Indicated in system status dashboard.
+> 💡 **Best Practice**: Always validate configuration early using `validate_config()` utilities to prevent runtime failures.
---
@@ -403,133 +329,128 @@ Indicated in system status dashboard.
### 5.1 Core Algorithm Processes
-#### Embedding Pipeline
+#### Memory Creation Pipeline
```rust
-async fn create_memory(&self, content: String) -> Result<Uuid> {
- let embedding = self.llm_client.generate_embedding(&content).await?;
- let memory = MemoryRecord::new(content, embedding, self.user_id);
- self.vector_store.upsert(memory).await?;
- Ok(memory.id)
+async fn create_memory(content: &str) -> Result<MemoryId> {
+ let memory_type = classify_content(content);
+ let facts = extractor.extract_facts(content).await?;
+ let embedding = llm_client.embed(content).await?;
+ let metadata = MemoryMetadata::new(user_id, agent_id, memory_type);
+
+ let memory = Memory::builder()
+ .content(content)
+ .embedding(embedding)
+ .metadata(metadata)
+ .facts(facts)
+ .build();
+
+ memory_manager.store(memory).await
}
```
-- Uses sentence-transformers or OpenAI models
-- Dimension: typically 384–1536 floats
-- Distance metric: Cosine similarity
-
-#### Deduplication Algorithm
-1. Group memories by user + agent
-2. Compute pairwise cosine distances
-3. Cluster similar vectors (threshold: 0.92)
-4. Select representative memory (highest importance + recency)
-5. Rewrite others to link or merge
-Uses hierarchical agglomerative clustering (HAC).
-
-#### Importance Scoring
-Uses few-shot prompting:
+#### Optimization Decision Hierarchy
```text
-Rate this memory's importance from 0.0 to 1.0:
-- Is it actionable?
-- Does it contain personal facts?
-- Will it be useful later?
-
-Content: "{content}"
-→ Score:
+Preference Order:
+IGNORE > MERGE > UPDATE > CREATE
+
+Rules:
+1. If new fact matches existing closely → IGNORE
+2. If related but complementary → MERGE (summarize)
+3. If outdated/incomplete → UPDATE
+4. Otherwise → CREATE new memory
```
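+
+The preference order could be encoded as a simple decision function (illustrative types and threshold, not the project's actual schema):
+
+```rust
+enum MemoryAction { Ignore, Merge, Update, Create }
+
+/// Apply the IGNORE > MERGE > UPDATE > CREATE preference order.
+fn decide(similarity: f32, complementary: bool, outdated: bool) -> MemoryAction {
+    if similarity > 0.95 {
+        MemoryAction::Ignore // near-identical fact already stored
+    } else if complementary {
+        MemoryAction::Merge // related content worth summarizing together
+    } else if outdated {
+        MemoryAction::Update // supersedes an existing entry
+    } else {
+        MemoryAction::Create // genuinely new memory
+    }
+}
+```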
-Score normalized and combined with length, source, and frequency.
### 5.2 Data Processing Pipelines
-#### Ingestion Pipeline
+#### Ingestion Flow
```mermaid
flowchart LR
- RawInput --> Parser --> Cleaner --> Classifier --> Embedder --> Indexer --> Stored
+    RawInput --> Parser{Is it a conversation?}
+    Parser -->|"Yes"| RoleSplit[Split into user and assistant messages]
+    Parser -->|"No"| DirectStore
+ RoleSplit --> BatchProcessor
+ BatchProcessor --> Embedder
+ Embedder --> Extractor
+ Extractor --> Classifier
+ Classifier --> Storage
```
-Steps:
-1. **Parse**: Detect conversation turns (`User:` / `Assistant:`)
-2. **Clean**: Remove noise, normalize whitespace
-3. **Classify**: Assign `type` (fact, conversation, goal, etc.)
-4. **Extract Topics**: Keyword extraction via TF-IDF + LLM suggestion
-5. **Generate Embedding**
-6. **Index**: Insert into Qdrant with payload (metadata)
-
-#### Search Pipeline
+#### Search Flow
```mermaid
flowchart LR
- Query --> Embed --> ANN --> Filter --> Rerank --> Format --> Return
+ Query --> EmbedQuery
+ EmbedQuery --> VectorSearch["Find top-k similar vectors"]
+ VectorSearch --> FetchMetadata
+ FetchMetadata --> ApplyFilters["Filter by user_id, agent_id, topics"]
+ ApplyFilters --> RankResults["Score by relevance + recency"]
+ RankResults --> FormatOutput
```
-Post-processing includes:
-- Snippet generation (context window around match)
-- Highlighting query terms
-- Recency boost: `score *= exp(-λ * age_in_days)`
-
### 5.3 Business Rule Execution
-#### Memory Retention Policies
-| Type | Default TTL | Configurable |
-|------|------------|-------------|
-| Conversation | 90 days | Yes |
-| Fact | Permanent | Yes |
-| Temporary Note | 7 days | Yes |
-| Goal | Until completion | Auto-archive |
+| Rule | Enforcement Point |
+|------|-------------------|
+| No duplicate memories within 5 minutes | At ingestion time via timestamp + hash check |
+| Maximum 100 characters in preview | During formatting in frontend/backend |
+| Importance score decay over time | Applied during search ranking |
+| Only owner can delete memory | Checked in `MemoryManager.delete()` using `user_id` |
+| Confirmation required for bulk delete | Enforced in CLI command logic |
-Enforced via scheduled cleanup job.
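+
+The five-minute duplicate rule from the table above could be checked like this (the entry shape is an assumption):
+
+```rust
+use std::time::{Duration, SystemTime};
+
+struct RecentEntry {
+    content_hash: u64,
+    stored_at: SystemTime,
+}
+
+/// True if the same content hash was stored within the last five minutes.
+fn is_recent_duplicate(hash: u64, recent: &[RecentEntry]) -> bool {
+    let window = Duration::from_secs(300);
+    recent.iter().any(|r| {
+        r.content_hash == hash
+            && r.stored_at.elapsed().map(|age| age < window).unwrap_or(false)
+    })
+}
+```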
+### 5.4 Technical Implementation Details
-#### Access Control Rules
-Currently based on:
-- `user_id` (required)
-- `agent_id` (optional filter)
-- Future: RBAC roles (planned)
+#### Asynchronous Architecture
+- All critical operations are `async fn`
+- Tokio runtime powers CLI, API, and MCP servers
+- Non-blocking I/O ensures responsiveness even under load
-All queries implicitly scoped to requester’s identity.
+#### Memory Intelligence Loop
+Uses LLM-driven updater to decide actions:
+```rust
+// `MemoryDecision`: structured schema the LLM must return (name assumed)
+let decision = llm_client.prompt_with_schema::<MemoryDecision>(
+ "Given these existing memories and new input, what should we do?",
+ context
+).await?;
+```
-### 5.4 Technical Implementation Details
+#### Internationalization (i18n)
+- Built with Svelte stores and reactive `$t` function
+- Supports English, Chinese, Japanese
+- Fallback to English on missing keys
+- Language preference persisted in `localStorage`
-#### Core Components
-| File | Responsibility |
-|------|----------------|
-| `memory/manager.rs` | Primary orchestrator; exposes public API |
-| `llm/client.rs` | Handles authentication, retries, batching |
-| `vector_store/qdrant.rs` | Implements Qdrant gRPC client bindings |
-| `optimization_detector.rs` | Analyzes memory corpus for issues |
-| `optimizer.rs` | Executes refactoring plans |
-
-#### Concurrency Model
-- **Tokio Runtime**: Multi-threaded, work-stealing scheduler
-- **No Global Mutexes**: Per-operation locking
-- **Channel-Based Communication**: Between UI and worker threads
-
-#### Configuration System
-Built with:
-- `serde` for deserialization
-- `config.toml` as primary source
-- Environment variable overrides
-- Defaults via `impl Default` trait
-
-Supports hot-reload in development mode.
-
-#### Performance Optimization Strategies
-| Area | Technique |
-|------|----------|
-| Latency | Async I/O, connection pooling |
-| Throughput | Batch operations, bulk indexing |
-| Scalability | Stateless services, horizontal scaling |
-| Efficiency | Embedding caching, lazy loading |
-| Observability | Tracing spans, metrics export |
+#### Terminal UI (TARS Example)
+- Built with `ratatui` + `crossterm`
+- Three-panel layout: conversation (75%), input (25%), logs (right sidebar)
+- Real-time streaming responses with cursor animation
+- Sophisticated UTF-8 handling for cursor positioning
+
+#### Shutdown Sequence (Critical Reliability Feature)
+In `examples/cortex-mem-tars/src/main.rs`:
+```rust
+// After UI exits, continue processing memory saves
+drop(ui_tx); // Signal UI shutdown
+while let Ok(log) = log_rx.try_recv() {
+ process_log_for_memory_storage(&log);
+}
+// Ensure all pending memory operations complete
+```
-Planned enhancements:
-- Redis cache layer for frequent queries
-- Background embedding queue
-- Incremental optimization (vs full scan)
+> ✅ **Reliability Insight**: This design guarantees no memory loss due to premature termination.
---
## Conclusion
-The **cortex-mem** system delivers a robust, extensible framework for AI agent memory management. Its core workflows—memory management, search, and optimization—are deeply integrated, consistently implemented, and resilient to failure. The architecture promotes reuse through a shared core engine while supporting diverse frontends and integration points.
+The **Cortex-Mem** system delivers a robust, extensible framework for managing AI agent memories across diverse usage scenarios. Its architecture emphasizes:
+
+- **Modularity**: Clear separation between access interfaces and core logic
+- **Resilience**: Comprehensive error handling, graceful degradation, and safe shutdown
+- **Intelligence**: LLM-powered content understanding and autonomous optimization
+- **Observability**: Rich logging, monitoring endpoints, and interactive dashboards
-This document serves as a complete guide for developers, operators, and researchers working with the system, providing both high-level understanding and deep technical insight into its operation.
+This documentation provides full operational visibility into the system’s workflows, enabling developers, operators, and integrators to effectively deploy, maintain, and extend its capabilities.
-**Generated from research materials as of 2025-12-18 11:29:14 UTC**
+**Generated on**: 2025-12-30 19:19:37 UTC
+**Documentation Version**: 1.0
+**System Name**: Cortex-Mem
+**Primary Users**: AI Agents, Developers, System Administrators
\ No newline at end of file
diff --git a/litho.docs/4.Deep-Exploration/Access Interface Domain.md b/litho.docs/4.Deep-Exploration/Access Interface Domain.md
new file mode 100644
index 0000000..a3a4a20
--- /dev/null
+++ b/litho.docs/4.Deep-Exploration/Access Interface Domain.md
@@ -0,0 +1,422 @@
+# Technical Documentation: Access Interface Domain
+
+**Generation Time:** 2025-12-30 11:26:39 (UTC)
+**Document Version:** 1.0
+**System:** Cortex-Mem – AI Agent Memory Management System
+
+---
+
+## 1. Introduction
+
+The **Access Interface Domain** in the `cortex-mem` system provides multiple entry points for interacting with the persistent memory management capabilities of AI agents. It enables diverse user types—including developers, system administrators, and intelligent software agents—to store, retrieve, search, and optimize memories through tailored interfaces.
+
+This domain serves as the primary interaction layer between end users/agents and the core memory engine (`cortex-mem-core`), abstracting complex operations into accessible protocols while maintaining consistency across different access methods.
+
+### Key Objectives
+- Support heterogeneous integration scenarios (CLI, API, agent tools, UI)
+- Provide consistent behavior via centralized configuration
+- Enable both human and machine-driven interactions
+- Facilitate monitoring, debugging, and operational control
+
+---
+
+## 2. Architecture Overview
+
+The Access Interface Domain follows a **multi-interface facade pattern**, where each sub-interface acts as a specialized gateway to the shared business logic implemented in `cortex-mem-core`. All interfaces depend directly on the `MemoryManager` for executing memory operations.
+
+```mermaid
+graph TD
+ A[Access Interface Domain] --> B[CLI Interface]
+ A --> C[HTTP API Service]
+ A --> D[MCP Protocol Interface]
+ A --> E[Web Dashboard]
+
+ B -->|Uses| Core[cortex-mem-core]
+ C -->|Uses| Core
+ D -->|Uses| Core
+ E -->|API Calls| C
+ E -->|Direct Use| Core
+
+ style A fill:#2196F3,stroke:#1976D2,color:white
+ style Core fill:#4CAF50,stroke:#388E3C,color:white
+```
+
+> **Note**: The Web Dashboard primarily communicates via the HTTP API but may also use direct core access for advanced features.
+
+### Interaction Flow Pattern
+```mermaid
+sequenceDiagram
+ participant User
+ participant CLI as CLI Interface
+ participant HTTP_API as HTTP API Service
+ participant MCP as MCP Protocol Interface
+ participant Web_Dashboard as Web Dashboard
+ participant Memory_System as MemoryManager (Core)
+
+ User->>CLI: Execute command (e.g., add, search)
+ CLI->>Memory_System: Forward request
+ Memory_System-->>CLI: Return result
+ CLI-->>User: Print output
+
+ User->>HTTP_API: Send HTTP request
+ HTTP_API->>Memory_System: Handle route & forward
+ Memory_System-->>HTTP_API: Response
+ HTTP_API-->>User: JSON response
+
+ AI_Agent->>MCP: Invoke MCP tool (store/query)
+ MCP->>Memory_System: Process call
+ Memory_System-->>MCP: Result
+ MCP-->>AI_Agent: Return structured data
+
+ User->>Web_Dashboard: Browser navigation
+ Web_Dashboard->>HTTP_API: Fetch data via REST
+ HTTP_API-->>Web_Dashboard: JSON payload
+ Web_Dashboard-->>User: Rendered UI
+```
+
+---
+
+## 3. Submodules and Implementation Details
+
+The Access Interface Domain consists of four distinct submodules, each designed for specific usage patterns and target audiences.
+
+### 3.1 CLI Interface
+
+#### Purpose
+Provides a command-line tool for direct interaction by developers and operators. Ideal for scripting, batch operations, and local development.
+
+#### Entry Point
+- **File**: `cortex-mem-cli/src/main.rs`
+- **Framework**: [Clap](https://crates.io/crates/clap) for declarative argument parsing
+
+#### Supported Commands
+| Command | Functionality |
+|--------|---------------|
+| `add` | Store new memory with metadata |
+| `search` | Semantic + metadata-filtered retrieval |
+| `list` | List memories with filters |
+| `delete` | Remove memory by ID |
+| `optimize` | Trigger optimization workflows |
+| `optimize-status` | Check ongoing job status |
+
+#### Code Structure
+```rust
+// Example: AddCommand execution flow
+pub async fn execute(
+    &self,
+    content: String,
+    user_id: Option<String>,
+    agent_id: Option<String>,
+    memory_type: String,
+) -> Result<(), Box<dyn std::error::Error>> {
+    let metadata = build_metadata(user_id.clone(), agent_id.clone(), memory_type);
+
+    if is_conversation(&content) {
+        let messages = parse_conversation_content(&content, &user_id, &agent_id);
+        self.memory_manager.add_memory(&messages, metadata).await?;
+    } else {
+        self.memory_manager.store(content, metadata).await?;
+    }
+    Ok(())
+}
+```
+
+#### Conversation Parsing Logic
+Handles multi-turn dialogues from CLI input:
+- Detects lines starting with `User:` or `Assistant:`
+- Splits content into role-based messages
+- Falls back to single-user message if no roles detected
+
+> **Example Input**:
+```
+User: What's the capital of France?
+Assistant: The capital of France is Paris.
+```
+
+Parsed into two `Message` objects with appropriate roles.
+
+#### Initialization Sequence
+1. Initialize tracing (`tracing_subscriber`)
+2. Parse CLI arguments using Clap
+3. Load config from `config.toml`
+4. Auto-detect vector store and LLM client
+5. Create `MemoryManager`
+6. Route command to handler
+
+---
+
+### 3.2 HTTP API Service
+
+#### Purpose
+Exposes RESTful endpoints for programmatic access. Enables integration with external applications, microservices, and web clients.
+
+#### Entry Point
+- **File**: `cortex-mem-service/src/main.rs`
+- **Framework**: [Axum](https://crates.io/crates/axum) for routing and middleware
+
+#### Endpoint Summary
+| Method | Path | Description |
+|-------|------|-------------|
+| GET | `/health` | System health check |
+| POST | `/memories` | Create memory |
+| GET | `/memories` | List memories |
+| POST | `/memories/search` | Search memories (semantic + filters) |
+| GET | `/memories/{id}` | Retrieve specific memory |
+| PUT | `/memories/{id}` | Update memory |
+| DELETE | `/memories/{id}` | Delete memory |
+| POST | `/optimization` | Start optimization job |
+| GET | `/optimization/{job_id}` | Get optimization status |
+| GET | `/llm/status` | LLM service status |
+
+#### Request Handling Pattern
+```rust
+pub async fn create_memory(
+    State(state): State<AppState>,
+    Json(request): Json<CreateMemoryRequest>, // request/response type names reconstructed; defined in models.rs
+) -> Result<Json<MemoryResponse>, (StatusCode, Json<ErrorResponse>)> {
+ let metadata = build_metadata_from_request(&request);
+
+ if is_conversation_request(&request.content) {
+ let messages = parse_conversation_content(
+ &request.content,
+ &request.user_id,
+ &request.agent_id
+ );
+ match state.memory_manager.add_memory(&messages, metadata).await { ... }
+ } else {
+ match state.memory_manager.store(request.content, metadata).await { ... }
+ }
+}
+```
+
+#### Shared State Model
+```rust
+#[derive(Clone)]
+pub struct AppState {
+    pub memory_manager: Arc<MemoryManager>,
+    pub optimization_jobs: Arc<RwLock<HashMap<String, OptimizationJob>>>,
+}
+```
+
+All routes receive this state via Axum's `State` extractor.
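+
+A minimal sketch of the router wiring (route set abridged; handler names follow the `handlers.rs` conventions described later):
+
+```rust
+use axum::{routing::{get, post}, Router};
+
+// Abridged router construction; handlers correspond to the endpoint table above.
+fn build_router(state: AppState) -> Router {
+    Router::new()
+        .route("/health", get(health_check))
+        .route("/memories", post(create_memory).get(list_memories))
+        .route("/memories/search", post(search_memories))
+        .with_state(state) // injected into every handler via State<AppState>
+}
+```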
+
+#### Middleware
+- CORS enabled permissively during development
+- Structured error handling with standardized responses
+- Tracing integration for observability
+
+---
+
+### 3.3 MCP Protocol Interface
+
+#### Purpose
+Implements the **Model Context Protocol (MCP)** for seamless integration with AI agents. Allows agents to treat memory operations as callable tools.
+
+#### Entry Point
+- **Binary**: `cortex-mem-mcp/src/main.rs`
+- **Library**: `cortex-mem-mcp/src/lib.rs`
+- **Transport**: Standard I/O (stdio) for compatibility with agent frameworks
+
+#### Implemented Tools
+| Tool Name | Function |
+|----------|---------|
+| `store_memory` | Save a new memory |
+| `query_memory` | Search memories semantically |
+| `list_memories` | List memories with filters |
+| `get_memory` | Retrieve memory by ID |
+
+#### Server Handler Implementation
+```rust
+impl MemoryMcpService {
+ async fn store_memory(
+ &self,
+        arguments: &Map<String, Value>,
+    ) -> Result<CallToolResult, McpError> {
+ let payload = map_mcp_arguments_to_payload(arguments, &self.agent_id);
+ match self.operations.store_memory(payload).await {
+ Ok(response) => Ok(CallToolResult::success(vec![Content::text(
+ serde_json::to_string_pretty(&response).unwrap()
+ )])),
+ Err(e) => Err(self.tools_error_to_mcp_error(e))
+ }
+ }
+
+ // Similar implementations for query_memory, list_memories, get_memory
+}
+```
+
+#### Initialization
+- Loads configuration from default or specified path
+- Initializes `MemoryManager` with auto-detected components
+- Sets up `MemoryOperations` wrapper
+- Serves over stdio using `rmcp::transport::stdio`
+
+#### Integration Example (Agent Side)
+An AI agent can invoke:
+```json
+{
+ "tool": "store_memory",
+ "arguments": {
+ "content": "The user prefers vegan meals.",
+ "user_id": "usr_123",
+ "memory_type": "factual"
+ }
+}
+```
+
+And receive confirmation of successful storage.
+
+---
+
+### 3.4 Web Dashboard
+
+#### Purpose
+Provides a visual interface for monitoring, managing, and analyzing memory collections. Designed for system administrators and developers needing insight into memory quality and performance.
+
+#### Stack
+- **Frontend Framework**: [Svelte](https://svelte.dev/)
+- **Backend Server**: [Elysia](https://elysiajs.com/) (TypeScript)
+- **Routing**: File-based SvelteKit routing
+- **Styling**: Tailwind CSS
+
+#### Entry Points
+- **Server**: `cortex-mem-insights/src/server/index.ts`
+- **Layout**: `cortex-mem-insights/src/routes/+layout.svelte`
+
+#### Layout Component (`+layout.svelte`)
+A simplified sketch of the shared layout (element names and classes are illustrative):
+```svelte
+<script lang="ts">
+  // Shared chrome for every route; the i18n store is assumed from $lib/i18n.
+  import '../app.css';
+  import { t } from '$lib/i18n';
+</script>
+
+<div class="min-h-screen bg-gray-50">
+  <nav class="border-b px-4 py-2"><!-- navigation links --></nav>
+  <main class="p-4">
+    <slot /> <!-- routed page content -->
+  </main>
+</div>
+```
+
+#### Features
+- Real-time system health visualization
+- Memory search and browsing
+- Optimization job tracking
+- Statistics dashboard (duplicates, retention, usage trends)
+- Internationalization support via `$lib/i18n`
+
+#### API Integration
+Communicates with the backend via:
+- `/api/memory/*` – CRUD and search operations
+- `/api/optimization/*` – Optimization lifecycle
+- `/api/system/*` – Health and configuration
+
+Built using modular Elysia plugins:
+```ts
+.use(memoryRoutes)
+.use(optimizationRoutes)
+.use(systemRoutes)
+```
+
+---
+
+## 4. Cross-Cutting Concerns
+
+### 4.1 Configuration Management
+All interfaces share a common configuration model defined in `cortex-mem-config`, loaded from `config.toml`.
+
+Key settings include:
+- Qdrant connection details
+- LLM provider (OpenAI) credentials
+- Server host/port (for HTTP/MCP services)
+- Logging level
+- Memory retention policies
+
+Each interface reads the same config file, ensuring uniform behavior.
+
+### 4.2 Error Handling and Logging
+- Unified logging via `tracing` crate
+- Structured JSON logs for machine readability
+- Human-friendly CLI output formatting
+- HTTP error codes and standardized error payloads
+- MCP-compliant error reporting
+
+### 4.3 Security Considerations
+- No built-in authentication in current version (assumes trusted environment)
+- Sensitive data protection relies on transport-level security
+- Future roadmap likely includes API keys and RBAC
+
+---
+
+## 5. Integration with Core Domains
+
+The Access Interface Domain depends heavily on other domains:
+
+| From | To | Type | Purpose |
+|------|----|------|--------|
+| Access Interface | Memory Management | Service Call | Execute CRUD, search, optimization |
+| Access Interface | Configuration Management | Configuration Dependency | Load runtime settings |
+| Access Interface | LLM Integration | Indirect | Embedding generation, analysis |
+| Access Interface | Storage Integration | Indirect | Vector persistence |
+
+> **Dependency Strength**: 9.5 (Very Strong)
+
+All interfaces delegate actual memory processing to `MemoryManager`, adhering to the principle of separation of concerns.
+
+---
+
+## 6. Usage Scenarios
+
+| User Type | Preferred Interface | Use Case |
+|---------|---------------------|----------|
+| Developer | CLI / HTTP API | Scripting, testing, integration |
+| AI Agent | MCP Protocol | Contextual memory access during reasoning |
+| Operator | Web Dashboard | Monitoring, maintenance, optimization |
+| Application | HTTP API | Embedded agent memory layer |
+
+---
+
+## 7. Development and Extension Guide
+
+### Adding a New Command (CLI)
+1. Define command struct in `commands/mod.rs`
+2. Implement `execute()` method calling `MemoryManager`
+3. Add variant to `Commands` enum in `main.rs`
+4. Wire up in `main()` match block
+
+### Adding a New API Endpoint
+1. Define handler function in `handlers.rs`
+2. Add route in `main.rs` router
+3. Define request/response models in `models.rs`
+4. Ensure proper error mapping
+
+### Extending MCP Tools
+1. Add new method to `MemoryMcpService`
+2. Register in `ServerHandler` trait implementation
+3. Update tool definitions via `get_mcp_tool_definitions()`
+
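+For illustration, a hypothetical `delete_memory` tool added by these steps might look like the following (the `operations.delete_memory` call is assumed; it mirrors the `store_memory` pattern above):
+
+```rust
+impl MemoryMcpService {
+    async fn delete_memory(
+        &self,
+        arguments: &Map<String, Value>,
+    ) -> Result<CallToolResult, McpError> {
+        // Pull the target id out of the tool arguments (a missing id yields an
+        // empty string, which the operations layer rejects).
+        let id = arguments
+            .get("memory_id")
+            .and_then(|v| v.as_str())
+            .unwrap_or_default();
+        match self.operations.delete_memory(id).await {
+            Ok(response) => Ok(CallToolResult::success(vec![Content::text(
+                serde_json::to_string_pretty(&response).unwrap(),
+            )])),
+            Err(e) => Err(self.tools_error_to_mcp_error(e)),
+        }
+    }
+}
+```
+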
+---
+
+## 8. Conclusion
+
+The **Access Interface Domain** is a critical component of the `cortex-mem` architecture, enabling flexible and secure interaction with AI agent memory systems. By providing four complementary interfaces—CLI, HTTP API, MCP, and Web Dashboard—it supports a wide range of use cases from automated agent workflows to human-operated administration.
+
+Its design emphasizes:
+- **Consistency**: Shared core logic ensures uniform behavior
+- **Extensibility**: Modular structure allows adding new interfaces
+- **Interoperability**: Supports both human and machine consumers
+- **Observability**: Rich logging and monitoring capabilities
+
+Future enhancements could include:
+- Authentication and authorization layers
+- WebSockets for real-time updates
+- gRPC interface for high-performance integrations
+- Plugin system for custom interface extensions
+
+This domain exemplifies modern API-first design principles applied to AI infrastructure, making persistent memory accessible, reliable, and maintainable.
\ No newline at end of file
diff --git a/litho.docs/4.Deep-Exploration/Configuration Management Domain.md b/litho.docs/4.Deep-Exploration/Configuration Management Domain.md
new file mode 100644
index 0000000..6c9cac9
--- /dev/null
+++ b/litho.docs/4.Deep-Exploration/Configuration Management Domain.md
@@ -0,0 +1,444 @@
+# Configuration Management Domain Technical Documentation
+
+## 1. Overview
+
+The Configuration Management Domain serves as the centralized configuration system for the Cortex-Mem platform, providing a unified approach to manage application settings across all components. This domain ensures consistent behavior and proper initialization of the memory management system by defining comprehensive configuration schemas and implementing robust loading, validation, and auto-detection mechanisms.
+
+As an Infrastructure Domain with high importance (8.0/10), the Configuration Management system acts as the single source of truth for settings that govern various subsystems including vector database connectivity, LLM integration, HTTP server parameters, embedding services, and memory management policies. The system is implemented primarily in Rust with supporting Python utilities for validation, following a modular design that enables extensibility while maintaining type safety.
+
+## 2. Architecture and Design
+
+### 2.1 Component Structure
+
+The Configuration Management Domain consists of three primary components:
+
+- **Config Structure**: Defines the schema and data types for configuration through Rust structs with serde serialization
+- **Config Loading**: Handles file parsing and error propagation using TOML format
+- **Config Validation**: Ensures configuration integrity through both structural and content validation
+
+The core implementation resides in `cortex-mem-config/src/lib.rs`, which exports a comprehensive set of configuration structs that are consumed by other components during initialization.
+
+### 2.2 Key Data Structures
+
+The configuration system is built around a hierarchical structure of Rust structs that represent different subsystem configurations:
+
+```rust
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Config {
+ pub qdrant: QdrantConfig,
+ pub llm: LLMConfig,
+ pub server: ServerConfig,
+ pub embedding: EmbeddingConfig,
+ pub memory: MemoryConfig,
+ pub logging: LoggingConfig,
+}
+```
+
+Each subsystem has its own dedicated configuration struct with specific fields relevant to its operation:
+
+- **QdrantConfig**: Manages vector database connection parameters
+- **LLMConfig**: Handles language model service credentials and behavior
+- **ServerConfig**: Controls HTTP server binding and CORS policies
+- **EmbeddingConfig**: Configures embedding generation service
+- **MemoryConfig**: Defines memory management policies and thresholds
+- **LoggingConfig**: Specifies logging behavior and output destinations
+
+### 2.3 Design Principles
+
+The configuration system adheres to several key design principles:
+
+1. **Type Safety**: Leverages Rust's strong typing system to prevent invalid configurations at compile time
+2. **Default Values**: Implements sensible defaults through the `Default` trait for optional parameters
+3. **Extensibility**: Supports adding new configuration sections without breaking existing code
+4. **Validation**: Provides multi-layered validation to ensure configuration correctness
+5. **Auto-detection**: Includes intelligent features like automatic embedding dimension detection
+
+## 3. Implementation Details
+
+### 3.1 Configuration Schema
+
+The configuration schema is defined using Rust's derive macros for Serde, enabling seamless serialization between TOML files and in-memory data structures. The main `Config` struct composes multiple subsystem configurations, creating a comprehensive hierarchy that covers all aspects of the system.
+
+Key implementation features include:
+
+- **Serde Integration**: Uses `#[derive(Serialize, Deserialize)]` for automatic TOML parsing
+- **Error Handling**: Employs `anyhow::Result` for rich error propagation with context
+- **Path Abstraction**: Accepts generic path types through `AsRef<Path>` for flexibility
+- **Clone Support**: Implements `Clone` trait to enable configuration sharing across components
+
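+Combining these features, a loader might look like the following sketch (the method name `from_file` is assumed):
+
+```rust
+use std::path::Path;
+use anyhow::Context;
+
+impl Config {
+    // Read the TOML file, attach path context to I/O errors, and deserialize.
+    pub fn from_file<P: AsRef<Path>>(path: P) -> anyhow::Result<Self> {
+        let raw = std::fs::read_to_string(path.as_ref())
+            .with_context(|| format!("reading config file {}", path.as_ref().display()))?;
+        let config: Config = toml::from_str(&raw)
+            .context("parsing TOML configuration")?;
+        Ok(config)
+    }
+}
+```
+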
+### 3.2 Default Values Implementation
+
+The system implements default values for optional parameters through Rust's `Default` trait, ensuring that required functionality works even when users don't specify all settings:
+
+```rust
+impl Default for MemoryConfig {
+ fn default() -> Self {
+ MemoryConfig {
+ max_memories: 10000,
+ similarity_threshold: 0.65,
+ max_search_results: 50,
+ memory_ttl_hours: None,
+ auto_summary_threshold: 32768,
+ auto_enhance: true,
+ deduplicate: true,
+ merge_threshold: 0.75,
+ search_similarity_threshold: Some(0.70),
+ }
+ }
+}
+
+impl Default for LoggingConfig {
+ fn default() -> Self {
+ LoggingConfig {
+ enabled: false,
+ log_directory: "logs".to_string(),
+ level: "info".to_string(),
+ }
+ }
+}
+```
+
+This approach provides sensible defaults while still allowing users to override them through the configuration file.
+
+### 3.3 Auto-detection Mechanism
+
+One of the advanced features of the configuration system is the ability to auto-detect embedding dimensions when not explicitly specified. This is particularly valuable because embedding dimensions vary between different LLM models and services.
+
+The auto-detection workflow is implemented in `cortex-mem-core/src/init/mod.rs`:
+
+1. The loader detects that `embedding_dim` is absent from the Qdrant configuration
+2. A temporary LLM client is created using the provided API credentials
+3. A test embedding is generated for a sample text ("test")
+4. The dimension of the resulting embedding vector is detected
+5. The configuration is updated with the detected dimension
+
+```rust
+pub async fn create_auto_config(
+ base_config: &QdrantConfig,
+ llm_client: &dyn LLMClient,
+) -> Result<QdrantConfig> {
+ let mut config = base_config.clone();
+
+ if config.embedding_dim.is_none() {
+ info!("Auto-detecting embedding dimension for configuration...");
+ let test_embedding = llm_client.embed("test").await?;
+ let detected_dim = test_embedding.len();
+ info!("Detected embedding dimension: {}", detected_dim);
+ config.embedding_dim = Some(detected_dim);
+ }
+
+ Ok(config)
+}
+```
+
+This feature eliminates configuration errors related to mismatched embedding dimensions and improves user experience by reducing setup complexity.
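+
+A typical call site during initialization might look like this (the `QdrantStore::new` constructor is assumed):
+
+```rust
+// Fill in the detected dimension before creating the vector store.
+let qdrant_config = create_auto_config(&config.qdrant, llm_client.as_ref()).await?;
+let vector_store = QdrantStore::new(&qdrant_config).await?;
+```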
+
+## 4. Configuration Workflow
+
+### 4.1 Configuration Loading Process
+
+The configuration loading process follows a well-defined sequence:
+
+```mermaid
+graph TD
+ A[Start] --> B{Config File Exists?}
+ B -->|No| C[Return Error]
+ B -->|Yes| D[Parse TOML File]
+ D --> E[Validate Required Sections]
+ E --> F[Validate Required Fields]
+ F --> G[Load Subsystem Configs]
+ G --> H[Apply Default Values if Needed]
+ H --> I[Auto-Detect Embedding Dimension?]
+ I -->|Yes| J[Infer from LLM Client]
+ J --> K[Create Final Config]
+ I -->|No| K
+ K --> L[Initialize Components]
+ L --> M[End]
+```
+
+### 4.2 Sequence of Operations
+
+The detailed sequence of operations during configuration loading:
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant ConfigModule
+ participant FileSystem
+ participant Validator
+ participant LLMClient
+ participant VectorStore
+
+ User->>ConfigModule: Request config load
+ ConfigModule->>FileSystem: Read config.toml
+ FileSystem-->>ConfigModule: Return file content
+ ConfigModule->>ConfigModule: Parse TOML to Config struct
+ ConfigModule->>Validator: Validate required sections/fields
+ Validator-->>ConfigModule: Validation result
+ alt If embedding_dim not specified
+ ConfigModule->>LLMClient: Create temporary client
+ LLMClient-->>ConfigModule: Return client
+ ConfigModule->>LLMClient: Request test embedding
+ LLMClient-->>ConfigModule: Return embedding dimension
+ ConfigModule->>ConfigModule: Update config with detected dimension
+ end
+ ConfigModule->>VectorStore: Initialize with config
+ VectorStore-->>ConfigModule: Return initialized store
+ ConfigModule-->>User: Return ready-to-use components
+```
+
+## 5. Configuration File Format
+
+### 5.1 TOML Configuration Structure
+
+The system uses TOML (Tom's Obvious, Minimal Language) as the configuration file format due to its readability and support for complex data structures. The main configuration file (`config.toml`) contains the following sections:
+
+```toml
+# Main configuration for the cortex-mem system
+
+[qdrant]
+url = "http://localhost:6334"
+collection_name = "cortex-mem-hewlett_drawn"
+# embedding_dim = 1024 # Optional, will be auto-detected if not specified
+timeout_secs = 30
+
+[llm]
+api_base_url = "https://wanqing-api.corp.kuaishou.com/api/gateway/v1/endpoints"
+api_key = "fs2wzco3o7haz38df1jo4vavnvauxtuz3f0b"
+model_efficient = "ep-i4abhq-1764595896785685523"
+temperature = 0.1
+max_tokens = 4096
+
+[server]
+host = "0.0.0.0"
+port = 3000
+cors_origins = ["*"]
+
+[embedding]
+api_base_url = "https://wanqing-api.corp.kuaishou.com/api/gateway/v1/endpoints"
+api_key = "fs2wzco3o7haz38df1jo4vavnvauxtuz3f0b"
+model_name = "ep-9kf01g-1762237999831608613"
+batch_size = 10
+timeout_secs = 30
+
+[memory]
+max_memories = 10000
+max_search_results = 50
+# memory_ttl_hours = 24 # Optional
+auto_summary_threshold = 4096
+auto_enhance = true
+deduplicate = true
+similarity_threshold = 0.65
+merge_threshold = 0.75
+search_similarity_threshold = 0.5
+
+[logging]
+enabled = true
+log_directory = "logs"
+level = "debug"
+```
+
+### 5.2 Required vs. Optional Parameters
+
+The configuration system distinguishes between required and optional parameters:
+
+**Required Parameters:**
+- `qdrant.url`: Vector database endpoint URL
+- `qdrant.collection_name`: Name of the collection in Qdrant
+- `llm.api_base_url`: LLM service API endpoint
+- `llm.api_key`: Authentication key for LLM service
+- `llm.model_efficient`: Model identifier for efficient operations
+- `embedding.api_base_url`: Embedding service API endpoint
+- `embedding.api_key`: Authentication key for embedding service
+- `embedding.model_name`: Model identifier for embeddings
+
+**Optional Parameters with Defaults:**
+- `qdrant.embedding_dim`: Auto-detected if not specified
+- `memory.memory_ttl_hours`: No expiration if not specified
+- `memory.search_similarity_threshold`: Defaults to 0.70
+- `logging.enabled`: Defaults to false
+- `logging.level`: Defaults to "info"
+
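+In the Rust schema, this split is typically expressed with `Option` fields and serde defaults; a sketch over a subset of `QdrantConfig`'s fields (the default helper is illustrative):
+
+```rust
+use serde::{Deserialize, Serialize};
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct QdrantConfig {
+    pub url: String,                  // required
+    pub collection_name: String,      // required
+    pub embedding_dim: Option<usize>, // optional: auto-detected when None
+    #[serde(default = "default_timeout_secs")]
+    pub timeout_secs: u64,            // optional, falls back to 30
+}
+
+fn default_timeout_secs() -> u64 {
+    30
+}
+```
+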
+## 6. Validation System
+
+### 6.1 Multi-Layered Validation Approach
+
+The configuration system employs a multi-layered validation strategy to ensure configuration integrity:
+
+1. **Structural Validation**: Performed automatically by Serde during TOML deserialization
+2. **Content Validation**: Implemented through custom Python utilities in `examples/lomoco-evaluation/src/cortex_mem/config_utils.py`
+3. **Runtime Validation**: Conducted during component initialization
+
+### 6.2 Validation Implementation
+
+The Python-based validation utilities provide comprehensive checking of configuration completeness:
+
+```python
+def validate_config(config_path: str) -> bool:
+ """Validate that config file exists and has required settings."""
+ if not os.path.exists(config_path):
+ print(f"Config file not found: {config_path}")
+ return False
+
+ try:
+ with open(config_path, 'r') as f:
+ content = f.read()
+
+ # Check for required sections
+ required_sections = ["llm", "embedding", "qdrant", "memory"]
+ missing_sections = []
+
+ for section in required_sections:
+ if f"[{section}]" not in content:
+ missing_sections.append(section)
+
+ if missing_sections:
+ print(f"Missing required sections in config: {missing_sections}")
+ return False
+
+ # Check for required fields in each section
+ import toml
+ config_data = toml.load(config_path)
+
+ # Check llm section
+ if "llm" in config_data:
+ llm = config_data["llm"]
+ required_llm_fields = ["api_key", "api_base_url", "model_efficient"]
+ missing_llm = [field for field in required_llm_fields if field not in llm]
+ if missing_llm:
+ print(f"Missing fields in [llm] section: {missing_llm}")
+ return False
+
+ # Similar checks for embedding, qdrant, and other sections
+ return True
+
+ except Exception as e:
+ print(f"Error validating config: {e}")
+ return False
+```
+
+The validation system checks for:
+- Existence of the configuration file
+- Presence of required sections (`[llm]`, `[embedding]`, `[qdrant]`, `[memory]`)
+- Required fields within each section
+- Valid data types and formats
+
+### 6.3 OpenAI-Specific Validation
+
+A specialized validation function checks OpenAI configuration specifically:
+
+```python
+def check_openai_config(config_path: str) -> bool:
+ """Check if OpenAI configuration is properly set."""
+ try:
+ import toml
+ config_data = toml.load(config_path)
+
+ # Check llm section
+ if "llm" not in config_data:
+ print("Missing [llm] section in config")
+ return False
+
+ llm = config_data["llm"]
+ if "api_key" not in llm or not llm["api_key"]:
+ print("OpenAI API key not set in [llm] section")
+ return False
+
+ if "api_base_url" not in llm or not llm["api_base_url"]:
+ print("OpenAI API base URL not set in [llm] section")
+ return False
+
+ # Check embedding section
+ if "embedding" not in config_data:
+ print("Missing [embedding] section in config")
+ return False
+
+ embedding = config_data["embedding"]
+ if "api_key" not in embedding or not embedding["api_key"]:
+ print("OpenAI API key not set in [embedding] section")
+ return False
+
+ return True
+
+ except Exception as e:
+ print(f"Error checking OpenAI config: {e}")
+ return False
+```
+
+## 7. Integration with Other Domains
+
+### 7.1 Configuration Dependencies
+
+The Configuration Management Domain serves as a foundational component that other domains depend on for their operation:
+
+| Dependent Domain | Configuration Usage |
+|------------------|----------------------|
+| **Storage Integration Domain** | Uses Qdrant configuration for vector database connectivity |
+| **LLM Integration Domain** | Uses LLM and embedding configurations for service authentication and behavior |
+| **Memory Optimization Domain** | Uses memory configuration for optimization thresholds and strategies |
+| **Access Interface Domain** | Uses server and logging configurations for interface behavior |
+
+### 7.2 Initialization Flow
+
+During system startup, the configuration system plays a critical role in the initialization workflow:
+
+```mermaid
+graph TD
+ A[Load Configuration] --> B[Initialize Tracing & Logging]
+ B --> C{Auto-detect Components}
+ C --> D[Detect Vector Store]
+ C --> E[Detect LLM Client]
+ C --> F[Determine Embedding Dimension]
+
+ D --> G[Create MemoryManager]
+ E --> G
+ F --> G
+
+ G --> H[Expose Interfaces]
+ H --> I[CLI Interface]
+ H --> J[HTTP API Service]
+ H --> K[MCP Server]
+ H --> L[Web Dashboard]
+```
+
+The configuration is loaded first, then used to initialize tracing and logging systems, followed by auto-detection of components based on the provided settings.
+
+## 8. Best Practices and Recommendations
+
+### 8.1 Configuration Management Best Practices
+
+1. **Environment-Specific Configuration**: Use different configuration files for development, testing, and production environments
+2. **Secret Management**: Never commit API keys to version control; use environment variables or secret management tools
+3. **Version Control**: Keep configuration files under version control (without secrets) to track changes
+4. **Documentation**: Document all configuration options and their effects
+5. **Testing**: Validate configuration changes in non-production environments before deployment
+
+### 8.2 Security Considerations
+
+1. **API Key Protection**: Ensure API keys are stored securely and have appropriate access controls
+2. **Network Security**: Use HTTPS for all external service communications
+3. **Input Validation**: Validate all configuration inputs to prevent injection attacks
+4. **Least Privilege**: Configure API keys with minimal required permissions
+5. **Audit Logging**: Enable logging to monitor configuration changes and access patterns
+
+### 8.3 Performance Optimization
+
+1. **Caching**: Cache configuration data in memory to avoid repeated file I/O
+2. **Connection Pooling**: Use connection pooling for database and API connections
+3. **Batch Processing**: Configure appropriate batch sizes for embedding generation
+4. **Timeout Settings**: Set reasonable timeouts to prevent hanging operations
+5. **Resource Limits**: Configure memory limits to prevent resource exhaustion
+
+## 9. Future Enhancements
+
+Potential improvements to the Configuration Management Domain include:
+
+1. **Dynamic Reloading**: Support for reloading configuration without restarting the system
+2. **Remote Configuration**: Integration with configuration servers for centralized management
+3. **Configuration Versioning**: Support for versioned configurations and rollback capabilities
+4. **Schema Validation**: JSON Schema-based validation for additional integrity checks
+5. **Environment Variables**: Enhanced support for environment variable overrides
+6. **Configuration Templates**: Support for template-based configuration generation
+7. **Validation Rules Engine**: More sophisticated validation rules based on inter-parameter dependencies
+
+These enhancements would further improve the flexibility, security, and maintainability of the configuration system while supporting more complex deployment scenarios.
\ No newline at end of file
diff --git a/litho.docs/4.Deep-Exploration/LLM Integration Domain.md b/litho.docs/4.Deep-Exploration/LLM Integration Domain.md
new file mode 100644
index 0000000..43127d6
--- /dev/null
+++ b/litho.docs/4.Deep-Exploration/LLM Integration Domain.md
@@ -0,0 +1,335 @@
+# LLM Integration Domain Technical Documentation
+
+## 1. Overview
+
+The **LLM Integration Domain** is a core component of the Cortex-Mem system responsible for managing interactions with Large Language Models (LLMs) to extract insights from content and enable intelligent decision-making for memory operations. This domain serves as the cognitive engine that transforms unstructured text into structured knowledge, enabling advanced memory management capabilities.
+
+This documentation provides comprehensive technical details about the architecture, implementation, and functionality of the LLM Integration Domain based on the analysis of source code and system research materials.
+
+### Key Characteristics
+- **Domain Type**: Core Business Domain
+- **Importance Score**: 9.0/10.0
+- **Complexity Level**: High (8.5/10.0)
+- **Primary Responsibility**: Enabling intelligent processing of memory content through LLM-powered analysis, extraction, and decision-making
+
+```mermaid
+graph TD
+ A[LLM Client] --> B[Information Extraction]
+ A --> C[Memory Intelligence]
+ B --> D[Extract Structured Facts]
+ B --> E[Extract Keywords]
+ B --> F[Extract Entities]
+ C --> G[Analyze Content]
+ C --> H[Assess Similarity]
+ C --> I[Recommend Operations]
+ A --> J[Health Check]
+ A --> K[Rate Limiting]
+```
+
+## 2. Architecture and Components
+
+The LLM Integration Domain consists of three primary sub-modules that work together to provide comprehensive LLM capabilities:
+
+### 2.1 LLM Client Module
+
+The `LLMClient` is the foundational component that manages communication with external LLM services, primarily OpenAI. It implements a robust client interface with support for both traditional text completion and modern structured extraction capabilities.
+
+#### Key Features:
+- **Multi-capability Interface**: Supports text completion, embedding generation, and structured data extraction
+- **Fallback Mechanisms**: Implements graceful degradation when structured extraction fails
+- **Rate Limiting**: Includes built-in rate limiting (1 second delay between batch operations)
+- **Health Monitoring**: Provides health checking through embedding requests
+- **Concurrent Operation**: Designed to be cloned and used concurrently in asynchronous contexts
+
+#### Trait Definition:
+```rust
+#[async_trait]
+pub trait LLMClient: Send + Sync + dyn_clone::DynClone {
+ // Core capabilities
+ async fn complete(&self, prompt: &str) -> Result<String>;
+ async fn embed(&self, text: &str) -> Result<Vec<f32>>;
+ async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>>;
+
+ // Information extraction
+ async fn extract_keywords(&self, content: &str) -> Result<Vec<String>>;
+ async fn summarize(&self, content: &str, max_length: Option<usize>) -> Result<String>;
+
+ // Health and monitoring
+ async fn health_check(&self) -> Result<bool>;
+
+ // Structured extraction methods (return and option types reconstructed; the
+ // classification and importance types are assumed)
+ async fn extract_structured_facts(&self, prompt: &str) -> Result<StructuredFactExtraction>;
+ async fn extract_detailed_facts(&self, prompt: &str) -> Result<DetailedFactExtraction>;
+ async fn classify_memory(&self, prompt: &str) -> Result<MemoryClassification>;
+ async fn score_importance(&self, prompt: &str) -> Result<f32>;
+}
+```
+
+#### Implementation Details:
+The `OpenAILLMClient` implementation uses the RIG framework to interact with OpenAI services, providing both completion and embedding models. The client is configured through centralized configuration (`LLMConfig` and `EmbeddingConfig`) which specifies API keys, endpoints, model names, temperature settings, and token limits.
+
+Key architectural decisions include:
+- **Separation of Concerns**: Different models are used for completion vs. embedding tasks
+- **Error Resilience**: Comprehensive error handling with fallback mechanisms
+- **Performance Optimization**: Batch processing with controlled rate limiting
+- **Debug Support**: Conditional sleep statements in debug builds to prevent rate limiting issues during development
+
+### 2.2 Information Extraction Module
+
+The Information Extraction module leverages the LLM Client to transform unstructured content into structured knowledge representations. This module focuses on extracting specific types of information from memory content.
+
+#### Primary Capabilities:
+- **Fact Extraction**: Identifies and extracts important facts from conversations
+- **Keyword Identification**: Extracts key terms and phrases from content
+- **Entity Recognition**: Detects named entities such as people, organizations, and locations
+- **Language Detection**: Automatically detects the language of input content
+- **Conversation Analysis**: Analyzes conversation dynamics including topics, sentiment, and user intent
+
+#### Data Structures:
+The module defines several structured response types using Serde serialization and JSON Schema generation:
+
+```rust
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct StructuredFactExtraction {
+ pub facts: Vec<String>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct DetailedFactExtraction {
+ pub facts: Vec<String>, // element type assumed; the source may use a richer struct
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct KeywordExtraction {
+ pub keywords: Vec<String>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct EntityExtraction {
+ pub entities: Vec<String>, // element type assumed
+}
+```
+
+#### Processing Workflow:
+1. **Prompt Construction**: Builds specialized prompts for different extraction tasks
+2. **Structured Extraction**: Uses RIG's extractor_completions_api for reliable structured output
+3. **Fallback Processing**: Falls back to traditional completion if structured extraction fails
+4. **Response Parsing**: Parses and validates the structured responses
+5. **Result Normalization**: Standardizes the extracted information for downstream use
+
+The module implements sophisticated prompt engineering techniques, including:
+- Role-specific prompts for user vs. assistant messages
+- Context-aware extraction strategies
+- Language-preserving extraction (facts are returned in the same language as input)
+
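+A simplified sketch of how such a role-aware prompt might be assembled (the wording is illustrative, not the production prompt):
+
+```rust
+// Build a fact-extraction prompt tailored to the message role.
+fn build_fact_extraction_prompt(role: &str, content: &str) -> String {
+    format!(
+        "You are extracting durable facts from a {role} message.\n\
+         Return each fact in the same language as the input.\n\
+         Message:\n{content}\n\
+         Facts:"
+    )
+}
+```
+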
+### 2.3 Memory Intelligence Module
+
+The Memory Intelligence module applies LLM analysis to make strategic decisions about memory operations. This module represents the highest level of cognitive processing in the system, using LLM insights to guide memory lifecycle management.
+
+#### Key Functions:
+- **Content Analysis**: Evaluates memory content for importance, relevance, and quality
+- **Similarity Assessment**: Determines semantic similarity between memories for deduplication
+- **Operation Recommendation**: Recommends optimal actions (create, update, merge, delete) for memory management
+- **Optimization Planning**: Generates plans for memory collection improvement
+
+#### Decision-Making Process:
+The module implements a sophisticated decision framework that considers multiple factors when recommending memory operations:
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant LLMClient
+ participant InformationExtraction
+ participant MemoryIntelligence
+
+ User->>LLMClient: Request text completion
+ LLMClient->>LLMClient: Process request with fallback mechanism
+ LLMClient-->>User: Return completion result
+
+ User->>LLMClient: Request structured extraction
+ LLMClient->>LLMClient: Use rig extractor or fallback to completion
+ LLMClient-->>User: Return structured data
+
+ InformationExtraction->>LLMClient: Extract keywords/facts/entities
+ LLMClient-->>InformationExtraction: Return extracted information
+
+ MemoryIntelligence->>LLMClient: Analyze content for memory decisions
+ LLMClient-->>MemoryIntelligence: Return analysis results
+```
+
+## 3. Integration Patterns
+
+### 3.1 Configuration Integration
+
+The LLM Integration Domain depends on the Configuration Management Domain for critical settings:
+
+```rust
+pub struct OpenAILLMClient {
+ completion_model: Agent<OpenAICompletionModel>, // generic parameter reconstructed; exact rig type may differ
+ completion_model_name: String,
+ embedding_model: OpenAIEmbeddingModel,
+ client: Client,
+}
+
+impl OpenAILLMClient {
+ pub fn new(llm_config: &LLMConfig, embedding_config: &EmbeddingConfig) -> Result<Self> {
+ let client = Client::builder(&llm_config.api_key)
+ .base_url(&llm_config.api_base_url)
+ .build();
+
+ // Configuration-driven model selection
+ let completion_model: Agent<OpenAICompletionModel> = client
+ .completion_model(&llm_config.model_efficient)
+ .completions_api()
+ .into_agent_builder()
+ .temperature(llm_config.temperature as f64)
+ .max_tokens(llm_config.max_tokens as u64)
+ .build();
+
+ // ... embedding model setup and `Ok(Self { ... })` elided for brevity
+ }
+}
+```
+
+Configuration parameters include:
+- API keys and base URLs
+- Model selection (efficient vs. powerful)
+- Temperature and token limits
+- Embedding model specifications
+
+### 3.2 Interaction with Other Domains
+
+The LLM Integration Domain interacts with several other domains in the system:
+
+#### With Memory Management Domain:
+- Provides embedding generation for semantic search
+- Enables content analysis for memory classification
+- Supports intelligent retrieval through semantic understanding
+
+#### With Memory Optimization Domain:
+- Powers duplicate detection through similarity assessment
+- Enables quality scoring for optimization decisions
+- Supports merging recommendations based on content analysis
+
+#### With Access Interface Domain:
+- Processes natural language queries from CLI
+- Interprets search requests from HTTP API
+- Handles agent commands through MCP interface
+
+## 4. Technical Implementation Details
+
+### 4.1 Error Handling and Resilience
+
+The implementation includes comprehensive error handling strategies:
+
+```rust
+async fn extract_keywords(&self, content: &str) -> Result<Vec<String>> {
+ let prompt = self.build_keyword_prompt(content);
+
+ match self.extract_keywords_structured(&prompt).await {
+ Ok(keyword_extraction) => {
+ debug!("Extracted {} keywords using rig extractor",
+ keyword_extraction.keywords.len());
+ Ok(keyword_extraction.keywords)
+ }
+ Err(e) => {
+ // Fallback to traditional method if extractor fails
+ debug!("Rig extractor failed, falling back: {}", e);
+
+ #[cfg(debug_assertions)]
+ tokio::time::sleep(std::time::Duration::from_secs(1)).await;
+
+ let response = self.complete(&prompt).await?;
+ let keywords = self.parse_keywords(&response);
+ Ok(keywords)
+ }
+ }
+}
+```
+
+Key resilience features:
+- **Graceful Degradation**: Falls back to traditional completion when structured extraction fails
+- **Rate Limiting**: Implements 1-second delays between batch operations
+- **Comprehensive Logging**: Detailed debug logging for troubleshooting
+- **Input Validation**: Validates inputs before processing
+
+### 4.2 Performance Considerations
+
+The implementation addresses performance through several mechanisms:
+
+1. **Batch Processing**: Efficient handling of multiple embedding requests
+2. **Caching Strategy**: While not explicitly implemented, the design supports future caching layers
+3. **Asynchronous Operations**: Fully async implementation for non-blocking operation
+4. **Resource Management**: Proper cleanup of resources and connections
+
+### 4.3 Security and Privacy
+
+The implementation includes security considerations:
+
+- **API Key Protection**: Keys are passed through secure configuration
+- **Input Sanitization**: Removes code blocks and potentially harmful content
+- **Data Minimization**: Only extracts necessary information
+- **Privacy Awareness**: Prompts emphasize not revealing model information
+
+## 5. Usage Examples
+
+### 5.1 Creating an LLM Client
+
+```rust
+let llm_client = create_llm_client(&llm_config, &embedding_config)?;
+```
+
+### 5.2 Extracting Information from Content
+
+```rust
+// Extract keywords
+let keywords = llm_client.extract_keywords("I love hiking in the mountains").await?;
+
+// Generate embeddings
+let embedding = llm_client.embed("memory content").await?;
+
+// Extract structured facts
+let facts = llm_client.extract_structured_facts(&prompt).await?;
+```
+
+### 5.3 Using in Memory Processing Workflow
+
+```rust
+// In memory creation process
+let embedding = llm_client.embed(&content).await?;
+let keywords = llm_client.extract_keywords(&content).await?;
+let entities = llm_client.extract_entities(&prompt).await?;
+```
+
+## 6. Best Practices and Recommendations
+
+### 6.1 Configuration Guidelines
+
+1. **Model Selection**: Use efficient models for high-volume operations
+2. **Temperature Settings**: Lower temperatures for more consistent extraction
+3. **Token Limits**: Set appropriate limits based on expected content length
+4. **Rate Limiting**: Configure according to your LLM provider's limits
+
+### 6.2 Performance Optimization
+
+1. **Batch Requests**: Group embedding requests when possible
+2. **Connection Pooling**: Reuse client instances across operations
+3. **Caching**: Implement caching for frequently accessed content
+4. **Monitoring**: Track API usage and costs
+
+### 6.3 Error Handling
+
+1. **Implement Retry Logic**: For transient failures
+2. **Monitor Health**: Regularly check service availability
+3. **Graceful Degradation**: Provide fallback behavior when LLM services are unavailable
+4. **Detailed Logging**: Capture sufficient context for debugging
+
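+As a sketch of the first recommendation, a generic retry helper with exponential backoff might look like this (not part of the current codebase):
+
+```rust
+use std::time::Duration;
+
+// Retry an async operation with simple exponential backoff.
+async fn with_retries<T, E, F, Fut>(mut op: F, max_attempts: u32) -> Result<T, E>
+where
+    F: FnMut() -> Fut,
+    Fut: std::future::Future<Output = Result<T, E>>,
+{
+    let mut delay = Duration::from_millis(250);
+    let mut attempt = 1;
+    loop {
+        match op().await {
+            Ok(value) => return Ok(value),
+            Err(e) if attempt >= max_attempts => return Err(e),
+            Err(_) => {
+                tokio::time::sleep(delay).await;
+                delay *= 2;
+                attempt += 1;
+            }
+        }
+    }
+}
+```
+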
+## 7. Future Enhancements
+
+Potential improvements to the LLM Integration Domain:
+
+1. **Multiple Provider Support**: Extend beyond OpenAI to include other LLM providers
+2. **Advanced Caching**: Implement intelligent caching of embeddings and extractions
+3. **Adaptive Prompting**: Dynamically adjust prompts based on content characteristics
+4. **Cost Optimization**: Implement strategies to minimize API costs
+5. **Enhanced Security**: Add encryption for sensitive content processing
+
+The LLM Integration Domain represents a sophisticated implementation of LLM-powered intelligence in a memory management system, providing the cognitive capabilities that enable advanced AI agent functionality.
\ No newline at end of file
diff --git a/litho.docs/4.Deep-Exploration/Memory Management Domain.md b/litho.docs/4.Deep-Exploration/Memory Management Domain.md
index 3fd5acd..ae2513f 100644
--- a/litho.docs/4.Deep-Exploration/Memory Management Domain.md
+++ b/litho.docs/4.Deep-Exploration/Memory Management Domain.md
@@ -1,260 +1,422 @@
-# Memory Management Domain Technical Implementation Documentation
+# Memory Management Domain Technical Documentation
-## 1. Overview and Architecture
+## 1. Overview
-The **Memory Management Domain** serves as the central orchestrator for all memory-related operations in the `cortex-mem` system, providing a comprehensive lifecycle management solution for AI agent memories. This domain implements a modular, service-oriented architecture with clear separation of concerns, enabling intelligent storage, retrieval, optimization, and analysis of persistent knowledge.
+The **Memory Management Domain** serves as the central orchestrator for all memory operations within the Cortex-Mem system, providing a comprehensive solution for AI agent memory persistence, retrieval, and optimization. This domain enables intelligent software agents to maintain context across interactions by managing structured knowledge through advanced processing pipelines that leverage large language models (LLMs) and vector-based storage.
-### Core Responsibilities
-- Orchestrate CRUD operations for memory entities
-- Manage advanced processing pipelines (extraction, classification, deduplication)
-- Coordinate LLM-driven decision making for memory updates
-- Provide semantic search capabilities with relevance ranking
-- Enable batch operations and transactional integrity
+At its core, the Memory Manager integrates multiple specialized components including vector storage, LLM services, fact extraction, importance evaluation, duplicate detection, and content classification. The architecture follows a dependency injection pattern where external services are provided at construction time, enabling flexible composition and testability while maintaining separation of concerns.
+
+### Key Characteristics
+- **Modular Design**: Clear separation between core logic and supporting processors
+- **LLM-Augmented Processing**: Intelligent metadata generation using language models
+- **Semantic Capabilities**: Vector-based similarity search with hybrid scoring
+- **Lifecycle Management**: Full CRUD operations with automated enhancement
+- **Optimization Ready**: Built-in support for deduplication and quality improvement
+
+## 2. Architecture Diagram
+
+```mermaid
+graph TD
+ A[External Services] --> B[MemoryManager]
+ B --> C[VectorStore]
+ B --> D[LLMClient]
+ B --> E[FactExtractor]
+ B --> F[MemoryUpdater]
+ B --> G[ImportanceEvaluator]
+ B --> H[DuplicateDetector]
+ B --> I[MemoryClassifier]
+ C --> J[(Persistent Storage)]
+ D --> K[Language Model API]
+ E --> L[Fact Extraction]
+ F --> M[Memory Update Logic]
+ G --> N[Importance Scoring]
+ H --> O[Deduplication]
+ I --> P[Content Classification]
+ B --> Q[Core Operations]
+ Q --> R[CRUD Operations]
+ Q --> S[Search & Retrieval]
+ Q --> T[Statistics & Health]
+```
+
+## 3. Core Components and Interfaces
+
+### 3.1 MemoryManager
+
+The `MemoryManager` struct is the primary entry point for all memory operations, coordinating interactions between various subsystems:
-### Architectural Pattern
-The domain follows a **composition pattern** where the `MemoryManager` class acts as the primary facade, integrating specialized components through dependency injection:
```rust
pub struct MemoryManager {
 vector_store: Box<dyn VectorStore>,
 llm_client: Box<dyn LLMClient>,
- fact_extractor: Box<dyn FactExtractor>,
- memory_updater: Box<dyn MemoryUpdater>,
- importance_evaluator: Box<dyn ImportanceEvaluator>,
- duplicate_detector: Box<dyn DuplicateDetector>,
- memory_classifier: Box<dyn MemoryClassifier>
+ config: MemoryConfig,
+ fact_extractor: Box<dyn FactExtractor>,
+ memory_updater: Box<dyn MemoryUpdater>,
+ importance_evaluator: Box<dyn ImportanceEvaluator>,
+ duplicate_detector: Box<dyn DuplicateDetector>,
+ memory_classifier: Box<dyn MemoryClassifier>,
}
```
-This design enables pluggable implementations while maintaining loose coupling between components.
+#### Construction Pattern
+The manager uses dependency injection to receive essential services:
+```rust
+impl MemoryManager {
+ pub fn new(
+ vector_store: Box<dyn VectorStore>,
+ llm_client: Box<dyn LLMClient>,
+ config: MemoryConfig,
+ ) -> Self { /* ... */ }
+}
+```
----
+This approach allows for flexible configuration and testing, with internal processors created from cloned references to the injected dependencies.
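+
+For example (constructors for the concrete store and client are assumed):
+
+```rust
+// Compose the manager from its injected services.
+let vector_store = Box::new(QdrantStore::new(&config.qdrant).await?);
+let llm_client = Box::new(OpenAILLMClient::new(&config.llm, &config.embedding)?);
+let manager = MemoryManager::new(vector_store, llm_client, config.memory.clone());
+```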
-## 2. Key Components and Their Interactions
+### 3.2 Key Interfaces
-### 2.1 MemoryManager - Central Orchestrator
+#### VectorStore Interface
+Provides persistent storage capabilities:
+- `insert(&self, memory: &Memory) -> Result<()>`: Store a memory record
+- `search(&self, query_vector: &[f32], filters: &Filters, limit: usize) -> Result<Vec<SearchResult>>`: Semantic similarity search (result item type assumed)
+- `get(&self, id: &str) -> Result