A production-ready, self-hosted Model Context Protocol (MCP) server that provides persistent, intelligent memory for Claude Code and other AI assistants. Features async/await architecture, knowledge graph intelligence, smart text chunking, and enterprise-grade security. Built with Docker Compose for one-command deployment.
- **One-Command Deployment** - Start the entire stack with a single script
- **100% Self-Hosted** - No external API dependencies (when using Ollama)
- **Token-Based Authentication** - Secure multi-user access with PostgreSQL-backed token management
- **Multi-LLM Support** - Works with Ollama, OpenAI, or Anthropic
- **Project Isolation** - Automatic memory isolation per project directory
- **Semantic Search** - Vector-based search with pgvector
- **13 MCP Tools** - Complete memory management + intelligence analysis
- **Dual Transport Support** - Modern HTTP Stream (recommended) + legacy SSE transport
- **Docker Compose** - Easy orchestration of all services
- **Comprehensive Tests** - Automated test suite included
- **Audit Logging** - Track all authentication attempts and token usage
- **Knowledge Graphs** - Link memories with typed relationships (RELATES_TO, DEPENDS_ON, SUPERSEDES, etc.)
- **Temporal Tracking** - Track how knowledge evolves over time
- **Architecture Mapping** - Map system components and dependencies
- **Impact Analysis** - Understand cascading effects of changes
- **Decision Tracking** - Record technical decisions with pros/cons/alternatives
- **Topic Clustering** - Automatically detect knowledge groups
- **Quality Scoring** - Trust scores based on validations and citations
- **Intelligence Analysis** - Comprehensive health reports with actionable recommendations
- **Semantic Chunking** - Automatically splits large text at paragraph/sentence boundaries
- **Context Preservation** - 150-character overlap between chunks maintains context continuity
- **Performance Optimization** - Prevents timeouts on large text inputs with 8B+ embedding models
- **Chunk Metadata** - Full tracking with chunk index, total chunks, size, and overlap indicators
- **Session Continuity** - All chunks share the same `run_id` for related memory grouping
- **Transparent Operation** - Small texts (≤1000 chars) bypass chunking for optimal performance
```
┌──────────────┐
│ Claude Code  │  (Your IDE with MCP client)
└──────┬───────┘
       │ HTTP Stream (recommended): http://localhost:8080/mcp
       │ SSE (legacy):              http://localhost:8080/sse
       │ + Token Authentication Headers
       ▼
┌──────────────┐
│  MCP Server  │  Port 8080 (FastMCP)
│   (Python)   │  • 13 MCP Tools (5 core + 8 intelligence)
│              │  • Token Validation
│              │  • Dual Transport Support
└──────┬───────┘
       │ HTTP REST API
       ▼
┌──────────────┐
│ Mem0 Server  │  Port 8000 (FastAPI)
│  (FastAPI)   │  • 28 REST Endpoints (13 core + 15 intelligence)
│              │  • Multi-LLM Support
│              │  • Vector + Graph Storage
│              │  • Memory Intelligence System
│              │  • Async/Await Architecture with Background Tasks
└──────┬───────┘
       │
   ┌───┴────┬─────────┬────────┐
   ▼        ▼         ▼        ▼
┌────────┐ ┌───────┐ ┌──────┐ ┌────────┐
│Postgres│ │ Neo4j │ │ Auth │ │ Ollama │
│pgvector│ │ Graph │ │Token │ │  LLM   │
│ Vector │ │Intelli│ │Store │ │        │
│ Search │ │ gence │ │      │ │        │
└────────┘ └───────┘ └──────┘ └────────┘
```
The Mem0 server uses FastAPI's async/await architecture for optimal performance:
- Non-blocking I/O: Handles multiple requests concurrently without blocking
- Background Neo4j Sync: Memories stored immediately in PostgreSQL, then synced to Neo4j asynchronously
- Retry Logic: Automatic retry with exponential backoff (7 attempts: 1s, 2s, 4s, 8s, 16s, 32s)
- Immediate Response: API returns instantly without waiting for graph sync
- Fault Tolerance: If Neo4j sync fails, memory still accessible via PostgreSQL vector search
This architecture ensures fast response times even when processing complex graph operations.
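As an illustration of this pattern, here is a hedged sketch using FastAPI's `BackgroundTasks`; the function names and storage stubs are placeholders, not the actual server code:

```python
# Illustrative sketch only - function names and storage logic are stand-ins,
# not the Mem0 server's real implementation.
import asyncio
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

async def store_in_postgres(payload: dict) -> str:
    """Stand-in for the pgvector write; returns the new memory's id."""
    return "mem_123"

async def sync_to_neo4j(memory_id: str) -> None:
    """Graph sync with exponential backoff: waits of 1s, 2s, 4s, 8s, 16s, 32s."""
    for attempt in range(7):
        try:
            ...  # write memory relationships to Neo4j here
            return
        except Exception:
            if attempt == 6:
                return  # give up; the memory stays searchable via pgvector
            await asyncio.sleep(2 ** attempt)

@app.post("/memories")
async def create_memory(payload: dict, background_tasks: BackgroundTasks):
    memory_id = await store_in_postgres(payload)          # immediate, blocking path
    background_tasks.add_task(sync_to_neo4j, memory_id)   # deferred graph sync
    return {"id": memory_id}                              # returns without waiting
```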
- **Docker & Docker Compose** installed:

  ```bash
  docker --version
  docker compose version
  ```

- **Ollama server** with models (or an OpenAI/Anthropic API key):

  ```bash
  # On your Ollama server:
  ollama pull qwen3:8b
  ollama pull qwen3-embedding:8b
  ```
```bash
# 1. Clone or copy this directory
cd /path/to/mem0-mcp

# 2. Create configuration file
cp .env.example .env

# 3. Edit .env with your Ollama server address (if needed)
nano .env  # Update OLLAMA_BASE_URL

# 4. Start everything!
./scripts/start.sh
```

That's it! The script will:

- ✅ Start PostgreSQL with pgvector
- ✅ Start the Neo4j graph database
- ✅ Start the Mem0 REST API server
- ✅ Start the MCP server for Claude Code
- ✅ Wait for all services to be healthy
**Step 1: Run Database Migrations**

```bash
./scripts/migrate-auth.sh
```

**Step 2: Create Your Authentication Token**

```bash
python3 scripts/mcp-token.py create \
  --user-id your.email@company.com \
  --name "Your Name" \
  --email your.email@company.com
```

This will output your token and setup instructions. Copy the `MEM0_TOKEN` value.

**Step 3: Add to Your Shell Profile**

Add these lines to `~/.zshrc` or `~/.bashrc`:

```bash
export MEM0_TOKEN='mcp_abc123...'              # Your token from step 2
export MEM0_USER_ID='your.email@company.com'
```

Then reload:

```bash
source ~/.zshrc  # or ~/.bashrc
```

**Recommended: Using Claude CLI (Easiest)**
```bash
# Add mem0 server with HTTP Stream transport (recommended)
claude mcp add mem0 http://localhost:8080/mcp/ -t http \
  -H "X-MCP-Token: ${MEM0_TOKEN}" \
  -H "X-MCP-UserID: ${MEM0_USER_ID}"

# Verify it's configured
claude mcp list
```

**Alternative: Manual Configuration**
Add this to your Claude Code MCP configuration file:
File: `~/.config/claude-code/config.json`
```json
{
  "mcpServers": {
    "mem0": {
      "url": "http://localhost:8080/mcp/",
      "transport": "http",
      "headers": {
        "X-MCP-Token": "your-token-here",
        "X-MCP-UserID": "your.email@company.com"
      }
    }
  }
}
```

**Legacy SSE Transport (Backward Compatibility)**
```bash
# Using CLI
claude mcp add mem0 http://localhost:8080/sse/ -t http \
  -H "X-MCP-Token: ${MEM0_TOKEN}" \
  -H "X-MCP-UserID: ${MEM0_USER_ID}"
```

**Important:**

- Always include the trailing slash in URLs: `/mcp/` or `/sse/` (not `/mcp` or `/sse`)
- HTTP Stream transport (`/mcp/`) is recommended as it is the modern MCP protocol
- SSE (`/sse/`) is maintained for backward compatibility
Restart Claude Code and you're ready to go!
```bash
# Start the stack
./scripts/start.sh

# View logs
./scripts/logs.sh        # All services
./scripts/logs.sh mem0   # Specific service

# Check health
./scripts/health.sh

# Run tests
./scripts/test.sh

# Stop the stack
./scripts/stop.sh

# Restart
./scripts/restart.sh

# Clean all data (⚠️ destructive)
./scripts/clean.sh
```

Once connected, you can use these commands in Claude Code:
"Store this code in memory: [your code snippet]"
"Search my memories for Python functions"
"Show all my stored coding preferences"
"Delete memory with ID [id]"
"Show history of memory [id]"
- `add_coding_preference` - Store code snippets and implementation details
- `search_coding_preferences` - Semantic search through memories
- `get_all_coding_preferences` - Retrieve all stored memories
- `delete_memory` - Delete a specific memory by ID
- `get_memory_history` - View change history
- `link_memories` - Create typed relationships between memories (build knowledge graphs)
- `get_related_memories` - Graph traversal to discover connected context
- `analyze_memory_intelligence` - GAME-CHANGER: Comprehensive intelligence report with health scores, clusters, and recommendations
- `create_component` - Map system architecture with component nodes
- `link_component_dependency` - Define dependencies between components
- `analyze_component_impact` - Analyze cascading effects of changes
- `create_decision` - Track technical decisions with pros/cons/alternatives
- `get_decision_rationale` - Retrieve decision context and reasoning
All tools automatically use authentication credentials from your MCP configuration headers.
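As an illustration of the checks involved, here is a hedged sketch of header validation; the real server performs this via middleware backed by the PostgreSQL token store, and the function below is only a simplified model:

```python
# Simplified model of the validation flow; not the server's actual code.
def validate_headers(headers: dict[str, str], token_store: dict[str, str]) -> str:
    """Return the authenticated user id, or raise with the same errors
    listed in the Troubleshooting section below."""
    token = headers.get("X-MCP-Token")
    user_id = headers.get("X-MCP-UserID")
    if not token or not user_id:
        raise PermissionError("Missing authentication headers")
    owner = token_store.get(token)   # stand-in for the PostgreSQL lookup
    if owner is None:
        raise PermissionError("Invalid authentication token")
    if owner != user_id:
        raise PermissionError("User ID mismatch")
    return user_id
```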
"Link these two memories as related"
"Show me all memories related to authentication"
"Analyze my project's knowledge graph health"
"Create a component called Database with type Infrastructure"
"What would be impacted if I change the Authentication component?"
"Record this decision: Use PostgreSQL, pros: ACID compliance, cons: complexity"
```bash
# List all tokens
python3 scripts/mcp-token.py list

# List tokens for a specific user
python3 scripts/mcp-token.py list --user-id john.doe@company.com

# Create a new token
python3 scripts/mcp-token.py create \
  --user-id john.doe@company.com \
  --name "John Doe" \
  --email john.doe@company.com

# Revoke (disable) a token
python3 scripts/mcp-token.py revoke mcp_abc123

# Re-enable a token
python3 scripts/mcp-token.py enable mcp_abc123

# Delete a token permanently
python3 scripts/mcp-token.py delete mcp_abc123

# View audit log
python3 scripts/mcp-token.py audit --days 30

# View user statistics
python3 scripts/mcp-token.py stats john.doe@company.com
```

Run the authentication test suite:

```bash
./tests/test_auth.sh
```

This tests missing headers, invalid tokens, user ID mismatches, valid authentication, token revocation, and re-enabling.
```bash
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://192.168.1.2:11434
OLLAMA_LLM_MODEL=qwen3:8b
OLLAMA_EMBEDDING_MODEL=qwen3-embedding:8b
OLLAMA_EMBEDDING_DIMS=4096
```

Supported Models:

- LLM: llama3, qwen3, mistral, phi3, etc.
- Embeddings: qwen3-embedding (4096d), nomic-embed-text (768d), all-minilm (384d)
```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-proj-...
OPENAI_LLM_MODEL=gpt-4o
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_EMBEDDING_DIMS=1536
```

```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

# Still need embeddings from Ollama or OpenAI:
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_EMBEDDING_DIMS=768
```

Control how memories are isolated per project:
```bash
# Auto mode (recommended) - Auto-detect project from directory
PROJECT_ID_MODE=auto

# Manual mode - Set explicitly per project
PROJECT_ID_MODE=manual
DEFAULT_USER_ID=my_project_name

# Global mode - Share all memories
PROJECT_ID_MODE=global
DEFAULT_USER_ID=shared_memory
```
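For intuition, auto mode derives an identifier from the project directory; a toy sketch of that idea (not the server's actual detection logic, which may normalize differently):

```python
# Toy illustration of directory-based project identification.
import os
import re

def project_id_from_cwd() -> str:
    """Derive a stable, slug-like project id from the working directory name."""
    name = os.path.basename(os.getcwd())
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
```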
For faster performance, use smaller embedding models:

```bash
# Fast: 768 dimensions (enables HNSW indexing)
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_EMBEDDING_DIMS=768

# Slower but more accurate: 4096 dimensions (HNSW disabled)
OLLAMA_EMBEDDING_MODEL=qwen3-embedding:8b
OLLAMA_EMBEDDING_DIMS=4096
```

Note: pgvector's HNSW index is limited to 2000 dimensions. For larger dimensions, the system automatically disables HNSW (slower but still functional).
The MCP server automatically handles large text inputs through intelligent semantic chunking to prevent timeouts and optimize performance.
How It Works:
- Small texts (≤1000 characters): Sent directly to the Mem0 API (fast path, no chunking overhead)
- Large texts (>1000 characters): Automatically chunked at semantic boundaries with context preservation
Chunking Strategy:
- Paragraph-based splitting: Text is first split at paragraph boundaries (double newlines)
- Sentence-based fallback: If paragraphs exceed 1000 characters, they're split at sentence boundaries
- Context preservation: 150-character overlap between chunks maintains semantic continuity
- Session tracking: All chunks from the same text share a single `run_id` for relationship tracking
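For intuition, here is a minimal sketch of this strategy; it is illustrative only, and the real algorithm in `mcp-server/text_chunker.py` may differ in detail:

```python
# Illustrative sketch of paragraph-first chunking with a fixed overlap.
import re

CHUNK_MAX_SIZE = 1000     # mirrors CHUNK_MAX_SIZE in .env
CHUNK_OVERLAP_SIZE = 150  # mirrors CHUNK_OVERLAP_SIZE in .env

def chunk_text(text: str) -> list[str]:
    """Split at paragraph boundaries, falling back to sentences, carrying
    a fixed-size overlap between consecutive chunks."""
    if len(text) <= CHUNK_MAX_SIZE:
        return [text]  # fast path: small texts bypass chunking entirely
    pieces: list[str] = []
    for para in text.split("\n\n"):                  # paragraph-based splitting
        if len(para) <= CHUNK_MAX_SIZE:
            pieces.append(para)
        else:                                        # sentence-based fallback
            pieces.extend(re.split(r"(?<=[.!?])\s+", para))
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) + 2 > CHUNK_MAX_SIZE:
            chunks.append(current)
            current = current[-CHUNK_OVERLAP_SIZE:]  # carry overlap forward
        current = f"{current}\n\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks
```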
Chunk Metadata:
Each chunk includes comprehensive metadata for traceability:
```jsonc
{
  "chunk_index": 0,     // Position in sequence (0-indexed)
  "total_chunks": 5,    // Total number of chunks in this text
  "chunk_size": 982,    // Number of characters in this chunk
  "has_overlap": true   // Whether this chunk includes overlap from the previous chunk
}
```
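Because every chunk carries this metadata, a chunked text can in principle be reassembled; a hedged sketch, assuming each stored chunk exposes a `text` field alongside the metadata fields above:

```python
# Assumes chunk records shaped like {"text": ..., "metadata": {...}} with
# the metadata fields documented above; illustrative only.
def reassemble(chunks: list[dict], overlap: int = 150) -> str:
    ordered = sorted(chunks, key=lambda c: c["metadata"]["chunk_index"])
    text = ordered[0]["text"]
    for chunk in ordered[1:]:
        body = chunk["text"]
        if chunk["metadata"]["has_overlap"]:
            body = body[overlap:]  # drop the duplicated overlap region
        text += body
    return text
```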
Configuration:

Chunking parameters are configurable via the `.env` file:

```bash
# Smart Text Chunking Configuration
CHUNK_MAX_SIZE=1000       # Maximum characters per chunk
CHUNK_OVERLAP_SIZE=150    # Overlap between chunks for context continuity
```

To adjust chunking behavior:

- Edit `.env` with your preferred values
- Restart the MCP server: `docker compose restart mcp`
Benefits:
- ✅ Prevents timeouts - No more 30-second timeout errors with large code snippets or documentation
- ✅ Maintains context - 150-character overlap ensures semantic relationships aren't lost at boundaries
- ✅ Transparent operation - Users don't need to manually split text; it happens automatically
- ✅ Performance optimized - Small texts bypass chunking entirely for zero overhead
- ✅ Full traceability - Metadata allows reconstruction and tracking of chunked memories
- ✅ Extended timeout - MCP client timeout increased from 30s to 180s for large text processing
Implementation Details:
- Location: `mcp-server/text_chunker.py` (chunking algorithm)
- Integration: `mcp-server/main.py`, in the `add_coding_preference()` function
- Transport: All chunks sent sequentially via HTTP to the Mem0 REST API
- Storage: Each chunk stored as a separate memory with linking metadata
Example:
```python
# User stores a large code file (5000 characters)
# The system automatically:
# 1. Detects text > 1000 chars
# 2. Splits into 5 semantic chunks at paragraph boundaries
# 3. Adds 150-char overlap between chunks
# 4. Sends chunks sequentially with metadata
# 5. All chunks share the same run_id for session tracking
# 6. Returns a success message indicating chunking occurred
```

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/docs` | GET | OpenAPI documentation |
| `/memories` | POST | Create memory |
| `/memories` | GET | Get all memories |
| `/memories/{id}` | GET | Get specific memory |
| `/memories/{id}` | PUT | Update memory |
| `/memories/{id}` | DELETE | Delete memory |
| `/memories/{id}/history` | GET | Get history |
| `/search` | POST | Semantic search |
| `/reset` | POST | Reset all memories |
| `/configure` | POST | Configure Mem0 |
| Endpoint | Method | Description |
|---|---|---|
| `/graph/link` | POST | Link memories with relationships |
| `/graph/related/{id}` | GET | Get related memories (graph traversal) |
| `/graph/path` | GET | Find path between memories |
| `/graph/evolution/{topic}` | GET | Track knowledge evolution |
| `/graph/superseded` | GET | Find obsolete memories |
| `/graph/thread/{id}` | GET | Get conversation thread |
| `/graph/component` | POST | Create component node |
| `/graph/component/dependency` | POST | Link component dependencies |
| `/graph/component/link-memory` | POST | Link memory to component |
| `/graph/impact/{name}` | GET | Analyze component impact |
| `/graph/decision` | POST | Create decision with pros/cons |
| `/graph/decision/{id}` | GET | Get decision rationale |
| `/graph/communities` | GET | Detect memory communities |
| `/graph/trust-score/{id}` | GET | Calculate trust score |
| `/graph/intelligence` | GET | Comprehensive intelligence analysis |
| Endpoint | Description |
|---|---|
| `/mcp` | HTTP Stream endpoint (recommended) |
| `/sse` | SSE endpoint (legacy) |
| `/` | Health check |
Access the Neo4j browser at http://localhost:7474
- Username: `neo4j`
- Password: `mem0graph`
```bash
# Run all tests
./scripts/test.sh

# Individual test suites
./tests/test_api.sh                        # REST API tests
./tests/test_mcp.sh                        # MCP server tests
./tests/test_integration.sh                # Full integration test
./tests/test_memory_intelligence_fixed.sh  # Memory Intelligence integration test
./tests/test_mcp_intelligence.sh           # MCP Intelligence verification
./tests/test_auth.sh                       # Authentication tests
./tests/test_ownership_simple.sh           # Memory ownership tests
```

Detailed documentation is available in the `docs/` directory:
- QUICKSTART.md - Quick start guide with authentication setup
- AUTHENTICATION.md - Complete authentication guide
- SECURITY.md - Security features and best practices
- ARCHITECTURE.md - System design and components
- API.md - Complete API reference
- MCP_TOOLS.md - MCP tools usage guide
- CONFIGURATION.md - All configuration options
- TROUBLESHOOTING.md - Common issues and solutions
- PERFORMANCE.md - Performance optimization
The Mem0 MCP Server implements enterprise-grade security:
All memory operations validate ownership:
- ✅ Users can only access their own memories
- ✅ Read, update, delete, and history operations are protected
- ✅ Automatic validation at both REST API and MCP tool levels
```bash
# User A cannot access User B's memory
curl "http://localhost:8000/memories/{memory_id}?user_id=user_b"
# Returns: 403 Forbidden - "Access denied"
```
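The enforcement itself amounts to a simple ownership comparison; a hedged sketch of the idea (the server's actual handler differs in detail):

```python
# Illustrative only; not the Mem0 server's real handler.
from fastapi import HTTPException

def assert_owner(memory: dict, requesting_user_id: str) -> None:
    """Reject any operation on a memory the requester does not own (403)."""
    if memory.get("user_id") != requesting_user_id:
        raise HTTPException(status_code=403, detail="Access denied")
```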
Security best practices:

- Change default passwords in `.env`:

  ```bash
  POSTGRES_PASSWORD=<strong-password>
  NEO4J_PASSWORD=<strong-password>
  ```

- Rotate authentication tokens regularly:

  ```bash
  python3 scripts/mcp-token.py create --user-id user@company.com
  ```

- Restrict network access - Don't expose ports publicly

- Use HTTPS - Add TLS termination via a reverse proxy (nginx, Traefik)

- Monitor audit logs:

  ```bash
  python3 scripts/mcp-token.py audit --days 7
  ```

- Test security:

  ```bash
  ./tests/test_ownership_simple.sh
  ./tests/test_auth.sh
  ```
For complete security documentation, see SECURITY.md.
"Missing authentication headers"
- Ensure `MEM0_TOKEN` and `MEM0_USER_ID` are exported in your shell
- Verify the Claude Code config has a headers section
- Restart your shell and Claude Code
"Invalid authentication token"
- Check the token exists: `python3 scripts/mcp-token.py list`
- Verify the token is not expired or disabled
- Ensure you're using the correct token value
"User ID mismatch"
- The token belongs to a different user
- Check which user owns the token: `python3 scripts/mcp-token.py list`
- Create a new token for your user ID
"Token has been disabled"
- The token was revoked
- Re-enable it: `python3 scripts/mcp-token.py enable <token>`
- Or create a new token
Server doesn't show in `claude mcp list`
- Check the URL has a trailing slash: `http://localhost:8080/mcp/` (not `/mcp`)
- Verify environment variables are set: `echo $MEM0_TOKEN $MEM0_USER_ID`
- Remove and re-add: `claude mcp remove mem0`, then add again
- Check the server is running: `docker compose ps` and `curl http://localhost:8080/`
```bash
# Check logs
./scripts/logs.sh

# Check health
./scripts/health.sh

# Ensure ports are free
lsof -i :8000   # Mem0 API
lsof -i :8080   # MCP Server
lsof -i :5432   # PostgreSQL
lsof -i :7474   # Neo4j
```
If responses are slow:

- Use a smaller embedding model:

  ```bash
  OLLAMA_EMBEDDING_MODEL=nomic-embed-text
  OLLAMA_EMBEDDING_DIMS=768
  ```

- Switch to OpenAI:

  ```bash
  LLM_PROVIDER=openai
  OPENAI_API_KEY=sk-...
  ```

- Pre-warm Ollama models - Keep them loaded in memory

- Check Ollama connectivity:

  ```bash
  curl http://192.168.1.2:11434/api/tags
  ```

- Verify models are available:

  ```bash
  ollama list
  ```

- Check mem0 logs:

  ```bash
  ./scripts/logs.sh mem0
  ```
See TROUBLESHOOTING.md for more help.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Mem0 - Memory layer for AI applications
- Model Context Protocol - MCP specification
- FastMCP - FastMCP framework
- pgvector - Vector similarity search for Postgres
- Neo4j - Graph database
- Documentation: See the `docs/` directory
- Issues: Open an issue on GitHub
- Questions: Check TROUBLESHOOTING.md
Made with ❤️ for the AI community