# Engram

Event-sourced memory system for AI agents. No LLM in the write path — just reliable episode storage with semantic search.
Most AI memory systems couple write reliability to LLM availability by performing entity extraction at write time. Engram takes a different approach: store episodes reliably first, search them semantically, and defer any expensive derived structures (knowledge graphs, entity extraction) to an optional second layer.
The result: writes never fail, search is fast, and you get a single portable binary with no runtime dependencies.
## Features

- Three search modes — find memories by meaning (vector), by exact words (keyword), or both at once (hybrid)
- Graceful fallback — keyword search works even when the embedding service is unavailable; hybrid degrades gracefully
- Fast queries — DuckDB HNSW indexing for sub-100ms vector search
- Zero external APIs — all embeddings generated locally via Ollama
- Single binary — portable across Linux, macOS, and Windows
- MCP native — integrates directly with Claude Desktop, Claude Code, and Cursor
## Prerequisites

- Ollama running locally (or remotely) with an embedding model
- Go 1.25+ (only if building from source)
## Installation

Download the binary for your platform from the releases page:
| Platform | Binary |
|---|---|
| macOS (Apple Silicon) | engram-darwin-arm64 |
| macOS (Intel) | engram-darwin-amd64 |
| Linux (x86_64) | engram-linux-amd64 |
| Linux (ARM64) | engram-linux-arm64 |
| Windows | engram-windows-amd64.exe |
```shell
# macOS/Linux: make it executable
chmod +x engram-*
mv engram-* engram
```

Or build from source:

```shell
git clone https://github.com/OscillateLabsLLC/engram
cd engram

# Using just (recommended — install from https://github.com/casey/just)
just setup   # install deps, pull embedding model, build

# Or manually
go build -o engram ./cmd/engram/main.go
```

Pull the embedding model:

```shell
ollama pull nomic-embed-text
```

Start the server:
```shell
engram serve
```

Engram starts on port 3490 and prints the SSE endpoint URL. All MCP clients connect to this single server — no database locking conflicts. See docs/mcp-integration.md for instructions on running as a background service on macOS, Linux, and Windows.
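The SSE endpoint URL follows directly from the configured port. A minimal sketch, assuming the default `ENGRAM_PORT` of 3490 and the `/mcp/sse` path shown in the client configuration below:

```shell
# Build the SSE endpoint URL from the configured port (default 3490).
ENGRAM_PORT="${ENGRAM_PORT:-3490}"
SSE_URL="http://localhost:${ENGRAM_PORT}/mcp/sse"
echo "$SSE_URL"

# With the server running, you can watch the raw event stream directly:
#   curl -N "$SSE_URL"
```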
## Configuration

Configure via environment variables:

| Variable | Description | Default |
|---|---|---|
| `DUCKDB_PATH` | Path to DuckDB database file | `./engram.duckdb` |
| `OLLAMA_URL` | Ollama API endpoint | `http://localhost:11434` |
| `EMBEDDING_MODEL` | Embedding model name | `nomic-embed-text` |
| `ENGRAM_PORT` | Server port | `3490` |
| `ENGRAM_SERVER_URL` | Server URL (used by stdio proxy) | `http://localhost:3490` |

See `.env.example` for a template.
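The same settings can also be exported directly in a shell session before starting the server. A sketch using the defaults from the table above (the `DUCKDB_PATH` value is illustrative, not a required location):

```shell
# Point Engram at a custom database location and the default local Ollama.
export DUCKDB_PATH="$HOME/.engram/engram.duckdb"   # illustrative path
export OLLAMA_URL="http://localhost:11434"
export EMBEDDING_MODEL="nomic-embed-text"
export ENGRAM_PORT="3490"

echo "Engram will listen on port $ENGRAM_PORT using model $EMBEDDING_MODEL"
```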
## MCP Integration

Engram integrates with Claude Desktop, Claude Code, and Cursor via the Model Context Protocol (MCP).

1. Start the server (see the background service docs for a persistent setup):

   ```shell
   engram serve
   ```

2. Connect your client. Most clients support SSE directly.

   Cursor (`.cursor/mcp.json`):

   ```json
   {
     "mcpServers": {
       "engram-memory": {
         "url": "http://localhost:3490/mcp/sse"
       }
     }
   }
   ```

   Claude Desktop (stdio proxy, for clients that require stdio):

   ```json
   {
     "mcpServers": {
       "engram-memory": {
         "command": "/absolute/path/to/engram",
         "args": ["stdio"],
         "env": { "ENGRAM_SERVER_URL": "http://localhost:3490" }
       }
     }
   }
   ```
For detailed integration instructions, available MCP tools, and troubleshooting, see docs/mcp-integration.md.
## Docker

```shell
# macOS/Windows
just docker-up

# Linux
just docker-up-linux
```

For detailed deployment instructions including Docker Compose, Kubernetes, and production configurations, see docs/deployment.md.
## Project Structure

```
engram/
├── cmd/engram/          # Entry point (serve / stdio subcommands)
├── internal/
│   ├── api/             # HTTP + MCP SSE server
│   ├── db/              # DuckDB operations + VSS
│   ├── embedding/       # Ollama client
│   ├── mcp/             # MCP tool definitions
│   ├── models/          # Data models
│   └── proxy/           # stdio-to-SSE proxy
├── scripts/             # Build and test scripts
├── .github/workflows/   # CI/CD (build + release)
└── Dockerfile           # Container image
```
## Architecture

- Server-first: `engram serve` owns DuckDB exclusively and exposes MCP over SSE plus a REST API
- Thin stdio proxy: `engram stdio` bridges stdin/stdout to the server for clients that require stdio (e.g., Claude Desktop)
- DuckDB with the VSS extension for vector similarity search (HNSW indexing)
- Ollama for local embedding generation (768-dimensional, `nomic-embed-text`)
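The embedding call Engram depends on can be exercised by hand against Ollama's public `/api/embeddings` endpoint. A sketch — the request body below is illustrative, and Engram's internal client may format its requests differently:

```shell
# Request body for Ollama's embeddings endpoint.
BODY='{"model": "nomic-embed-text", "prompt": "hello engram"}'
echo "$BODY"

# With Ollama running locally:
#   curl -s http://localhost:11434/api/embeddings -d "$BODY"
# The response contains a 768-dimensional "embedding" array.
```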
For a deeper dive into the architecture, see docs/architecture.md.
## Design Principles

- Writes never fail (if the database is up)
- No LLM in the write path — embeddings only, and those are retryable
- Episode log is source of truth — everything else is derived
- Simple over clever — vector search covers 80% of use cases
- Portable — single binary, single database file
## Documentation

- [MCP Integration Guide](docs/mcp-integration.md) — client setup, available tools, troubleshooting
- [Deployment Guide](docs/deployment.md) — Docker Compose, Kubernetes, production deployment
- [Architecture](docs/architecture.md) — technical deep dive into system design
## Testing

The project includes unit and integration tests:

```shell
# Run all tests
just test

# Run with coverage
just test-coverage
```

See CONTRIBUTING.md for development setup, code style, and how to submit pull requests.
## License

MIT