Local semantic memory for AI assistants. Zero-cost, <50ms, hybrid BM25+vector search.
Works with Claude Code, Claude Desktop, Claude Chat, Codex, Cursor, ChatGPT, OpenClaw, and anything that can call HTTP or MCP.
Start here:
# 1. Clone and build
git clone git@github.com:divyekant/memories.git
cd memories
docker compose -f docker-compose.snippet.yml up -d
# 2. Verify
curl http://localhost:8900/health
# 3. Add a memory
curl -X POST http://localhost:8900/memory/add \
-H "Content-Type: application/json" \
-d '{"text": "Always use TypeScript strict mode", "source": "standards.md"}'
# 4. Search
curl -X POST http://localhost:8900/search \
-H "Content-Type: application/json" \
-d '{"query": "TypeScript config", "k": 3, "hybrid": true}'

The service runs at http://localhost:8900. API docs at http://localhost:8900/docs. Web UI at http://localhost:8900/ui.
The built-in UI at /ui provides:
- Dashboard — memory stats, extraction metrics, server info
- Memories — browse, search, filter, and manage memories with list+detail or grid view
- Extractions — extraction job stats and token usage
- API Keys — configure authentication
- Settings — provider config, server info, theme toggle (dark/light/system), export and maintenance
No build step — vanilla JS + CSS served directly from webui/.
The memories CLI provides full access to the API from your terminal.
pip install -e .
# Or if using the Docker image, the CLI is included

# Search
memories search "TypeScript config"
# Add a memory
memories add "Always use strict mode" --source standards
# List memories
memories list --source standards
# Check novelty before adding
memories is-novel "TypeScript strict mode"
# Batch operations
memories batch add memories.jsonl
# Admin
memories admin stats
memories admin health
# Backups
memories backup create
memories backup list
# Full help
memories --help

# Export all memories
memories export -o backup.jsonl
# Export filtered by source
memories export --source "claude-code/" -o project.jsonl
# Export with date range
memories export --source "proj/" --since 2026-01-01 -o recent.jsonl
# Import (clean migration)
memories import backup.jsonl
# Import with smart dedup
memories import backup.jsonl --strategy smart
# Import with source remapping
memories import backup.jsonl --source-remap "old/=new/"

The CLI auto-detects when piped and outputs JSON:
# JSON output for agents (automatic when piped)
memories search "auth" | jq '.data.results[0].text'
# Force JSON in any context
memories --json search "auth"
# Force human-readable when piped
memories --pretty list

# Set server URL
memories config set url http://localhost:8900
# Set API key
memories config set api_key your-key-here
# View resolved config
memories config show

Config resolution: CLI flags > ~/.config/memories/config.json > env vars > defaults.
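The resolution order above can be sketched as follows. This is an illustrative helper, not the CLI's actual implementation; the `MEMORIES_*` env-var naming here is an assumption:

```python
import json
import os

def resolve(key, cli_flags, config_path="~/.config/memories/config.json", default=None):
    """Resolve a config value: CLI flags > config file > env vars > defaults."""
    if key in cli_flags:                      # 1. explicit CLI flag wins
        return cli_flags[key]
    path = os.path.expanduser(config_path)
    if os.path.exists(path):                  # 2. user config file
        with open(path) as f:
            cfg = json.load(f)
        if key in cfg:
            return cfg[key]
    env = os.environ.get(f"MEMORIES_{key.upper()}")
    if env is not None:                       # 3. environment variable
        return env
    return default                            # 4. built-in default

print(resolve("url", {}, config_path="/nonexistent", default="http://localhost:8900"))
```

A flag always shadows the config file, which shadows the environment, so scripts can override a user's saved config without touching it.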
AI Client (Claude, Codex, Cursor, ChatGPT, OpenClaw)
|
|-- MCP protocol (Claude Code / Desktop / Codex / Cursor)
|-- REST API (everything else)
v
MCP Server (mcp-server/index.js)
|
v
Memories Service (Docker :8900)
|-- FastAPI REST API
|-- Hybrid Search (Memories vector + BM25 keyword, RRF fusion)
|-- Markdown-aware chunking
|-- Auto-backups
v
Persistent Storage (data/)
|-- vector_index.bin (Memories vector index snapshot)
|-- metadata.json (memory text + metadata)
|-- backups/ (auto, keeps last 10)
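The hybrid search shown in the diagram merges the BM25 and vector rankings with reciprocal rank fusion. A minimal sketch of the standard RRF formula (with the conventional k=60 constant; not the service's exact implementation):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["doc_a", "doc_b", "doc_c"]     # keyword ranking
vector = ["doc_b", "doc_c", "doc_a"]   # semantic ranking
print(rrf_fuse([bm25, vector]))        # doc_b ranks well in both lists and rises to the top
```

Because RRF only uses ranks, it fuses the two result lists without having to normalize BM25 scores against cosine similarities.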
Detailed docs:
The MCP server gives Claude Code native memory_search, memory_add, memory_extract, memory_delete, memory_delete_batch, memory_delete_by_source, memory_count, memory_list, memory_stats, and memory_is_novel tools.
Setup:
- Install the MCP server dependencies:
cd memories/mcp-server
npm install

- Add to ~/.claude/settings.json:
{
"mcpServers": {
"memories": {
"command": "node",
"args": ["/path/to/memories/mcp-server/index.js"],
"env": {
"MEMORIES_URL": "http://localhost:8900",
"MEMORIES_API_KEY": "your-api-key-here"
}
}
}
}

- Restart Claude Code. The tools are now available in every project.
- (Optional) Install the Memories skill for disciplined memory capture and proactive recall:

mkdir -p ~/.claude/skills/memories
ln -s /path/to/memories/skills/memories ~/.claude/skills/memories

The skill teaches the assistant three responsibilities: when to search (proactive recall), when and how to store (hybrid memory_add + memory_extract), and when to maintain (updates, deletes, cleanup via AUDN). It adds ~11% token overhead but improves memory discipline by ~43% in eval benchmarks.
Usage (Claude Code will call these automatically when relevant):
- "Search my memory for authentication patterns"
- "Remember that we decided to use Prisma for the ORM"
- "Check if this pattern is already in memory before adding it"
- "Show me all memories from the bug-fixes source"
For a single project only, create .mcp.json in the project root instead of editing settings.json.
Same MCP server, different config file.
Setup:
- Install dependencies (same as above):
cd memories/mcp-server
npm install

- Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"memories": {
"command": "node",
"args": ["/path/to/memories/mcp-server/index.js"],
"env": {
"MEMORIES_URL": "http://localhost:8900",
"MEMORIES_API_KEY": "your-api-key-here"
}
}
}
}

- Restart the Claude Desktop app. Memory tools appear in chat and cowork mode.
Claude Chat on the web does not support MCP directly. Two options:
Option A: Remote MCP via Cloudflare Tunnel (recommended)
If you expose the Memories service via a tunnel (e.g., memory.yourdomain.com), you can use Claude's remote MCP connector feature to connect to it. See the Remote Access section below.
Option B: Manual curl in prompts
Paste curl commands in your messages and ask Claude to interpret the results:
Search my memory service for React patterns:
curl -X POST https://memory.yourdomain.com/search \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_KEY" \
-d '{"query": "React patterns", "k": 5, "hybrid": true}'
Codex supports MCP natively via ~/.codex/config.toml.
Setup:
- Install dependencies:
cd memories/mcp-server
npm install

- Add to ~/.codex/config.toml:
[mcp_servers.memories]
command = "node"
args = ["/path/to/memories/mcp-server/index.js"]
[mcp_servers.memories.env]
MEMORIES_URL = "http://localhost:8900"
MEMORIES_API_KEY = "your-api-key-here"

If your API key is prefix-scoped and does not allow codex/*, set hook source overrides in ~/.config/memories/env:

MEMORIES_SOURCE_PREFIX="your-authorized-prefix"
# or exact source:
# MEMORIES_SOURCE="your-authorized-prefix/your-project"

- Restart Codex. The memory_search, memory_add, memory_extract, memory_delete, memory_delete_by_source, memory_count, memory_list, memory_stats, memory_is_novel, and other tools will be available.
Automatic memory layer for Codex:
cd memories/mcp-server
npm install
cd ..
./integrations/claude-code/install.sh --codex

This configures:
- MCP server registration in ~/.codex/config.toml
- notify hook script at ~/.codex/hooks/memory/memory-codex-notify.sh for after-turn extraction
- default developer_instructions (if not already set) to bias memory_search usage on each turn
- hook env loading from ~/.config/memories/env (or MEMORIES_ENV_FILE) for MEMORIES_URL, MEMORIES_API_KEY, and optional source overrides (MEMORIES_SOURCE_PREFIX, MEMORIES_SOURCE)
The installer requires jq, curl, and a running Memories service (/health must respond).
If ~/.codex/config.toml already has a notify = [...] entry, the installer will not overwrite it —
merge the Memories notify script into that array manually.
For scoped API keys, set MEMORIES_SOURCE_PREFIX (or MEMORIES_SOURCE) so hook writes stay inside authorized prefixes.
Codex currently exposes an after-turn notify hook, not Claude's 5-event hook surface.
Usage (Codex will discover the tools automatically):
- "Search memory for how we handle error logging"
- "Store this architecture decision in memory"
- "List all memories from the project-setup source"
Cursor supports MCP with the same server.
Setup:
- Install dependencies:
cd memories/mcp-server
npm install

- Add to Cursor MCP config:
  - Global: ~/.cursor/mcp.json
  - Project: .cursor/mcp.json
{
"mcpServers": {
"memories": {
"command": "node",
"args": ["/path/to/memories/mcp-server/index.js"],
"env": {
"MEMORIES_URL": "http://localhost:8900",
"MEMORIES_API_KEY": "your-api-key-here"
}
}
}
}

- Restart Cursor.
Cursor also supports the full hook lifecycle via its "Third-party skills" feature. Run ./integrations/claude-code/install.sh --cursor to install hooks alongside the MCP config.
ChatGPT uses Custom Actions (OpenAPI schema) rather than MCP. This requires exposing the Memories service over the internet.
Prerequisites: Memories service accessible via HTTPS (see Remote Access).
Setup:
- Enable API key auth on the Memories service (set the API_KEY env var in docker-compose).
- In ChatGPT, go to Explore GPTs > Create a GPT > Configure > Actions.
- Import this OpenAPI schema (replace memory.yourdomain.com with your URL):
openapi: 3.0.0
info:
  title: Memories
  version: 2.0.0
  description: Semantic memory search and storage
servers:
  - url: https://memory.yourdomain.com
paths:
  /search:
    post:
      operationId: searchMemory
      summary: Search memories by semantic similarity
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [query]
              properties:
                query:
                  type: string
                  description: Natural language search query
                k:
                  type: integer
                  default: 5
                  description: Number of results
                hybrid:
                  type: boolean
                  default: true
                  description: Use hybrid BM25+vector search
      responses:
        '200':
          description: Search results
  /memory/add:
    post:
      operationId: addMemory
      summary: Store a new memory
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [text, source]
              properties:
                text:
                  type: string
                  description: Memory content
                source:
                  type: string
                  description: Source identifier
                deduplicate:
                  type: boolean
                  default: true
      responses:
        '200':
          description: Memory added
  /memory/is-novel:
    post:
      operationId: isNovel
      summary: Check if text is already known
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [text]
              properties:
                text:
                  type: string
                threshold:
                  type: number
                  default: 0.88
      responses:
        '200':
          description: Novelty check result
  /memories:
    get:
      operationId: listMemories
      summary: Browse stored memories with pagination
      parameters:
        - name: offset
          in: query
          schema:
            type: integer
            default: 0
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 5000
        - name: source
          in: query
          description: Source prefix filter
          schema:
            type: string
      responses:
        '200':
          description: List of memories
    delete:
      operationId: deleteMemoriesByPrefix
      summary: Bulk delete all memories matching a source prefix
      parameters:
        - name: source
          in: query
          required: true
          description: Source prefix to match
          schema:
            type: string
      responses:
        '200':
          description: Delete count
  /memories/count:
    get:
      operationId: countMemories
      summary: Count memories optionally filtered by source prefix
      parameters:
        - name: source
          in: query
          description: Source prefix filter
          schema:
            type: string
      responses:
        '200':
          description: Memory count
  /stats:
    get:
      operationId: getStats
      summary: Memory index statistics
      responses:
        '200':
          description: Index stats

- Under Authentication, choose API Key with header name X-API-Key.
- Add instructions to the GPT system prompt:
You have access to a persistent memory system. Use it to:
- Search for relevant context before answering questions (searchMemory)
- Store important decisions, patterns, and learnings (addMemory)
- Check if something is already known before adding (isNovel)
- Browse what's stored (listMemories)
Always search memory at the start of conversations to load context.
OpenClaw uses a Skill (SKILL.md) with shell helper functions that call the REST API directly.
Setup:
- Create the skill directory and copy the skill file:
mkdir -p ~/.openclaw/skills/memories
cp integrations/openclaw-skill.md ~/.openclaw/skills/memories/SKILL.md

Or see the full SKILL.md in this repo at integrations/openclaw-skill.md.
- Set the API key in your shell profile (~/.zshrc or ~/.bashrc):

export MEMORIES_API_KEY="your-api-key-here"

The SKILL.md reads $MEMORIES_API_KEY from the environment — the key is never stored in the skill file itself.
Key commands available to OpenClaw agents:
memory_search_memories "query" [k] [threshold] [hybrid]
memory_add_memories "text" "source" [deduplicate]
memory_is_novel "text" [threshold]
memory_delete_memories <id>
memory_delete_source_memories "pattern"
memory_delete_by_prefix "source_prefix"
memory_count_memories [source_prefix]
memory_list_memories [offset] [limit] [source]
memory_rebuild_index
memory_dedup_memories [dry_run] [threshold]
memory_stats
memory_health
memory_backup [prefix]
memory_restore "backup_name"

All functions use jq for safe JSON construction and read auth from the $MEMORIES_API_KEY env var (no hardcoded secrets).
To use Memories from anywhere (Claude Chat web, ChatGPT, mobile, other machines), expose it via a Cloudflare Tunnel or similar.
- Enable API key auth in your docker-compose:
environment:
  - API_KEY=your-secret-key-here

Rebuild and restart: docker compose build memories && docker compose up -d memories
- Add to your Cloudflare tunnel config (e.g., in ~/.cloudflared/config.yml):
ingress:
  - hostname: memory.yourdomain.com
    service: http://localhost:8900

- Update MCP server env to use the remote URL:
{
"env": {
"MEMORIES_URL": "https://memory.yourdomain.com",
"MEMORIES_API_KEY": "your-secret-key-here"
}
}

Now every client — Claude Code on your laptop, Cursor, Claude Desktop, ChatGPT on your phone, OpenClaw — hits the same memory store running on your Mac mini.
Memories supports multiple API keys with role-based access control:
- Three tiers: read-only (search/list), read-write (search/list + add/delete), admin (full access + key management)
- Prefix scoping: keys can be restricted to specific source prefixes for tenant isolation
- Key management: create, list, update, and revoke keys via POST/GET/PATCH/DELETE /api/keys or the Web UI (admin-only)
- Backward compatible: the existing API_KEY env var still works as an implicit admin key
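Prefix scoping means a scoped key may only touch sources under its authorized prefixes. A toy sketch of that check (illustrative only; the service enforces this server-side in its auth layer):

```python
def allowed(source, key_prefixes):
    """A prefix-scoped key may only touch sources under one of its prefixes.

    An empty prefix list means the key is unscoped (full access).
    """
    if not key_prefixes:
        return True
    return any(source.startswith(p) for p in key_prefixes)

print(allowed("team-a/project/notes.md", ["team-a/"]))  # inside the tenant's prefix
print(allowed("team-b/secrets.md", ["team-a/"]))        # outside: denied
```

Because the check is a plain string-prefix match, nesting sources like team/project/file gives you hierarchical tenant isolation for free.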
See the multi-auth design doc for details.
All endpoints accept/return JSON. Auth via X-API-Key header.
POST /search
{"query": "...", "k": 5, "hybrid": true, "threshold": 0.3, "vector_weight": 0.7, "source_prefix": "team/project/"}
POST /search/batch
{"queries": [{"query": "...", "k": 5}, {"query": "...", "hybrid": true}]}
POST /memory/add
{"text": "...", "source": "file.md", "deduplicate": true}
POST /memory/add-batch
{"memories": [{"text": "...", "source": "..."}, ...], "deduplicate": true}
DELETE /memory/{id}
DELETE /memories?source=<prefix> # Bulk delete by source prefix; returns {"count": N}
POST /memory/delete-batch {"ids": [1, 2, 3]}
POST /memory/delete-by-source {"source_pattern": "credentials"}
POST /memory/delete-by-prefix {"source_prefix": "team/project/"}
GET /memory/{id}
POST /memory/get-batch {"ids": [1, 2, 3]}
POST /memory/upsert
{"text":"...", "source":"team/project/file", "key":"entity-1", "metadata": {"owner":"team"}}
POST /memory/upsert-batch
{"memories":[{"text":"...", "source":"...", "key":"..."}]}
PATCH /memory/{id}
{"text":"optional", "source":"optional", "metadata_patch":{"tag":"v2"}}
POST /memory/is-novel
{"text": "...", "threshold": 0.88}
GET /memories?offset=0&limit=20&source=filter # limit up to 5000; source uses prefix matching
GET /memories/count?source=<prefix> # returns {"count": N}
POST /memory/deduplicate
{"threshold": 0.90, "dry_run": true}
POST /index/build {"sources": ["file1.md", "file2.md"]}
GET /stats
GET /health
GET /health/ready
GET /metrics
POST /maintenance/embedder/reload
GET /backups
POST /backup?prefix=manual
POST /restore {"backup_name": "manual_20260213_120000"}
POST /memory/extract {"messages": "...", "source": "proj", "context": "stop"} # 202 queued
GET /memory/extract/{job_id}
GET /extract/status
Full OpenAPI schema at http://localhost:8900/docs.
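The is-novel endpoint compares candidate text against its nearest stored memory and reports whether the best match exceeds the threshold (default 0.88). A toy sketch with 2-D vectors, assuming cosine similarity as the metric (the real service embeds text with its configured embedding provider):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def is_novel(candidate_vec, stored_vecs, threshold=0.88):
    """Novel if no stored embedding is at least `threshold` similar."""
    return all(cosine(candidate_vec, v) < threshold for v in stored_vecs)

stored = [[1.0, 0.0], [0.6, 0.8]]
print(is_novel([0.99, 0.1], stored))   # near-duplicate of the first vector: not novel
print(is_novel([0.0, -1.0], stored))   # dissimilar to everything: novel
```

Checking novelty before memory_add is how clients avoid cluttering the index with paraphrases of facts already stored.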
- POST /memory/compare (pairwise conflict scoring for concurrent agent writes)
- POST /memory/resolve-conflicts (policy-driven merge: latest/manual/model)
- POST /memory/lock + DELETE /memory/lock/{key} (explicit lock reservation APIs)
- POST /memory/events + GET /memory/events/stream (change feed for agent synchronization)
- POST /search/stream (progressive search responses for very large corpora)
- POST /memory/ttl (time-bound memories with auto-expiry)
When connected via MCP (Claude Code, Claude Desktop, Codex, Cursor), these tools are available:
| Tool | Description |
|---|---|
| memory_search | Hybrid search (BM25 + vector). Default mode. |
| memory_add | Store a memory with auto-dedup. |
| memory_extract | LLM-based extraction with AUDN (Add/Update/Delete/Noop) from conversation text. |
| memory_delete | Delete by ID. |
| memory_delete_batch | Delete multiple IDs in one operation. |
| memory_delete_by_source | Bulk delete all memories matching a source prefix. |
| memory_count | Count memories, optionally filtered by source prefix. |
| memory_list | Browse with pagination and source prefix filter. |
| memory_stats | Index stats (count, model, last updated). |
| memory_is_novel | Check if text is already known. |
| Variable | Default | Description |
|---|---|---|
| DATA_DIR | /data | Persistent storage path |
| WORKSPACE_DIR | /workspace | Read-only workspace for index rebuilds |
| API_KEY | (empty) | API key for auth. Empty = no auth. |
| EMBED_PROVIDER | onnx | Embedding provider: onnx (local) or openai (BYOK) |
| EMBED_MODEL | (unset) | Provider-specific embedding model override |
| MODEL_NAME | all-MiniLM-L6-v2 | Default ONNX model used when EMBED_PROVIDER=onnx and EMBED_MODEL is unset |
| MODEL_CACHE_DIR | (unset; Docker image sets /data/model-cache) | Optional writable cache path for downloaded model files |
| PRELOADED_MODEL_CACHE_DIR | (unset; Docker image sets /opt/model-cache) | Optional read-only cache to seed MODEL_CACHE_DIR when empty |
| MAX_BACKUPS | 10 | Number of backups to keep |
| MAX_EXTRACT_MESSAGE_CHARS | 120000 | Max characters accepted by /memory/extract |
| EXTRACT_MAX_INFLIGHT | 2 | Max concurrent extraction jobs |
| MEMORY_TRIM_ENABLED | true | Run post-extract GC/allocator trim |
| MEMORY_TRIM_COOLDOWN_SEC | 15 | Minimum seconds between trim attempts |
| MEMORY_TRIM_PERIODIC_SEC | 5 | Periodic trim probe interval (seconds). Set 0 to disable background trim loop. |
| EMBEDDER_AUTO_RELOAD_ENABLED | false | Enable periodic auto-reload of in-process embedder runtime |
| EMBEDDER_AUTO_RELOAD_RSS_KB_THRESHOLD | 1200000 | RSS threshold (KB) required before auto-reload decisions |
| EMBEDDER_AUTO_RELOAD_CHECK_SEC | 15 | Seconds between auto-reload checks |
| EMBEDDER_AUTO_RELOAD_HIGH_STREAK | 3 | Consecutive high-RSS checks required before trigger |
| EMBEDDER_AUTO_RELOAD_MIN_INTERVAL_SEC | 900 | Cooldown between reload attempts |
| EMBEDDER_AUTO_RELOAD_WINDOW_SEC | 3600 | Rolling window size for reload cap |
| EMBEDDER_AUTO_RELOAD_MAX_PER_WINDOW | 2 | Max reloads allowed per rolling window |
| EMBEDDER_AUTO_RELOAD_MAX_ACTIVE_REQUESTS | 2 | Skip reload when active HTTP requests exceed this |
| EMBEDDER_AUTO_RELOAD_MAX_QUEUE_DEPTH | 0 | Skip reload when extract queue depth exceeds this |
| METRICS_LATENCY_SAMPLES | 200 | Per-route latency sample window for /metrics percentiles |
| METRICS_TREND_SAMPLES | 120 | Memory trend sample window exposed by /metrics |
| PORT | 8000 | Internal service port |
Default compose files now include:
- mem_limit: ${MEMORIES_MEM_LIMIT:-3g} to bound container memory growth
- MALLOC_ARENA_MAX=2 to reduce glibc arena fragmentation in multithreaded workloads
- MALLOC_TRIM_THRESHOLD_=131072 and MALLOC_MMAP_THRESHOLD_=131072 to encourage earlier allocator release
- extraction env passthrough (EXTRACT_PROVIDER, EXTRACT_MODEL, provider keys/URL) so deploys keep extraction enabled when set in shell or .env
- embedder auto-reload env passthrough with anti-loop defaults (EMBEDDER_AUTO_RELOAD_*)
| Variable | Default | Description |
|---|---|---|
| MEMORIES_URL | http://localhost:8900 | Memories service URL |
| MEMORIES_API_KEY | (empty) | API key if auth is enabled |
Memories supports automatic retrieval/extraction, with client-specific behavior:
- Claude Code: full 5-hook lifecycle (session start, each prompt, stop, pre-compact, session end)
- Cursor: same 5-hook lifecycle via Third-party skills (loads from ~/.claude/settings.json)
- Codex: native notify hook after each completed turn + MCP/developer instructions for retrieval
- OpenClaw: skill-driven retrieval/extraction flow
| Event | Hook | What happens |
|---|---|---|
| Session start | memory-recall.sh | Loads project-specific memories into context |
| Every prompt | memory-query.sh | Retrieves memories relevant to the question |
| After response | memory-extract.sh | Extracts facts and stores via AUDN pipeline |
| Before compaction | memory-flush.sh | Aggressive extraction before context loss |
| Session end | memory-commit.sh | Final extraction pass |
Cursor compatibility note: Cursor sends workspace_roots[] (not cwd) and transcript_path (not inline messages) in hook payloads. The hook scripts handle both formats automatically — no separate configuration needed.
| Event | Mechanism | What happens |
|---|---|---|
| After each completed turn | notify -> memory-codex-notify.sh | Sends user+assistant exchange to /memory/extract asynchronously (loads hook env file, handles snake/camel/kebab payload variants, supports transcript fallback and source overrides) |
| On new turns | MCP tools + developer instructions | Encourages focused memory_search before implementation-heavy responses |
Codex does not currently expose the Claude-style SessionStart/UserPromptSubmit/PreCompact/SessionEnd hook callbacks in config.toml.
Prerequisites:
- jq and curl installed (required by installer)
- running Memories service (curl -s http://localhost:8900/health | jq .)
- if installing Codex integration, MCP deps installed:

cd memories/mcp-server
npm install

One-command auto-detect installer (recommended):

./integrations/claude-code/install.sh --auto

This detects and configures any available targets on your machine:
- Claude Code hooks (~/.claude/settings.json)
- Codex native config (~/.codex/config.toml)
- OpenClaw skill (~/.openclaw/skills/memories/SKILL.md)
Cursor is supported via manual MCP config (~/.cursor/mcp.json or .cursor/mcp.json).
The installer writes runtime config to:
- ~/.config/memories/env for hook vars (MEMORIES_URL, optional MEMORIES_API_KEY, optional MEMORIES_SOURCE_PREFIX/MEMORIES_SOURCE for Codex notify source control)
- repo .env for extraction vars (EXTRACT_PROVIDER, provider keys/URL)
Target only Claude, Cursor, or Codex:
./integrations/claude-code/install.sh --claude
./integrations/claude-code/install.sh --cursor
./integrations/claude-code/install.sh --codex

Target only OpenClaw:

./integrations/claude-code/install.sh --openclaw

LLM-assisted setup: Feed integrations/QUICKSTART-LLM.md to your AI assistant and it will configure everything automatically.
| Provider | Cost | AUDN | Speed |
|---|---|---|---|
| Anthropic (recommended) | ~$0.001/turn | Full (Add/Update/Delete/Noop) | ~1-2s |
| OpenAI | ~$0.001/turn | Full | ~1-2s |
| ChatGPT Subscription | Free (uses your subscription) | Full | ~1-2s |
| Ollama | Free | Full | ~5s |
| Skip | Free | None | N/A |
Extraction is optional. Without it, retrieval still works.
By default, automatic write hooks do not store new memories when extraction is disabled.
If you want a degraded automatic-write mode, set EXTRACT_FALLBACK_ADD=true to enable a strict
heuristic + novelty-check fallback that writes at most a small number of high-confidence facts
when extraction is disabled or the configured provider fails at runtime (for example, rate limits or timeouts).
AUDN is the memory decision loop:
- ADD: store a genuinely new fact
- UPDATE: refine an existing memory that is close but outdated/incomplete
- DELETE: remove a stale/conflicting memory
- NOOP: ignore non-useful or duplicate facts
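A toy version of that decision loop, keyed off the similarity between a candidate fact and its nearest stored memory. The thresholds and the `contradicts` flag are illustrative assumptions, not the extractor's actual policy (which is LLM-driven):

```python
def audn_decision(similarity, contradicts, dup_threshold=0.88, near_threshold=0.6):
    """Toy AUDN policy based on nearest-memory similarity (illustrative thresholds)."""
    if similarity >= dup_threshold:
        # Near-duplicate: ignore it, unless it contradicts what is stored,
        # in which case the stale memory should be removed.
        return "DELETE" if contradicts else "NOOP"
    if similarity >= near_threshold:
        return "UPDATE"   # close but incomplete/outdated: refine in place
    return "ADD"          # genuinely new fact

print(audn_decision(0.95, contradicts=False))  # duplicate -> NOOP
print(audn_decision(0.70, contradicts=False))  # close match -> UPDATE
print(audn_decision(0.20, contradicts=False))  # new fact -> ADD
```

The point of the loop is that every extracted fact is judged against what is already stored, so the index converges instead of accumulating near-duplicates.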
Why it matters:
- cleaner memory store over time (less duplicate/stale data)
- better retrieval quality in later sessions
- less "memory drift" when decisions change
- Anthropic/OpenAI extraction: small usage cost (typically around ~$0.001/turn), full AUDN quality.
- ChatGPT Subscription extraction: no additional API cost (uses your existing subscription), full AUDN quality.
- Ollama extraction: no API cost, full AUDN quality (with JSON format constraint).
- Retrieval only (EXTRACT_PROVIDER unset): no extraction model cost.
- Optional fallback writes (EXTRACT_FALLBACK_ADD=true): add-only, heuristic extraction path (no AUDN update/delete) used when extraction is disabled or provider calls fail at runtime.
Use these to keep extraction spend bounded:
- MAX_EXTRACT_MESSAGE_CHARS: hard cap on transcript size per request
- EXTRACT_MAX_FACTS: limits facts considered from each extraction
- EXTRACT_MAX_FACT_CHARS: caps per-fact payload size
- EXTRACT_SIMILAR_TEXT_CHARS and EXTRACT_SIMILAR_PER_FACT: limit context passed into AUDN
POST /memory/extract is async-first. It enqueues work and returns 202 with a job_id.
Poll GET /memory/extract/{job_id} for queued, running, completed, or failed.
If the queue is full, the API returns 429 with a Retry-After header.
When extraction is disabled and EXTRACT_FALLBACK_ADD=true, /memory/extract runs an immediate
fallback add path and still returns a job object. When extraction is configured but fails at runtime,
the queued worker also falls back to add-only mode when EXTRACT_FALLBACK_ADD=true.
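A client-side polling loop for this flow can be sketched as follows. `fetch_status` is a hypothetical injected callable standing in for GET /memory/extract/{job_id}; stubbing it keeps the sketch self-contained:

```python
import time

def poll_job(fetch_status, job_id, interval=0.5, timeout=30):
    """Poll an async extract job until it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)          # e.g. GET /memory/extract/{job_id}
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)                # back off between polls
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Stubbed status source standing in for the HTTP call:
states = iter(["queued", "running", "completed"])
print(poll_job(lambda _id: {"status": next(states)}, "job-1", interval=0)["status"])
```

A production client would also honor the 429 Retry-After header when enqueueing, sleeping for the advertised interval before resubmitting.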
The Dockerfile publishes two runtime targets:
- core (default): search/add/list endpoints, no Anthropic/OpenAI SDKs
- extract: includes Anthropic/OpenAI SDKs for /memory/extract
Build both images directly:
docker build --target core -t memories:core .
docker build --target extract -t memories:extract .

Use compose with either target:
# Default (core target)
docker compose up -d --build memories
# Extraction-ready target
MEMORIES_IMAGE_TARGET=extract docker compose up -d --build memories

By default, images do not bake model weights. On first run, the service downloads them into
MODEL_CACHE_DIR (/data/model-cache in Docker), so later restarts reuse the volume cache.
If you want a fully preloaded image (faster first boot, larger pull), set PRELOAD_MODEL=true:
docker build --target core --build-arg PRELOAD_MODEL=true -t memories:core .
docker build --target extract --build-arg PRELOAD_MODEL=true -t memories:extract .

Ollama uses HTTP directly and does not need the extra SDKs, so core is enough for Ollama extraction.
| Variable | Default | Description |
|---|---|---|
| EXTRACT_PROVIDER | (none) | anthropic, openai, chatgpt-subscription, ollama, or empty to disable |
| EXTRACT_MODEL | (per provider) | Model override |
| ANTHROPIC_API_KEY | (none) | Required for Anthropic provider (standard key or sk-ant-oat01- OAuth token) |
| OPENAI_API_KEY | (none) | Required for OpenAI provider |
| CHATGPT_REFRESH_TOKEN | (none) | Required for ChatGPT Subscription provider (from python -m memories auth chatgpt) |
| CHATGPT_CLIENT_ID | (none) | Required for ChatGPT Subscription provider |
| OLLAMA_URL | http://host.docker.internal:11434 | Ollama server URL (on Linux, use http://localhost:11434) |
| EXTRACT_FALLBACK_ADD | false | Enable add-only fallback writes when extraction is disabled or provider calls fail at runtime |
| EXTRACT_FALLBACK_MAX_FACTS | 1 | Max fallback facts to store per extract request |
| EXTRACT_FALLBACK_MIN_FACT_CHARS | 24 | Minimum candidate fact length for fallback |
| EXTRACT_FALLBACK_MAX_FACT_CHARS | 280 | Maximum candidate fact length for fallback |
| EXTRACT_FALLBACK_NOVELTY_THRESHOLD | 0.88 | Novelty threshold used by fallback add mode |
| EXTRACT_QUEUE_MAX | EXTRACT_MAX_INFLIGHT * 20 | Maximum queued extraction jobs before backpressure (429) |
| EXTRACT_JOB_RETENTION_SEC | 300 | How long completed/failed extraction jobs stay queryable |
| EXTRACT_JOBS_MAX | 200 | Hard cap on stored extraction job records (finished jobs evicted first) |
| EXTRACT_MAX_FACTS | 30 | Maximum facts kept from a single extraction |
| EXTRACT_MAX_FACT_CHARS | 500 | Max length per extracted fact |
| EXTRACT_SIMILAR_TEXT_CHARS | 280 | Max similar-memory text length passed into AUDN |
| EXTRACT_SIMILAR_PER_FACT | 5 | Similar memories included per fact during AUDN |
Extraction can create short-lived allocation spikes (large transcripts, large LLM JSON payloads, concurrent requests).
Mitigations built in:
- /memory/extract request size limit (MAX_EXTRACT_MESSAGE_CHARS)
- bounded in-flight extraction (EXTRACT_MAX_INFLIGHT)
- post-extract + periodic memory reclamation (MEMORY_TRIM_ENABLED, MEMORY_TRIM_COOLDOWN_SEC, MEMORY_TRIM_PERIODIC_SEC)
- optional auto-reload controller for the embedder runtime (EMBEDDER_AUTO_RELOAD_*)
- bounded AUDN payload sizes (EXTRACT_MAX_FACTS, EXTRACT_MAX_FACT_CHARS, EXTRACT_SIMILAR_TEXT_CHARS)
Observability:
- /metrics includes embedder_reload.auto and embedder_reload.manual counters/state
- manual reload endpoint: POST /maintenance/embedder/reload
Reference benchmark: docs/benchmarks/2026-02-17-memory-reclamation.md
./integrations/claude-code/install.sh --uninstall

Then optionally remove MEMORIES_* from ~/.config/memories/env and EXTRACT_* from repo .env.
Memories has three layers of backup protection:
The service automatically saves a snapshot after every write operation. The 10 most recent auto-backups are kept in the Docker volume under data/backups/.
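Keep-last-N rotation over timestamped names can be pictured as below. This is a hypothetical prune_backups helper, not the service's own retention code; it relies on the fact that the documented name format (e.g. manual_20260213_120000) sorts chronologically:

```python
def prune_backups(backup_names, max_backups=10):
    """Keep only the newest backups; timestamped names sort chronologically."""
    ordered = sorted(backup_names)       # oldest first
    return ordered[-max_backups:]        # newest max_backups survive

# Twelve daily auto-backups: rotation drops the two oldest.
names = [f"auto_202602{day:02d}_120000" for day in range(1, 13)]
kept = prune_backups(names)
print(len(kept), kept[0])
```

The same idea, parameterized by MAX_BACKUPS, bounds disk growth no matter how write-heavy the workload is.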
# List backups
curl -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/backups
# Create manual backup
curl -X POST -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/backup?prefix=manual
# Restore from backup
curl -X POST -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/restore \
-H "Content-Type: application/json" \
-d '{"backup_name": "manual_20260214_120000"}'

A cron job creates timestamped copies of the Memories index every 30 minutes. Snapshots are stored outside the Docker volume (default: ~/backups/memories/) with 30-day retention.
# Install the cron job
./scripts/install-cron.sh install
# Check status
./scripts/install-cron.sh status
# Run a backup manually
./scripts/backup.sh
# Dry run (no changes)
./scripts/backup.sh --test

Environment variables (all optional, sensible defaults):
| Variable | Default | Description |
|---|---|---|
| MEMORIES_URL | http://localhost:8900 | Service URL |
| MEMORIES_API_KEY | (empty) | API key if auth is enabled |
| MEMORIES_DATA_DIR | ./data (relative to repo) | Docker volume data path |
| BACKUP_DIR | ~/backups/memories | Where to store snapshots |
| RETENTION_DAYS | 30 | Days to keep local snapshots |
If you set GDRIVE_ACCOUNT, each backup automatically uploads the latest snapshot to Google Drive as a compressed tar.gz. Uploads are throttled to once per hour. 7-day retention on Drive.
Prerequisites:
- Install gog CLI
- Authenticate: gog auth add your-email@gmail.com --services drive
- Set env var in your shell profile: export GDRIVE_ACCOUNT="your-email@gmail.com"

Environment variables (all optional):
| Variable | Default | Description |
|---|---|---|
| GDRIVE_ACCOUNT | (none) | Google account email. Required to enable GDrive. |
| GDRIVE_FOLDER_NAME | memories-backups | Folder name on Drive |
| UPLOAD_INTERVAL_MIN | 55 | Minimum minutes between uploads |
| GDRIVE_RETENTION_DAYS | 7 | Days to keep backups on Drive |
Manual usage:
# Setup (create Drive folder + test auth)
./scripts/backup-gdrive.sh --setup
# Upload now (skip throttle)
./scripts/backup-gdrive.sh --force
# Dry run
./scripts/backup-gdrive.sh --test
# Only clean up old backups on Drive
./scripts/backup-gdrive.sh --cleanup

For S3/MinIO/R2 backends, build with cloud sync enabled:

ENABLE_CLOUD_SYNC=true docker compose up -d --build memories

See CLOUD_SYNC_README.md for configuration details.
memories/
app.py # FastAPI REST API
memory_engine.py # Memories engine (search, chunking, BM25, backups)
onnx_embedder.py # ONNX Runtime embedder (replaces PyTorch)
llm_provider.py # LLM provider abstraction (Anthropic/OpenAI/ChatGPT Subscription/Ollama)
llm_extract.py # Extraction pipeline with AUDN
chatgpt_oauth.py # ChatGPT OAuth2+PKCE token exchange helpers
key_store.py # SQLite-backed API key store (SHA-256 hashing)
auth_context.py # Request-scoped role and prefix enforcement
memories_auth.py # CLI auth tool (python -m memories auth chatgpt/status)
__main__.py # Entry point for python -m memories
Dockerfile # Multi-stage Docker build (core/extract targets)
pyproject.toml # Python dependencies (uv)
uv.lock # Locked dependency resolutions
docker-compose.snippet.yml
docs/
api.md # Complete REST API reference
architecture.md # System architecture and runtime flows
decisions.md # Key design decisions and tradeoffs
benchmarks/ # Reproducible benchmark notes
mcp-server/
index.js # MCP server (wraps REST API as tools)
package.json
scripts/
backup.sh # Cron backup (local snapshots)
backup-gdrive.sh # Optional Google Drive upload
install-cron.sh # Cron job installer
webui/
index.html # Memory browser entry page (/ui)
styles.css # UI styling
app.js # Browser-side pagination/filter logic
integrations/
claude-code/
install.sh # Auto-detect installer (Claude/Codex/Cursor/OpenClaw)
hooks/ # Claude Code 5-hook scripts + hooks.json
codex/
memory-codex-notify.sh # Codex notify hook script (after-turn extraction)
claude-code.md # Claude Code guide
openclaw-skill.md # OpenClaw SKILL.md
QUICKSTART-LLM.md # LLM-friendly setup guide
tests/
test_memory_engine.py # Memory engine tests
test_llm_provider.py # LLM provider tests (incl. ChatGPT Subscription)
test_chatgpt_oauth.py # OAuth PKCE + token exchange tests
test_memories_auth.py # CLI auth tool tests
test_llm_extract.py # Extraction pipeline tests
test_extract_api.py # API endpoint tests
test_web_ui.py # Web UI route/static tests
skills/
memories/
SKILL.md # Claude Code skill for memory discipline
eval/
__main__.py # CLI entrypoint (python -m eval)
models.py # Pydantic data models (Scenario, EvalReport, etc.)
loader.py # YAML scenario loader
scorer.py # Deterministic rubric scorer
judge.py # LLM-as-judge for non-deterministic rubrics
memories_client.py # Memories API client for eval runner
cc_executor.py # Claude Code executor with project isolation
runner.py # Orchestrates with/without-memory runs
reporter.py # JSON reporter and summary formatter
config.yaml # Default eval configuration
scenarios/ # YAML test scenarios by category
results/ # JSON eval reports (.gitignored)
tests/ # 82 tests covering all eval components
data/ # .gitignored — persistent index + backups
Memories includes a built-in eval harness that measures how much Memories improves AI assistant performance. It runs controlled A/B tests: each scenario executes via Claude Code (`claude -p`) both with and without Memories, then scores the outputs against deterministic rubrics.
# Run all scenarios (via wrapper script)
./eval/run.sh
# Or directly via Python
python -m eval
# Run a specific category
python -m eval --category coding
# Run a single scenario
python -m eval --scenario coding-001 -v

| Category | With Memory | Without Memory | Delta |
|---|---|---|---|
| Coding | 1.00 | 0.00 | +1.00 |
| Recall | 1.00 | 0.20 | +0.80 |
| Compounding | 1.00 | 0.27 | +0.73 |
| Overall | 1.00 | 0.14 | +0.86 |
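The Delta column is simply the with/without difference per row. As a quick sanity check on the Overall row (numbers copied from the table above):

```shell
# delta = score_with - score_without, computed for the Overall row
score_with=1.00
score_without=0.14
delta=$(awk -v a="$score_with" -v b="$score_without" 'BEGIN { printf "%.2f", a - b }')
echo "+$delta"   # prints +0.86
```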
11 scenarios across 3 categories. Each scenario uses fictional project context ("Voltis") with arbitrary, non-derivable facts — values like hvt_client, vtctl deploy-gate, VTX_LEGACY_DSN, port 7443, and 73% that Claude cannot guess from naming patterns or training data.
- Coding tasks (4 scenarios) — Does the agent apply project-specific tools and conventions?
- Knowledge recall (4 scenarios) — Can the agent recall exact config values and decisions?
- Compounding value (3 scenarios) — Can the agent synthesize multiple memories to diagnose problems?
- Purges stale auto-memory from prior eval runs (`~/.claude/projects/cc_eval*`)
- Clears eval memories, creates an isolated temp project (no CLAUDE.md, no `.claude/`)
- Runs the prompt without Memories via `claude -p --strict-mcp-config` (empty MCP) → scores against rubrics
- Seeds scenario memories, runs the prompt with Memories via `claude -p --strict-mcp-config` (Memories MCP only) → scores again
- Computes efficacy delta = score_with - score_without
- Aggregates across categories with configurable weights

- `--strict-mcp-config` ensures Claude loads only the MCP config provided (or none), ignoring global settings
- Fresh temp directories per run — no CLAUDE.md, no `.claude/`, no conversation history
- Auto-memory cleanup removes `~/.claude/projects/cc_eval*` dirs at startup and after each run
- Scenario memories cleared before each run via Memories API
Results are saved as JSON in eval/results/ and printed as a human-readable summary.
See the design doc for full details.
| Metric | Value |
|---|---|
| Docker image size | ~430MB core / ~436MB extract (no baked model cache by default) |
| Search latency | <50ms |
| Add latency | ~100ms (includes backup) |
| Model loading | Cold boot downloads model once; warm boots reuse /data/model-cache |
| Memory footprint | ~180-260MB baseline; higher during extraction bursts |
| Index size | ~1.5KB per memory |
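As a rough capacity estimate from the ~1.5KB-per-memory figure above (assuming index growth stays roughly linear):

```shell
# Back-of-envelope: on-disk index size for 100k memories at ~1.5KB each
awk -v n=100000 -v kb=1.5 'BEGIN { printf "%.1f MB\n", n * kb / 1024 }'
# prints: 146.5 MB
```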
Uses ONNX Runtime for inference instead of PyTorch — same model (all-MiniLM-L6-v2), same embeddings, 68% smaller image.
Tested on Mac mini M4 Pro, 16GB RAM.
# Install dependencies
uv sync # core only
uv sync --extra extract # with extraction (Anthropic SDK)
uv sync --extra cloud # with cloud sync (boto3)
# Run tests
uv run pytest -q
# Local dev server
uv run uvicorn app:app --reload
# Docker
docker build --target core -t memories:core .
docker build --target extract -t memories:extract .

When changing memory/index behavior: add or update tests, validate backup/restore still works, validate extraction if touching extraction paths, update README and/or docs/architecture.md.
- Auto-rebuild on file changes (watch mode)
- Multi-index support (different projects)
- Memory tagging system
- Search filters by date/type (source filter exists)
- Scheduled index rebuilds via cron
- No hardcoded credentials in docs/examples
- Public docs avoid product-specific assumptions unless the file is intentionally integration-specific
- Benchmarks describe workload profile and caveats
- Versioned behavior changes documented in README