Local semantic memory for AI assistants. Zero-cost, <50ms, hybrid BM25+vector search.
Works with Claude Code, Claude Desktop, Claude Chat, Codex, Cursor, ChatGPT, OpenClaw, and anything that can call HTTP or MCP.
Start here:
# 1. Clone and build
git clone git@github.com:divyekant/memories.git
cd memories
docker compose -f docker-compose.snippet.yml up -d
# 2. Verify
curl http://localhost:8900/health
# 3. Add a memory
curl -X POST http://localhost:8900/memory/add \
-H "Content-Type: application/json" \
-d '{"text": "Always use TypeScript strict mode", "source": "standards.md"}'
# 4. Search
curl -X POST http://localhost:8900/search \
-H "Content-Type: application/json" \
-d '{"query": "TypeScript config", "k": 3, "hybrid": true}'

The service runs at http://localhost:8900. API docs at http://localhost:8900/docs. Web UI at http://localhost:8900/ui.
The built-in UI at /ui provides:
- Dashboard — memory stats, extraction metrics, server info
- Memories — browse, search, filter, and manage memories with list+detail or grid view
- Extractions — extraction job stats and token usage
- API Keys — configure authentication
- Settings — provider config, server info, theme toggle (dark/light/system), export and maintenance
No build step — vanilla JS + CSS served directly from webui/.
The memories CLI provides full access to the API from your terminal.
pip install -e .
# Or if using the Docker image, the CLI is included

# Search
memories search "TypeScript config"
# Add a memory
memories add "Always use strict mode" --source standards
# List memories
memories list --source standards
# Check novelty before adding
memories is-novel "TypeScript strict mode"
# Batch operations
memories batch add memories.jsonl
# Admin
memories admin stats
memories admin health
# Backups
memories backup create
memories backup list
# Full help
memories --help

# Export all memories
memories export -o backup.jsonl
# Export filtered by source
memories export --source "claude-code/" -o project.jsonl
# Export with date range
memories export --source "proj/" --since 2026-01-01 -o recent.jsonl
# Import (clean migration)
memories import backup.jsonl
# Import with smart dedup
memories import backup.jsonl --strategy smart
# Import with source remapping
memories import backup.jsonl --source-remap "old/=new/"

The CLI auto-detects when piped and outputs JSON:
# JSON output for agents (automatic when piped)
memories search "auth" | jq '.data.results[0].text'
# Force JSON in any context
memories --json search "auth"
# Force human-readable when piped
memories --pretty list

# Set server URL
memories config set url http://localhost:8900
# Set API key
memories config set api_key your-key-here
# View resolved config
memories config show

Config resolution: CLI flags > ~/.config/memories/config.json > env vars > defaults.
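The resolution order above can be sketched as follows. This is an illustrative helper, not the CLI's actual implementation; the `MEMORIES_*` env-var naming here is an assumption:

```python
import json
import os

def resolve(key, cli_flags, config_path="~/.config/memories/config.json", default=None):
    """Resolve a config value: CLI flags > config file > env vars > defaults."""
    if key in cli_flags:                      # 1. explicit CLI flag wins
        return cli_flags[key]
    path = os.path.expanduser(config_path)
    if os.path.exists(path):                  # 2. user config file
        with open(path) as f:
            cfg = json.load(f)
        if key in cfg:
            return cfg[key]
    env = os.environ.get(f"MEMORIES_{key.upper()}")
    if env is not None:                       # 3. environment variable
        return env
    return default                            # 4. built-in default

print(resolve("url", {}, config_path="/nonexistent", default="http://localhost:8900"))
```

A flag always shadows the config file, which shadows the environment, so scripts can override a user's saved config without touching it.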
AI Client (Claude, Codex, Cursor, ChatGPT, OpenClaw)
|
|-- MCP protocol (Claude Code / Desktop / Codex / Cursor)
|-- REST API (everything else)
v
MCP Server (mcp-server/index.js)
|
v
Memories Service (Docker :8900)
|-- FastAPI REST API
|-- Hybrid Search (Memories vector + BM25 keyword, RRF fusion)
|-- Markdown-aware chunking
|-- Auto-backups
v
Persistent Storage (data/)
|-- vector_index.bin (Memories vector index snapshot)
|-- metadata.json (memory text + metadata)
|-- backups/ (auto, keeps last 10)
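The hybrid search shown in the diagram merges the BM25 and vector rankings with reciprocal rank fusion. A minimal sketch of the standard RRF formula (with the conventional k=60 constant; not the service's exact implementation):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["doc_a", "doc_b", "doc_c"]     # keyword ranking
vector = ["doc_b", "doc_c", "doc_a"]   # semantic ranking
print(rrf_fuse([bm25, vector]))        # doc_b ranks well in both lists and rises to the top
```

Because RRF only uses ranks, it fuses the two result lists without having to normalize BM25 scores against cosine similarities.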
Detailed docs:
The MCP server gives Claude Code native memory_search, memory_add, memory_extract, memory_delete, memory_delete_batch, memory_delete_by_source, memory_count, memory_list, memory_stats, and memory_is_novel tools.
Setup:
- Install the MCP server dependencies:
cd memories/mcp-server
npm install

- Add to ~/.claude/settings.json:
{
"mcpServers": {
"memories": {
"command": "node",
"args": ["/path/to/memories/mcp-server/index.js"],
"env": {
"MEMORIES_URL": "http://localhost:8900",
"MEMORIES_API_KEY": "your-api-key-here"
}
}
}
}

- Restart Claude Code. The tools are now available in every project.
- (Optional) Install the Memories skill for disciplined memory capture and proactive recall:

mkdir -p ~/.claude/skills/memories
ln -s /path/to/memories/skills/memories ~/.claude/skills/memories

The skill teaches the assistant three responsibilities: when to search (proactive recall), when and how to store (hybrid memory_add + memory_extract), and when to maintain (updates, deletes, cleanup via AUDN). It adds ~11% token overhead but improves memory discipline by ~43% in eval benchmarks.
Usage (Claude Code will call these automatically when relevant):
- "Search my memory for authentication patterns"
- "Remember that we decided to use Prisma for the ORM"
- "Check if this pattern is already in memory before adding it"
- "Show me all memories from the bug-fixes source"
For a single project only, create .mcp.json in the project root instead of editing settings.json.
Same MCP server, different config file.
Setup:
- Install dependencies (same as above):
cd memories/mcp-server
npm install

- Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"memories": {
"command": "node",
"args": ["/path/to/memories/mcp-server/index.js"],
"env": {
"MEMORIES_URL": "http://localhost:8900",
"MEMORIES_API_KEY": "your-api-key-here"
}
}
}
}

- Restart the Claude Desktop app. Memory tools appear in chat and cowork mode.
Claude Chat on the web does not support MCP directly. Two options:
Option A: Remote MCP via Cloudflare Tunnel (recommended)
If you expose the Memories service via a tunnel (e.g., memory.yourdomain.com), you can use Claude's remote MCP connector feature to connect to it. See the Remote Access section below.
Option B: Manual curl in prompts
Paste curl commands in your messages and ask Claude to interpret the results:
Search my memory service for React patterns:
curl -X POST https://memory.yourdomain.com/search \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_KEY" \
-d '{"query": "React patterns", "k": 5, "hybrid": true}'
Codex supports MCP natively via ~/.codex/config.toml.
Setup:
- Install dependencies:
cd memories/mcp-server
npm install

- Add to ~/.codex/config.toml:
[mcp_servers.memories]
command = "node"
args = ["/path/to/memories/mcp-server/index.js"]
[mcp_servers.memories.env]
MEMORIES_URL = "http://localhost:8900"
MEMORIES_API_KEY = "your-api-key-here"

If your API key is prefix-scoped and does not allow codex/*, set hook source overrides in ~/.config/memories/env:

MEMORIES_SOURCE_PREFIX="your-authorized-prefix"
# or exact source:
# MEMORIES_SOURCE="your-authorized-prefix/your-project"

- Restart Codex. The memory_search, memory_add, memory_extract, memory_delete, memory_delete_by_source, memory_count, memory_list, memory_stats, memory_is_novel, and other tools will be available.
Automatic memory layer for Codex:
cd memories/mcp-server
npm install
cd ..
./integrations/claude-code/install.sh --codex

This configures:
- MCP server registration in ~/.codex/config.toml
- notify hook script at ~/.codex/hooks/memory/memory-codex-notify.sh for after-turn extraction
- default developer_instructions (if not already set) to bias memory_search usage on each turn
- hook env loading from ~/.config/memories/env (or MEMORIES_ENV_FILE) for MEMORIES_URL, MEMORIES_API_KEY, and optional source overrides (MEMORIES_SOURCE_PREFIX, MEMORIES_SOURCE)
The installer requires jq, curl, and a running Memories service (/health must respond).
If ~/.codex/config.toml already has a notify = [...] entry, the installer will not overwrite it —
merge the Memories notify script into that array manually.
For scoped API keys, set MEMORIES_SOURCE_PREFIX (or MEMORIES_SOURCE) so hook writes stay inside authorized prefixes.
Codex currently exposes an after-turn notify hook, not Claude's 5-event hook surface.
Usage (Codex will discover the tools automatically):
- "Search memory for how we handle error logging"
- "Store this architecture decision in memory"
- "List all memories from the project-setup source"
Cursor supports MCP with the same server.
Setup:
- Install dependencies:
cd memories/mcp-server
npm install

- Add to Cursor MCP config:
  - Global: ~/.cursor/mcp.json
  - Project: .cursor/mcp.json
{
"mcpServers": {
"memories": {
"command": "node",
"args": ["/path/to/memories/mcp-server/index.js"],
"env": {
"MEMORIES_URL": "http://localhost:8900",
"MEMORIES_API_KEY": "your-api-key-here"
}
}
}
}

- Restart Cursor.
Cursor also supports the full hook lifecycle via its "Third-party skills" feature. Run ./integrations/claude-code/install.sh --cursor to install hooks alongside the MCP config.
ChatGPT uses Custom Actions (OpenAPI schema) rather than MCP. This requires exposing the Memories service over the internet.
Prerequisites: Memories service accessible via HTTPS (see Remote Access).
Setup:
- Enable API key auth on the Memories service (set the API_KEY env var in docker-compose).
- In ChatGPT, go to Explore GPTs > Create a GPT > Configure > Actions.
- Import this OpenAPI schema (replace memory.yourdomain.com with your URL):
openapi: 3.0.0
info:
  title: Memories
  version: 2.0.0
  description: Semantic memory search and storage
servers:
  - url: https://memory.yourdomain.com
paths:
  /search:
    post:
      operationId: searchMemory
      summary: Search memories by semantic similarity
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [query]
              properties:
                query:
                  type: string
                  description: Natural language search query
                k:
                  type: integer
                  default: 5
                  description: Number of results
                hybrid:
                  type: boolean
                  default: true
                  description: Use hybrid BM25+vector search
      responses:
        '200':
          description: Search results
  /memory/add:
    post:
      operationId: addMemory
      summary: Store a new memory
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [text, source]
              properties:
                text:
                  type: string
                  description: Memory content
                source:
                  type: string
                  description: Source identifier
                deduplicate:
                  type: boolean
                  default: true
      responses:
        '200':
          description: Memory added
  /memory/is-novel:
    post:
      operationId: isNovel
      summary: Check if text is already known
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [text]
              properties:
                text:
                  type: string
                threshold:
                  type: number
                  default: 0.88
      responses:
        '200':
          description: Novelty check result
  /memories:
    get:
      operationId: listMemories
      summary: Browse stored memories with pagination
      parameters:
        - name: offset
          in: query
          schema:
            type: integer
            default: 0
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 5000
        - name: source
          in: query
          description: Source prefix filter
          schema:
            type: string
      responses:
        '200':
          description: List of memories
    delete:
      operationId: deleteMemoriesByPrefix
      summary: Bulk delete all memories matching a source prefix
      parameters:
        - name: source
          in: query
          required: true
          description: Source prefix to match
          schema:
            type: string
      responses:
        '200':
          description: Delete count
  /memories/count:
    get:
      operationId: countMemories
      summary: Count memories optionally filtered by source prefix
      parameters:
        - name: source
          in: query
          description: Source prefix filter
          schema:
            type: string
      responses:
        '200':
          description: Memory count
  /stats:
    get:
      operationId: getStats
      summary: Memory index statistics
      responses:
        '200':
          description: Index stats

- Under Authentication, choose API Key with header name X-API-Key.
- Add instructions to the GPT system prompt:
You have access to a persistent memory system. Use it to:
- Search for relevant context before answering questions (searchMemory)
- Store important decisions, patterns, and learnings (addMemory)
- Check if something is already known before adding (isNovel)
- Browse what's stored (listMemories)
Always search memory at the start of conversations to load context.
OpenClaw uses a Skill (SKILL.md) with shell helper functions that call the REST API directly.
Setup:
- Create the skill directory and copy the skill file:
mkdir -p ~/.openclaw/skills/memories
cp integrations/openclaw-skill.md ~/.openclaw/skills/memories/SKILL.md

Or see the full SKILL.md in this repo at integrations/openclaw-skill.md.
- Set the API key in your shell profile (~/.zshrc or ~/.bashrc):

export MEMORIES_API_KEY="your-api-key-here"

The SKILL.md reads $MEMORIES_API_KEY from the environment — the key is never stored in the skill file itself.
Key commands available to OpenClaw agents:
memory_search_memories "query" [k] [threshold] [hybrid]
memory_add_memories "text" "source" [deduplicate]
memory_is_novel "text" [threshold]
memory_delete_memories <id>
memory_delete_source_memories "pattern"
memory_delete_by_prefix "source_prefix"
memory_count_memories [source_prefix]
memory_list_memories [offset] [limit] [source]
memory_rebuild_index
memory_dedup_memories [dry_run] [threshold]
memory_stats
memory_health
memory_backup [prefix]
memory_restore "backup_name"

All functions use jq for safe JSON construction and read auth from the $MEMORIES_API_KEY env var (no hardcoded secrets).
To use Memories from anywhere (Claude Chat web, ChatGPT, mobile, other machines), expose it via a Cloudflare Tunnel or similar.
- Enable API key auth in your docker-compose:
environment:
  - API_KEY=your-secret-key-here

Rebuild and restart: docker compose build memories && docker compose up -d memories
- Add to your Cloudflare tunnel config (e.g., in ~/.cloudflared/config.yml):
ingress:
  - hostname: memory.yourdomain.com
    service: http://localhost:8900

- Update MCP server env to use the remote URL:
{
"env": {
"MEMORIES_URL": "https://memory.yourdomain.com",
"MEMORIES_API_KEY": "your-secret-key-here"
}
}

Now every client — Claude Code on your laptop, Cursor, Claude Desktop, ChatGPT on your phone, OpenClaw — hits the same memory store running on your Mac mini.
Memories supports multiple API keys with role-based access control:
- Three tiers: read-only (search/list), read-write (search/list + add/delete), admin (full access + key management)
- Prefix scoping: keys can be restricted to specific source prefixes for tenant isolation
- Key management: create, list, update, and revoke keys via POST/GET/PATCH/DELETE /api/keys or the Web UI (admin-only)
- Backward compatible: the existing API_KEY env var still works as an implicit admin key
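Prefix scoping means a scoped key may only touch sources under its authorized prefixes. A toy sketch of that check (illustrative only; the service enforces this server-side in its auth layer):

```python
def allowed(source, key_prefixes):
    """A prefix-scoped key may only touch sources under one of its prefixes.

    An empty prefix list means the key is unscoped (full access).
    """
    if not key_prefixes:
        return True
    return any(source.startswith(p) for p in key_prefixes)

print(allowed("team-a/project/notes.md", ["team-a/"]))  # inside the tenant's prefix
print(allowed("team-b/secrets.md", ["team-a/"]))        # outside: denied
```

Because the check is a plain string-prefix match, nesting sources like team/project/file gives you hierarchical tenant isolation for free.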
See the multi-auth design doc for details.
All endpoints accept/return JSON. Auth via X-API-Key header.
POST /search
{"query": "...", "k": 5, "hybrid": true, "threshold": 0.3, "vector_weight": 0.7, "source_prefix": "team/project/"}
POST /search/batch
{"queries": [{"query": "...", "k": 5}, {"query": "...", "hybrid": true}]}
POST /memory/add
{"text": "...", "source": "file.md", "deduplicate": true}
POST /memory/add-batch
{"memories": [{"text": "...", "source": "..."}, ...], "deduplicate": true}
DELETE /memory/{id}
DELETE /memories?source=<prefix> # Bulk delete by source prefix; returns {"count": N}
POST /memory/delete-batch {"ids": [1, 2, 3]}
POST /memory/delete-by-source {"source_pattern": "credentials"}
POST /memory/delete-by-prefix {"source_prefix": "team/project/"}
GET /memory/{id}
POST /memory/get-batch {"ids": [1, 2, 3]}
POST /memory/upsert
{"text":"...", "source":"team/project/file", "key":"entity-1", "metadata": {"owner":"team"}}
POST /memory/upsert-batch
{"memories":[{"text":"...", "source":"...", "key":"..."}]}
PATCH /memory/{id}
{"text":"optional", "source":"optional", "metadata_patch":{"tag":"v2"}}
POST /memory/is-novel
{"text": "...", "threshold": 0.88}
GET /memories?offset=0&limit=20&source=filter # limit up to 5000; source uses prefix matching
GET /memories/count?source=<prefix> # returns {"count": N}
POST /memory/deduplicate
{"threshold": 0.90, "dry_run": true}
POST /index/build {"sources": ["file1.md", "file2.md"]}
GET /stats
GET /health
GET /health/ready
GET /metrics
POST /maintenance/embedder/reload
GET /backups
POST /backup?prefix=manual
POST /restore {"backup_name": "manual_20260213_120000"}
POST /memory/extract {"messages": "...", "source": "proj", "context": "stop"} # 202 queued
GET /memory/extract/{job_id}
GET /extract/status
Full OpenAPI schema at http://localhost:8900/docs.
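The is-novel endpoint compares candidate text against its nearest stored memory and reports whether the best match exceeds the threshold (default 0.88). A toy sketch with 2-D vectors, assuming cosine similarity as the metric (the real service embeds text with its configured embedding provider):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def is_novel(candidate_vec, stored_vecs, threshold=0.88):
    """Novel if no stored embedding is at least `threshold` similar."""
    return all(cosine(candidate_vec, v) < threshold for v in stored_vecs)

stored = [[1.0, 0.0], [0.6, 0.8]]
print(is_novel([0.99, 0.1], stored))   # near-duplicate of the first vector: not novel
print(is_novel([0.0, -1.0], stored))   # dissimilar to everything: novel
```

Checking novelty before memory_add is how clients avoid cluttering the index with paraphrases of facts already stored.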
- POST /memory/compare (pairwise conflict scoring for concurrent agent writes)
- POST /memory/resolve-conflicts (policy-driven merge: latest/manual/model)
- POST /memory/lock + DELETE /memory/lock/{key} (explicit lock reservation APIs)
- POST /memory/events + GET /memory/events/stream (change feed for agent synchronization)
- POST /search/stream (progressive search responses for very large corpora)
- POST /memory/ttl (time-bound memories with auto-expiry)
When connected via MCP (Claude Code, Claude Desktop, Codex, Cursor), these tools are available:
| Tool | Description |
|---|---|
| memory_search | Hybrid search (BM25 + vector). Default mode. |
| memory_add | Store a memory with auto-dedup. |
| memory_extract | LLM-based extraction with AUDN (Add/Update/Delete/Noop) from conversation text. |
| memory_delete | Delete by ID. |
| memory_delete_batch | Delete multiple IDs in one operation. |
| memory_delete_by_source | Bulk delete all memories matching a source prefix. |
| memory_count | Count memories, optionally filtered by source prefix. |
| memory_list | Browse with pagination and source prefix filter. |
| memory_stats | Index stats (count, model, last updated). |
| memory_is_novel | Check if text is already known. |
| Variable | Default | Description |
|---|---|---|
| DATA_DIR | /data | Persistent storage path |
| WORKSPACE_DIR | /workspace | Read-only workspace for index rebuilds |
| API_KEY | (empty) | API key for auth. Empty = no auth. |
| EMBED_PROVIDER | onnx | Embedding provider: onnx (local) or openai (BYOK) |
| EMBED_MODEL | (unset) | Provider-specific embedding model override |
| MODEL_NAME | all-MiniLM-L6-v2 | Default ONNX model used when EMBED_PROVIDER=onnx and EMBED_MODEL is unset |
| MODEL_CACHE_DIR | (unset; Docker image sets /data/model-cache) | Optional writable cache path for downloaded model files |
| PRELOADED_MODEL_CACHE_DIR | (unset; Docker image sets /opt/model-cache) | Optional read-only cache to seed MODEL_CACHE_DIR when empty |
| MAX_BACKUPS | 10 | Number of backups to keep |
| MAX_EXTRACT_MESSAGE_CHARS | 120000 | Max characters accepted by /memory/extract |
| EXTRACT_MAX_INFLIGHT | 2 | Max concurrent extraction jobs |
| MEMORY_TRIM_ENABLED | true | Run post-extract GC/allocator trim |
| MEMORY_TRIM_COOLDOWN_SEC | 15 | Minimum seconds between trim attempts |
| MEMORY_TRIM_PERIODIC_SEC | 5 | Periodic trim probe interval (seconds). Set 0 to disable background trim loop. |
| EMBEDDER_AUTO_RELOAD_ENABLED | false | Enable periodic auto-reload of in-process embedder runtime |
| EMBEDDER_AUTO_RELOAD_RSS_KB_THRESHOLD | 1200000 | RSS threshold (KB) required before auto-reload decisions |
| EMBEDDER_AUTO_RELOAD_CHECK_SEC | 15 | Seconds between auto-reload checks |
| EMBEDDER_AUTO_RELOAD_HIGH_STREAK | 3 | Consecutive high-RSS checks required before trigger |
| EMBEDDER_AUTO_RELOAD_MIN_INTERVAL_SEC | 900 | Cooldown between reload attempts |
| EMBEDDER_AUTO_RELOAD_WINDOW_SEC | 3600 | Rolling window size for reload cap |
| EMBEDDER_AUTO_RELOAD_MAX_PER_WINDOW | 2 | Max reloads allowed per rolling window |
| EMBEDDER_AUTO_RELOAD_MAX_ACTIVE_REQUESTS | 2 | Skip reload when active HTTP requests exceed this |
| EMBEDDER_AUTO_RELOAD_MAX_QUEUE_DEPTH | 0 | Skip reload when extract queue depth exceeds this |
| METRICS_LATENCY_SAMPLES | 200 | Per-route latency sample window for /metrics percentiles |
| METRICS_TREND_SAMPLES | 120 | Memory trend sample window exposed by /metrics |
| PORT | 8000 | Internal service port |
Default compose files now include:
- mem_limit: ${MEMORIES_MEM_LIMIT:-3g} to bound container memory growth
- MALLOC_ARENA_MAX=2 to reduce glibc arena fragmentation in multithreaded workloads
- MALLOC_TRIM_THRESHOLD_=131072 and MALLOC_MMAP_THRESHOLD_=131072 to encourage earlier allocator release
- extraction env passthrough (EXTRACT_PROVIDER, EXTRACT_MODEL, provider keys/URL) so deploys keep extraction enabled when set in shell or .env
- embedder auto-reload env passthrough with anti-loop defaults (EMBEDDER_AUTO_RELOAD_*)
| Variable | Default | Description |
|---|---|---|
| MEMORIES_URL | http://localhost:8900 | Memories service URL |
| MEMORIES_API_KEY | (empty) | API key if auth is enabled |
Memories supports automatic retrieval/extraction, with client-specific behavior:
- Claude Code: full 5-hook lifecycle (session start, each prompt, stop, pre-compact, session end)
- Cursor: same 5-hook lifecycle via Third-party skills (loads from ~/.claude/settings.json)
- Codex: native notify hook after each completed turn + MCP/developer instructions for retrieval
- OpenClaw: skill-driven retrieval/extraction flow
| Event | Hook | What happens |
|---|---|---|
| Session start | memory-recall.sh | Loads project-specific memories into context |
| Every prompt | memory-query.sh | Retrieves memories relevant to the question |
| After response | memory-extract.sh | Extracts facts and stores via AUDN pipeline |
| Before compaction | memory-flush.sh | Aggressive extraction before context loss |
| Session end | memory-commit.sh | Final extraction pass |
Cursor compatibility note: Cursor sends workspace_roots[] (not cwd) and transcript_path (not inline messages) in hook payloads. The hook scripts handle both formats automatically — no separate configuration needed.
| Event | Mechanism | What happens |
|---|---|---|
| After each completed turn | notify -> memory-codex-notify.sh | Sends user+assistant exchange to /memory/extract asynchronously (loads hook env file, handles snake/camel/kebab payload variants, supports transcript fallback and source overrides) |
| On new turns | MCP tools + developer instructions | Encourages focused memory_search before implementation-heavy responses |
Codex does not currently expose the Claude-style SessionStart/UserPromptSubmit/PreCompact/SessionEnd hook callbacks in config.toml.
Prerequisites:
- jq and curl installed (required by installer)
- running Memories service (curl -s http://localhost:8900/health | jq .)
- if installing Codex integration, MCP deps installed:

cd memories/mcp-server
npm install

One-command auto-detect installer (recommended):

./integrations/claude-code/install.sh --auto

This detects and configures any available targets on your machine:
- Claude Code hooks (~/.claude/settings.json)
- Codex native config (~/.codex/config.toml)
- OpenClaw skill (~/.openclaw/skills/memories/SKILL.md)
Cursor is supported via manual MCP config (~/.cursor/mcp.json or .cursor/mcp.json).
The installer writes runtime config to:
- ~/.config/memories/env for hook vars (MEMORIES_URL, optional MEMORIES_API_KEY, optional MEMORIES_SOURCE_PREFIX/MEMORIES_SOURCE for Codex notify source control)
- repo .env for extraction vars (EXTRACT_PROVIDER, provider keys/URL)
Target only Claude, Cursor, or Codex:
./integrations/claude-code/install.sh --claude
./integrations/claude-code/install.sh --cursor
./integrations/claude-code/install.sh --codex

Target only OpenClaw:

./integrations/claude-code/install.sh --openclaw

LLM-assisted setup: Feed integrations/QUICKSTART-LLM.md to your AI assistant and it will configure everything automatically.
| Provider | Cost | AUDN | Speed |
|---|---|---|---|
| Anthropic (recommended) | ~$0.001/turn | Full (Add/Update/Delete/Noop) | ~1-2s |
| OpenAI | ~$0.001/turn | Full | ~1-2s |
| ChatGPT Subscription | Free (uses your subscription) | Full | ~1-2s |
| Ollama | Free | Full | ~5s |
| Skip | Free | None | N/A |
Extraction is optional. Without it, retrieval still works.
By default, automatic write hooks do not store new memories when extraction is disabled.
If you want a degraded automatic-write mode, set EXTRACT_FALLBACK_ADD=true to enable a strict
heuristic + novelty-check fallback that writes at most a small number of high-confidence facts
when extraction is disabled or the configured provider fails at runtime (for example, rate limits or timeouts).
AUDN is the memory decision loop:
- ADD: store a genuinely new fact
- UPDATE: refine an existing memory that is close but outdated/incomplete
- DELETE: remove a stale/conflicting memory
- NOOP: ignore non-useful or duplicate facts
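A toy version of that decision loop, keyed off the similarity between a candidate fact and its nearest stored memory. The thresholds and the `contradicts` flag are illustrative assumptions, not the extractor's actual policy (which is LLM-driven):

```python
def audn_decision(similarity, contradicts, dup_threshold=0.88, near_threshold=0.6):
    """Toy AUDN policy based on nearest-memory similarity (illustrative thresholds)."""
    if similarity >= dup_threshold:
        # Near-duplicate: ignore it, unless it contradicts what is stored,
        # in which case the stale memory should be removed.
        return "DELETE" if contradicts else "NOOP"
    if similarity >= near_threshold:
        return "UPDATE"   # close but incomplete/outdated: refine in place
    return "ADD"          # genuinely new fact

print(audn_decision(0.95, contradicts=False))  # duplicate -> NOOP
print(audn_decision(0.70, contradicts=False))  # close match -> UPDATE
print(audn_decision(0.20, contradicts=False))  # new fact -> ADD
```

The point of the loop is that every extracted fact is judged against what is already stored, so the index converges instead of accumulating near-duplicates.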
Why it matters:
- cleaner memory store over time (less duplicate/stale data)
- better retrieval quality in later sessions
- less "memory drift" when decisions change
- Anthropic/OpenAI extraction: small usage cost (typically around ~$0.001/turn), full AUDN quality.
- ChatGPT Subscription extraction: no additional API cost (uses your existing subscription), full AUDN quality.
- Ollama extraction: no API cost, full AUDN quality (with JSON format constraint).
- Retrieval only (EXTRACT_PROVIDER unset): no extraction model cost.
- Optional fallback writes (EXTRACT_FALLBACK_ADD=true): add-only, heuristic extraction path (no AUDN update/delete) used when extraction is disabled or provider calls fail at runtime.
Use these to keep extraction spend bounded:
- MAX_EXTRACT_MESSAGE_CHARS: hard cap on transcript size per request
- EXTRACT_MAX_FACTS: limits facts considered from each extraction
- EXTRACT_MAX_FACT_CHARS: caps per-fact payload size
- EXTRACT_SIMILAR_TEXT_CHARS and EXTRACT_SIMILAR_PER_FACT: limit context passed into AUDN
POST /memory/extract is async-first. It enqueues work and returns 202 with a job_id.
Poll GET /memory/extract/{job_id} for queued, running, completed, or failed.
If the queue is full, the API returns 429 with a Retry-After header.
When extraction is disabled and EXTRACT_FALLBACK_ADD=true, /memory/extract runs an immediate
fallback add path and still returns a job object. When extraction is configured but fails at runtime,
the queued worker also falls back to add-only mode when EXTRACT_FALLBACK_ADD=true.
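A client-side polling loop for this flow can be sketched as follows. `fetch_status` is a hypothetical injected callable standing in for GET /memory/extract/{job_id}; stubbing it keeps the sketch self-contained:

```python
import time

def poll_job(fetch_status, job_id, interval=0.5, timeout=30):
    """Poll an async extract job until it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)          # e.g. GET /memory/extract/{job_id}
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)                # back off between polls
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Stubbed status source standing in for the HTTP call:
states = iter(["queued", "running", "completed"])
print(poll_job(lambda _id: {"status": next(states)}, "job-1", interval=0)["status"])
```

A production client would also honor the 429 Retry-After header when enqueueing, sleeping for the advertised interval before resubmitting.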
The Dockerfile publishes two runtime targets:
- core (default): search/add/list endpoints, no Anthropic/OpenAI SDKs
- extract: includes Anthropic/OpenAI SDKs for /memory/extract
Build both images directly:
docker build --target core -t memories:core .
docker build --target extract -t memories:extract .

Use compose with either target:
# Default (core target)
docker compose up -d --build memories
# Extraction-ready target
MEMORIES_IMAGE_TARGET=extract docker compose up -d --build memories

By default, images do not bake model weights. On first run, the service downloads them into
MODEL_CACHE_DIR (/data/model-cache in Docker), so later restarts reuse the volume cache.
If you want a fully preloaded image (faster first boot, larger pull), set PRELOAD_MODEL=true:
docker build --target core --build-arg PRELOAD_MODEL=true -t memories:core .
docker build --target extract --build-arg PRELOAD_MODEL=true -t memories:extract .

Ollama uses HTTP directly and does not need the extra SDKs, so core is enough for Ollama extraction.
| Variable | Default | Description |
|---|---|---|
| EXTRACT_PROVIDER | (none) | anthropic, openai, chatgpt-subscription, ollama, or empty to disable |
| EXTRACT_MODEL | (per provider) | Model override |
| ANTHROPIC_API_KEY | (none) | Required for Anthropic provider (standard key or sk-ant-oat01- OAuth token) |
| OPENAI_API_KEY | (none) | Required for OpenAI provider |
| CHATGPT_REFRESH_TOKEN | (none) | Required for ChatGPT Subscription provider (from python -m memories auth chatgpt) |
| CHATGPT_CLIENT_ID | (none) | Required for ChatGPT Subscription provider |
| OLLAMA_URL | http://host.docker.internal:11434 | Ollama server URL (on Linux, use http://localhost:11434) |
| EXTRACT_FALLBACK_ADD | false | Enable add-only fallback writes when extraction is disabled or provider calls fail at runtime |
| EXTRACT_FALLBACK_MAX_FACTS | 1 | Max fallback facts to store per extract request |
| EXTRACT_FALLBACK_MIN_FACT_CHARS | 24 | Minimum candidate fact length for fallback |
| EXTRACT_FALLBACK_MAX_FACT_CHARS | 280 | Maximum candidate fact length for fallback |
| EXTRACT_FALLBACK_NOVELTY_THRESHOLD | 0.88 | Novelty threshold used by fallback add mode |
| EXTRACT_QUEUE_MAX | EXTRACT_MAX_INFLIGHT * 20 | Maximum queued extraction jobs before backpressure (429) |
| EXTRACT_JOB_RETENTION_SEC | 300 | How long completed/failed extraction jobs stay queryable |
| EXTRACT_JOBS_MAX | 200 | Hard cap on stored extraction job records (finished jobs evicted first) |
| EXTRACT_MAX_FACTS | 30 | Maximum facts kept from a single extraction |
| EXTRACT_MAX_FACT_CHARS | 500 | Max length per extracted fact |
| EXTRACT_SIMILAR_TEXT_CHARS | 280 | Max similar-memory text length passed into AUDN |
| EXTRACT_SIMILAR_PER_FACT | 5 | Similar memories included per fact during AUDN |
Extraction can create short-lived allocation spikes (large transcripts, large LLM JSON payloads, concurrent requests).
Mitigations built in:
- /memory/extract request size limit (MAX_EXTRACT_MESSAGE_CHARS)
- bounded in-flight extraction (EXTRACT_MAX_INFLIGHT)
- post-extract + periodic memory reclamation (MEMORY_TRIM_ENABLED, MEMORY_TRIM_COOLDOWN_SEC, MEMORY_TRIM_PERIODIC_SEC)
- optional auto-reload controller for the embedder runtime (EMBEDDER_AUTO_RELOAD_*)
- bounded AUDN payload sizes (EXTRACT_MAX_FACTS, EXTRACT_MAX_FACT_CHARS, EXTRACT_SIMILAR_TEXT_CHARS)
Observability:
- /metrics includes embedder_reload.auto and embedder_reload.manual counters/state
- manual reload endpoint: POST /maintenance/embedder/reload
Reference benchmark: docs/benchmarks/2026-02-17-memory-reclamation.md
./integrations/claude-code/install.sh --uninstall

Then optionally remove MEMORIES_* from ~/.config/memories/env and EXTRACT_* from repo .env.
Memories has three layers of backup protection:
The service automatically saves a snapshot after every write operation. The 10 most recent auto-backups are kept in the Docker volume under data/backups/.
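Keep-last-N rotation over timestamped names can be pictured as below. This is a hypothetical prune_backups helper, not the service's own retention code; it relies on the fact that the documented name format (e.g. manual_20260213_120000) sorts chronologically:

```python
def prune_backups(backup_names, max_backups=10):
    """Keep only the newest backups; timestamped names sort chronologically."""
    ordered = sorted(backup_names)       # oldest first
    return ordered[-max_backups:]        # newest max_backups survive

# Twelve daily auto-backups: rotation drops the two oldest.
names = [f"auto_202602{day:02d}_120000" for day in range(1, 13)]
kept = prune_backups(names)
print(len(kept), kept[0])
```

The same idea, parameterized by MAX_BACKUPS, bounds disk growth no matter how write-heavy the workload is.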
# List backups
curl -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/backups
# Create manual backup
curl -X POST -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/backup?prefix=manual
# Restore from backup
curl -X POST -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/restore \
-H "Content-Type: application/json" \
-d '{"backup_name": "manual_20260214_120000"}'

A cron job creates timestamped copies of the Memories index every 30 minutes. Snapshots are stored outside the Docker volume (default: ~/backups/memories/) with 30-day retention.
# Install the cron job
./scripts/install-cron.sh install
# Check status
./scripts/install-cron.sh status
# Run a backup manually
./scripts/backup.sh
# Dry run (no changes)
./scripts/backup.sh --test

Environment variables (all optional, sensible defaults):
| Variable | Default | Description |
|---|---|---|
| MEMORIES_URL | http://localhost:8900 | Service URL |
| MEMORIES_API_KEY | (empty) | API key if auth is enabled |
| MEMORIES_DATA_DIR | ./data (relative to repo) | Docker volume data path |
| BACKUP_DIR | ~/backups/memories | Where to store snapshots |
| RETENTION_DAYS | 30 | Days to keep local snapshots |
If you set GDRIVE_ACCOUNT, each backup automatically uploads the latest snapshot to Google Drive as a compressed tar.gz. Uploads are throttled to once per hour. 7-day retention on Drive.
Prerequisites:
- Install gog CLI
- Authenticate: gog auth add your-email@gmail.com --services drive
- Set env var in your shell profile: export GDRIVE_ACCOUNT="your-email@gmail.com"

Environment variables (all optional):
| Variable | Default | Description |
|---|---|---|
| GDRIVE_ACCOUNT | (none) | Google account email. Required to enable GDrive. |
| GDRIVE_FOLDER_NAME | memories-backups | Folder name on Drive |
| UPLOAD_INTERVAL_MIN | 55 | Minimum minutes between uploads |
| GDRIVE_RETENTION_DAYS | 7 | Days to keep backups on Drive |
Manual usage:
# Setup (create Drive folder + test auth)
./scripts/backup-gdrive.sh --setup
# Upload now (skip throttle)
./scripts/backup-gdrive.sh --force
# Dry run
./scripts/backup-gdrive.sh --test
# Only clean up old backups on Drive
./scripts/backup-gdrive.sh --cleanup

For S3/MinIO/R2 backends, build with cloud sync enabled:

ENABLE_CLOUD_SYNC=true docker compose up -d --build memories

See CLOUD_SYNC_README.md for configuration details.
memories/
app.py # FastAPI REST API
memory_engine.py # Memories engine (search, chunking, BM25, backups)
onnx_embedder.py # ONNX Runtime embedder (replaces PyTorch)
llm_provider.py # LLM provider abstraction (Anthropic/OpenAI/ChatGPT Subscription/Ollama)
llm_extract.py # Extraction pipeline with AUDN
chatgpt_oauth.py # ChatGPT OAuth2+PKCE token exchange helpers
key_store.py # SQLite-backed API key store (SHA-256 hashing)
auth_context.py # Request-scoped role and prefix enforcement
memories_auth.py # CLI auth tool (python -m memories auth chatgpt/status)
__main__.py # Entry point for python -m memories
Dockerfile # Multi-stage Docker build (core/extract targets)
pyproject.toml # Python dependencies (uv)
uv.lock # Locked dependency resolutions
docker-compose.snippet.yml
docs/
api.md # Complete REST API reference
architecture.md # System architecture and runtime flows
decisions.md # Key design decisions and tradeoffs
benchmarks/ # Reproducible benchmark notes
mcp-server/
index.js # MCP server (wraps REST API as tools)
package.json
scripts/
backup.sh # Cron backup (local snapshots)
backup-gdrive.sh # Optional Google Drive upload
install-cron.sh # Cron job installer
webui/
index.html # Memory browser entry page (/ui)
styles.css # UI styling
app.js # Browser-side pagination/filter logic
integrations/
claude-code/
install.sh # Auto-detect installer (Claude/Codex/Cursor/OpenClaw)
hooks/ # Claude Code 5-hook scripts + hooks.json
codex/
memory-codex-notify.sh # Codex notify hook script (after-turn extraction)
claude-code.md # Claude Code guide
openclaw-skill.md # OpenClaw SKILL.md
QUICKSTART-LLM.md # LLM-friendly setup guide
tests/
test_memory_engine.py # Memory engine tests
test_llm_provider.py # LLM provider tests (incl. ChatGPT Subscription)
test_chatgpt_oauth.py # OAuth PKCE + token exchange tests
test_memories_auth.py # CLI auth tool tests
test_llm_extract.py # Extraction pipeline tests
test_extract_api.py # API endpoint tests
test_web_ui.py # Web UI route/static tests
skills/
memories/
SKILL.md # Claude Code skill for memory discipline
eval/
__main__.py # CLI entrypoint (python -m eval)
models.py # Pydantic data models (Scenario, EvalReport, etc.)
loader.py # YAML scenario loader
scorer.py # Deterministic rubric scorer
judge.py # LLM-as-judge for non-deterministic rubrics
memories_client.py # Memories API client for eval runner
cc_executor.py # Claude Code executor with project isolation
runner.py # Orchestrates with/without-memory runs
reporter.py # JSON reporter and summary formatter
config.yaml # Default eval configuration
scenarios/ # YAML test scenarios by category
results/ # JSON eval reports (.gitignored)
tests/ # 82 tests covering all eval components
data/ # .gitignored — persistent index + backups
Memories includes a built-in eval harness that measures how much Memories improves AI assistant performance. It runs controlled A/B tests: each scenario executes via Claude Code (`claude -p`) both with and without Memories, then scores the outputs against deterministic rubrics.
# Run all scenarios (via wrapper script)
./eval/run.sh
# Or directly via Python
python -m eval
# Run a specific category
python -m eval --category coding
# Run a single scenario
python -m eval --scenario coding-001 -v

| Category | With Memory | Without Memory | Delta |
|---|---|---|---|
| Coding | 1.00 | 0.00 | +1.00 |
| Recall | 1.00 | 0.20 | +0.80 |
| Compounding | 1.00 | 0.27 | +0.73 |
| Overall | 1.00 | 0.14 | +0.86 |
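The Delta column is simply the with/without difference per row. As a quick sanity check on the Overall row (numbers copied from the table above):

```shell
# delta = score_with - score_without, computed for the Overall row
score_with=1.00
score_without=0.14
delta=$(awk -v a="$score_with" -v b="$score_without" 'BEGIN { printf "%.2f", a - b }')
echo "+$delta"   # prints +0.86
```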
11 scenarios across 3 categories. Each scenario uses fictional project context ("Voltis") with arbitrary, non-derivable facts — values like hvt_client, vtctl deploy-gate, VTX_LEGACY_DSN, port 7443, and 73% that Claude cannot guess from naming patterns or training data.
- Coding tasks (4 scenarios) — Does the agent apply project-specific tools and conventions?
- Knowledge recall (4 scenarios) — Can the agent recall exact config values and decisions?
- Compounding value (3 scenarios) — Can the agent synthesize multiple memories to diagnose problems?
- Purges stale auto-memory from prior eval runs (`~/.claude/projects/cc_eval*`)
- Clears eval memories, creates an isolated temp project (no CLAUDE.md, no `.claude/`)
- Runs the prompt without Memories via `claude -p --strict-mcp-config` (empty MCP) → scores against rubrics
- Seeds scenario memories, runs the prompt with Memories via `claude -p --strict-mcp-config` (Memories MCP only) → scores again
- Computes efficacy delta = score_with - score_without
- Aggregates across categories with configurable weights

- `--strict-mcp-config` ensures Claude loads only the MCP config provided (or none), ignoring global settings
- Fresh temp directories per run — no CLAUDE.md, no `.claude/`, no conversation history
- Auto-memory cleanup removes `~/.claude/projects/cc_eval*` dirs at startup and after each run
- Scenario memories cleared before each run via Memories API
Results are saved as JSON in eval/results/ and printed as a human-readable summary.
See the design doc for full details.
| Metric | Value |
|---|---|
| Docker image size | ~430MB core / ~436MB extract (no baked model cache by default) |
| Search latency | <50ms |
| Add latency | ~100ms (includes backup) |
| Model loading | Cold boot downloads model once; warm boots reuse /data/model-cache |
| Memory footprint | ~180-260MB baseline; higher during extraction bursts |
| Index size | ~1.5KB per memory |
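As a rough capacity estimate from the ~1.5KB-per-memory figure above (assuming index growth stays roughly linear):

```shell
# Back-of-envelope: on-disk index size for 100k memories at ~1.5KB each
awk -v n=100000 -v kb=1.5 'BEGIN { printf "%.1f MB\n", n * kb / 1024 }'
# prints: 146.5 MB
```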
Uses ONNX Runtime for inference instead of PyTorch — same model (all-MiniLM-L6-v2), same embeddings, 68% smaller image.
Tested on Mac mini M4 Pro, 16GB RAM.
# Install dependencies
uv sync # core only
uv sync --extra extract # with extraction (Anthropic SDK)
uv sync --extra cloud # with cloud sync (boto3)
# Run tests
uv run pytest -q
# Local dev server
uv run uvicorn app:app --reload
# Docker
docker build --target core -t memories:core .
docker build --target extract -t memories:extract .

When changing memory/index behavior: add or update tests, validate backup/restore still works, validate extraction if touching extraction paths, update README and/or docs/architecture.md.
- Auto-rebuild on file changes (watch mode)
- Multi-index support (different projects)
- Memory tagging system
- Search filters by date/type (source filter exists)
- Scheduled index rebuilds via cron
- No hardcoded credentials in docs/examples
- Public docs avoid product-specific assumptions unless the file is intentionally integration-specific
- Benchmarks describe workload profile and caveats
- Versioned behavior changes documented in README