A brain-inspired memory management system for AI agents that forgets, consolidates, reconsolidates, detects contradictions, suppresses competitors, and abstracts behavioral patterns — capabilities no existing AI memory system (MemGPT, Mem0, Zep, or A-Mem) has shipped in production.
Paper: Why AI Needs to Dream: A Sleep/Wake Memory Architecture for Conversational Agents (April 2026)
| Result | Finding | Significance |
|---|---|---|
| Spacing effect | 9.4x retention advantage at 90 days | First implementation in any agent memory system |
| Consolidation | 126 semantic + 130 schema emerge from 920 episodic | Three-layer hierarchy forms autonomously with 72% storage reduction |
| LongMemEval-M | 42% EM / 52.6% LLM-judge (500 questions, ~10K memories each) | Temporal reasoning (37.6%) is weakest — active area of improvement |
| Longitudinal (90 days) | Full system: 13% → 33% accuracy | No-decay degrades to 9% as 16K memories drown retrieval. 3.5x advantage. |
| Ablation | Exponential decay and linear stability produce 0% EM | Our design choices (hyperbolic + power-law) are the only ones that work |
| Unit tests | 125/125 passing | Pure math verification of all formulas, no DB or LLM needed |
The central finding
On LongMemEval-M (500 questions, ~10K memories per question), the full system scores 42% EM / 52.6% LLM-judge. Temporal reasoning (37.6%) and preferences (6.7%) are the weakest categories — active areas of improvement. Single-session recall is strong (75–79%).
On our 90-day longitudinal benchmark (~16,000 memories), the picture reverses: the full system reaches 33% while systems without forgetting degrade to 9%. This is the crossover effect — once enough memories accumulate, forgetting is not just helpful, it's essential.
No existing benchmark tests this. We derive analytically why the crossover exists and validate it empirically.
Every AI assistant today has the same memory flaw: it stores everything and retrieves by similarity. This is a filing cabinet, not a brain. As conversations accumulate, the system drowns in redundant facts, conflicting information, and outdated priorities — with no mechanism to resolve any of it.
The human brain solves this with six mechanisms that run simultaneously:
- Two-stage consolidation — fast capture in the hippocampus, slow distillation to the neocortex during sleep
- Selective forgetting — memories decay unless reinforced through spaced retrieval
- Reconsolidation — retrieved memories become temporarily unstable and can be refined by new context
- Cognitive dissonance — contradicting beliefs trigger error signals proportional to the conflict
- Retrieval-induced forgetting — recalling a memory suppresses similar competitors, sharpening recall
- Hierarchical compression — raw experience is compressed 2,000,000:1 into schemas and generative models
We implemented all six.
Raw conversation --> Episodic Memory (fast capture) --> Semantic Memory (distilled facts) --> Schema (behavioral patterns)
| Layer | Example | Created |
|---|---|---|
| Episodic | "On March 15, user said delay the launch because engineering is behind" | After each conversation turn |
| Semantic | "User delayed product launch due to engineering delays (March 2026)" | During consolidation (every 6h) |
| Schema | "User prioritizes engineering readiness over market timing in launch decisions" | When 3+ semantic memories cluster |
Every memory has a strength
1. Idempotent time decay — uses a hyperbolic retention function (Wixted & Ebbesen 1991; Rubin & Wenzel 1996) rather than the exponential originally proposed by Ebbinghaus (1885). The hyperbolic form has a heavier tail — memories linger longer before vanishing, matching 100+ years of empirical forgetting data:

$$S(t) = \frac{S_0}{1 + (t - t_{\text{last}})/\tau}$$

where $S_0$ is the strength at the last access, $t_{\text{last}}$ is the last-access timestamp, and $\tau$ is the memory's stability. Because strength is always recomputed from $t_{\text{last}}$, running the decay pass repeatedly gives the same result — hence idempotent.
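A minimal Python sketch of this decay rule, assuming a standard hyperbolic form; the time constant `tau` here is an illustrative placeholder, not the production value:

```python
def decayed_strength(s0: float, last_access: float, now: float,
                     tau: float = 7 * 86400.0) -> float:
    """Hyperbolic retention, recomputed from the last-access timestamp.

    The result depends only on (now - last_access), so running the decay
    pass any number of times yields the same value: idempotent.
    """
    elapsed = max(0.0, now - last_access)
    return s0 / (1.0 + elapsed / tau)
```

With `tau` set to 7 days, an untouched memory halves after a week but still retains roughly 10% after two months, whereas an exponential with the same half-life would be near zero.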
2. Spacing-aware retrieval reinforcement — boost scales with time since last access and diminishes near the ceiling:

$$\Delta S = b \cdot (1 - S) \cdot \min\!\left(1, \frac{t - t_{\text{last}}}{T_{\text{spacing}}}\right)$$

where $b$ is the base boost, the $(1 - S)$ factor damps reinforcement as strength approaches the ceiling, and $T_{\text{spacing}}$ is the gap at which the boost saturates.
| Scenario | Flat boost (old) | Spacing-aware (new) |
|---|---|---|
| Recalled 30 sec ago | Full boost | Near-zero boost (massed retrieval) |
| Recalled 1 day ago | Full boost | Partial boost |
| Recalled 7 days ago, low strength | Full boost | Full boost (spaced retrieval rewarded) |
| Recalled 7 days ago, near ceiling | Full boost | Damped boost (little headroom left) |
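The same behavior can be sketched in Python; the `base` and `saturation_hours` constants are assumptions for illustration, not the production values:

```python
def retrieval_boost(strength: float, hours_since_access: float,
                    base: float = 0.2, saturation_hours: float = 72.0) -> float:
    """Spacing-aware reinforcement (illustrative constants).

    Two modulators on the flat base boost: the gap since last access
    (massed recall earns almost nothing) and the remaining headroom
    below the ceiling (near-saturated memories barely move).
    """
    spacing = min(1.0, hours_since_access / saturation_hours)
    headroom = 1.0 - strength
    return min(1.0, strength + base * spacing * headroom)
```

A memory at strength 0.5 recalled seconds after the last access gains almost nothing, while the same recall after a week earns the full spaced boost.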
3. Evidence-weighted contradiction (Bayesian likelihood-ratio penalty) — penalty modulated by relative strength of new evidence vs old belief:

$$S_{\text{old}} \leftarrow S_{\text{old}} - \kappa \cdot \frac{S_{\text{new}}}{S_{\text{new}} + S_{\text{old}}}$$

where $S_{\text{new}}$ is the strength of the contradicting evidence, $S_{\text{old}}$ is the strength of the existing belief, and $\kappa$ is the maximum penalty. Strong new evidence against a weak old belief produces a near-maximal penalty; weak evidence against an entrenched belief barely moves it.
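A sketch of the penalty under these assumptions; `max_penalty` and the exact weighting form are illustrative, not the production values:

```python
def apply_contradiction(old_strength: float, new_strength: float,
                        max_penalty: float = 0.4) -> float:
    """Evidence-weighted contradiction penalty (sketch).

    The weight behaves like a likelihood ratio: strong new evidence
    against a weak old belief yields a near-maximal penalty, while weak
    evidence against an entrenched belief barely changes it.
    """
    weight = new_strength / (new_strength + old_strength)
    return max(0.0, old_strength - max_penalty * weight)
```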
4. Retrieval-induced forgetting (Anderson, Bjork & Bjork 1994) — when a memory is retrieved, similar-but-non-retrieved competitors are mildly suppressed (a 3% strength reduction), sharpening recall over time. Schemas are exempt from suppression. The effect is confirmed across 60+ studies by Murayama et al. (2014).
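A sketch of the suppression step, assuming memories carry `id`, `layer`, `similarity`, and `strength` fields (hypothetical field names):

```python
def suppress_competitors(retrieved_id: str, candidates: list,
                         sim_threshold: float = 0.8, rate: float = 0.03) -> None:
    """Retrieval-induced forgetting (sketch; the threshold is an assumed value).

    Mildly suppress similar-but-non-retrieved competitors of the retrieved
    memory. Schemas are exempt, so abstracted patterns never erode this way.
    """
    for mem in candidates:
        if mem["id"] == retrieved_id or mem["layer"] == "schema":
            continue
        if mem["similarity"] >= sim_threshold:
            mem["strength"] *= 1.0 - rate  # 3% suppression per retrieval event
```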
Memories whose strength falls below a minimum threshold are forgotten entirely.
A scheduled background daemon runs every 6 hours and performs six steps:
1. Clustering: DBSCAN on the cosine distance matrix $D_{ij} = 1 - \cos(\mathbf{e}_i, \mathbf{e}_j)$ with $\varepsilon = 0.35$, min_samples $= 3$
2. Distillation: each cluster $C_k$ with $|C_k| \geq 3$ is distilled by Claude Haiku into one semantic memory
3. Centrality-weighted decay: source episodics fade based on distance from the cluster centroid — $\gamma_i = 0.5 + 0.4 \cdot (1 - \text{sim}(\mathbf{e}_i, \bar{\mathbf{e}}_{C_k}))$ — central memories fade more, peripheral ones retain unique details
4. Schema synthesis: re-cluster semantics ($\varepsilon = 0.45$), synthesize behavioral patterns from clusters of $\geq 3$
5. Idempotent decay pass: hyperbolic curve applied to all memories not accessed in 7+ days
6. Priority snapshots: compares current priorities with the 30-day-old snapshot, classifies as `deliberate_pivot` | `gradual_drift` | `stable`
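The clustering and centrality-weighting steps can be sketched with scikit-learn's DBSCAN on a precomputed cosine-distance matrix; function and variable names here are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_and_weight(embeddings: np.ndarray, eps: float = 0.35,
                       min_samples: int = 3):
    """Consolidation sketch: DBSCAN clustering + centrality-weighted decay.

    Cluster episodic embeddings on cosine distance (label -1 = noise, left
    un-distilled), then compute gamma_i = 0.5 + 0.4 * (1 - sim(e_i, centroid))
    for each cluster member. Noise points keep gamma = 1.0 (no extra decay).
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = np.clip(1.0 - normed @ normed.T, 0.0, 2.0)  # D_ij = 1 - cos(e_i, e_j)
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="precomputed").fit(dist).labels_

    gammas = np.ones(len(embeddings))
    for k in set(labels) - {-1}:
        members = labels == k
        centroid = normed[members].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        sim = normed[members] @ centroid
        gammas[members] = 0.5 + 0.4 * (1.0 - sim)  # central memories fade more
    return labels, gammas
```

Gamma lands in [0.5, 0.9]: a memory sitting on the centroid decays toward 0.5 of its strength, while a peripheral member with unique details is spared.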
Based on Nader, Schafe & LeDoux (2000): when a memory is retrieved, it enters a labile state for a 6-hour window during which related new information can refine or update it. If re-retrieved while already labile, the window extends from the most recent retrieval.
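A minimal sketch of the lability-window bookkeeping (the field name `labile_until` is hypothetical):

```python
from datetime import datetime, timedelta

LABILITY_WINDOW = timedelta(hours=6)

def mark_retrieved(memory: dict, now: datetime) -> None:
    """Open the 6h lability window; re-retrieval while labile simply
    resets it, extending lability from the most recent retrieval."""
    memory["labile_until"] = now + LABILITY_WINDOW

def is_labile(memory: dict, now: datetime) -> bool:
    """A memory can be passively refined only while its window is open."""
    until = memory.get("labile_until")
    return until is not None and now < until
```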
This implements a dual belief-update architecture:
| Pathway | Trigger | Behavior |
|---|---|---|
| Reconsolidation | Memory retrieved | Passive refinement: "I prefer async" |
| Contradiction detection | Explicit conflict | Evidence-weighted superseded_by link |
Reconsolidation catches gradual belief drift that hard contradiction detection would miss. No other production agent memory system implements retrieval-triggered lability windows.
Two-pass detection:
- Real-time (during extraction): Every new decision or preference is checked against existing memories. Claude Haiku identifies semantic conflicts. Old memory receives evidence-weighted strength penalty.
- Offline (during consolidation): Full audit across the memory store for subtle contradictions missed in real-time.
Decision ledger — decisions are first-class objects with:
- `decision_text` + `reasoning` + `domain` + `outcome`
- Explicit supersession chains: when a decision is reversed, the old one links to the new one
- The agent can query the ledger by topic or domain to surface prior decisions with their reasoning
The same query produces different results depending on context. The composite query vector blends the current message with recent conversation state:

$$\mathbf{q} = \alpha \cdot \mathbf{e}_{\text{message}} + (1 - \alpha) \cdot \mathbf{e}_{\text{context}}$$

Candidates are scored by a four-factor product:

$$\text{score}_i = \text{sim}(\mathbf{q}, \mathbf{e}_i) \cdot S_i \cdot w_{\text{layer}} \cdot w_{\text{recency}}$$

where $\text{sim}$ is cosine similarity, $S_i$ is the memory's current strength, $w_{\text{layer}}$ weights the memory layer by detected intent, and $w_{\text{recency}}$ favors recently relevant memories.
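A sketch of this scoring under assumed factor forms; the mixing weight, recency curve, and layer weights below are illustrative, not the production values:

```python
import numpy as np

def composite_query(msg_vec: np.ndarray, ctx_vec: np.ndarray,
                    alpha: float = 0.7) -> np.ndarray:
    """Blend the current message with recent conversation state,
    then renormalize so cosine similarity stays well-defined."""
    q = alpha * msg_vec + (1.0 - alpha) * ctx_vec
    return q / np.linalg.norm(q)

def score(q: np.ndarray, memory: dict, layer_weights: dict,
          half_life_days: float = 30.0) -> float:
    """Four-factor product: similarity x strength x layer weight x recency."""
    sim = float(q @ memory["embedding"])        # cosine sim (unit vectors)
    recency = 1.0 / (1.0 + memory["age_days"] / half_life_days)
    return sim * memory["strength"] * layer_weights[memory["layer"]] * recency
```

Because the factors multiply, a stale or weak memory is demoted even when it is the closest semantic match, which is exactly the behavior the crossover experiment rewards.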
| Capability | MemGPT/Letta | Mem0 | Zep | A-Mem | MemoryBank | FADEMEM | Agenternal |
|---|---|---|---|---|---|---|---|
| Memory hierarchy | 2 flat layers | Flat | 3 tiers | Flat | Flat | Flat | Episodic --> Semantic --> Schema |
| Forgetting | None | None | Staleness | Activation | Ebbinghaus | Adaptive exp. | Idempotent hyperbolic |
| Spacing effect | None | None | None | None | None | None | Power-law stability + spacing-scaled boost |
| Contradiction model | None | None | None | None | None | Exp. suppression | Evidence-weighted (Bayesian likelihood-ratio) |
| Retrieval-induced forgetting | None | None | None | None | None | None | 3% suppression of close competitors |
| Reconsolidation | None | None | None | None | None | None | 6h lability windows |
| Pattern abstraction | None | None | None | None | None | None | DBSCAN --> behavioral schemas |
| Decision tracking | None | None | None | None | None | None | Ledger with supersession chains |
| Context-sensitive retrieval | None | None | User-aware | None | None | None | Intent + recency + layer weighted |
| Offline consolidation | None | None | None | None | None | None | Scheduled daemon with centrality-weighted decay |
| Priority drift detection | None | None | None | None | None | None | Snapshot comparison + drift classification |
| Component | Technology |
|---|---|
| LLM | Claude Sonnet 4 (streaming) + Claude Haiku 4.5 (extraction, consolidation) |
| Frontend | Next.js 16, React 19, Tailwind CSS 4 |
| Backend | FastAPI, Python 3.12 |
| Database | PostgreSQL 17 + pgvector |
| Embeddings | fastembed BAAI/bge-small-en-v1.5 (384 dims, local ONNX) |
| Clustering | scikit-learn DBSCAN (cosine distance) |
| Search | Claude native web search (web_search_20250305) |
| Deployment | Docker Compose / Railway |
- Docker & Docker Compose
- Anthropic API key
echo "ANTHROPIC_API_KEY=your-key-here" > backend/.envdocker compose up -d --build- Chat: http://localhost:3001
- Memory: http://localhost:3001/memory
- Tasks: http://localhost:3001/tasks
- API docs: http://localhost:8000/docs
Trigger a consolidation pass manually:

```bash
curl -X POST http://localhost:8000/api/consolidate
```

In a new Railway project, create three services:
| Service | How | Root directory | Port |
|---|---|---|---|
| PostgreSQL | "New" > "Database" > "PostgreSQL" | — | auto |
| backend | "New" > "GitHub Repo" > this repo | `/backend` | `8000` |
| frontend | "New" > "GitHub Repo" > this repo | `/frontend` | `3000` |
backend:
| Variable | Value |
|---|---|
| `ANTHROPIC_API_KEY` | Your Anthropic API key |
| `DATABASE_URL` | Copy from the Railway PostgreSQL service (auto-converts `postgresql://` to `postgresql+asyncpg://`) |
| `CORS_ORIGINS` | `https://<your-frontend>.up.railway.app` |
frontend:
| Variable | Value |
|---|---|
| `NEXT_PUBLIC_API_URL` | `https://<your-backend>.up.railway.app` |
Railway's PostgreSQL supports pgvector. The backend automatically runs `CREATE EXTENSION IF NOT EXISTS vector` on startup.
- The backend Dockerfile pre-downloads the ONNX embedding model at build time (~100MB) — no cold-start delay
- The consolidation scheduler starts automatically with the backend (every 6h)
- Health check: `GET /api/health`
- Currently public (no auth) — add authentication before sharing widely
agenternal/
├── docker-compose.yml
├── docs/
│ └── brain-inspired-memory-research.md # Full research document (formulas, literature review)
│
├── backend/
│ ├── main.py # FastAPI + consolidation scheduler
│ ├── config.py
│ ├── agent/
│ │ └── prompts.py # System prompts (response style, memory instructions)
│ ├── memory/
│ │ ├── archival_memory.py # Spacing-aware search + retrieval reinforcement
│ │ ├── background_agent.py # Post-turn extraction + evidence-weighted contradictions
│ │ ├── compression.py # Conversation rolling summaries
│ │ ├── consolidation.py # Sleep replay: clustering, distillation, schema synthesis
│ │ ├── core_memory.py # Always-in-context user profile (4 blocks)
│ │ ├── decisions.py # Decision ledger with supersession chains
│ │ ├── embeddings.py # Local ONNX embedding model
│ │ ├── knowledge_graph.py # Graph RAG with fuzzy entity dedup
│ │ ├── manager.py # Context-sensitive retrieval orchestration
│ │ ├── recall.py # Conversation history search
│ │ ├── reconsolidation.py # Lability windows (Nader et al. 2000)
│ │ └── scheduler.py # Consolidation background task (6h interval)
│ ├── tools/
│ │ └── agent_tools.py # 14 agent tools (memory CRUD, search, delete, insights)
│ ├── api/
│ │ ├── chat.py # SSE streaming with tool use loop
│ │ ├── memory.py # Memory health API
│ │ ├── knowledge.py # Knowledge graph API
│ │ ├── tasks.py # Task management API
│ │ └── onboarding.py # First-time setup flow
│ └── db/
│ └── models.py # 9 tables (conversations, messages, core_memory,
│ # archival_memory, entities, relationships,
│ # tasks, memory_decisions, memory_schemas,
│ # priority_snapshots)
│
└── frontend/
└── src/
├── app/
│ ├── page.tsx # Chat + sidebar + memory panel
│ ├── memory/page.tsx # Memory explorer (core, archival, graph)
│ └── tasks/page.tsx # Task manager
├── components/
│ ├── ChatWindow.tsx # Streaming chat with thinking + tool indicators
│ ├── MessageBubble.tsx # Message rendering with markdown
│ ├── MemoryPanel.tsx # Live memory activity + insights panel
│ ├── Sidebar.tsx # Conversation list
│ ├── KnowledgeGraph.tsx # Force-directed graph visualization
│ └── chat/ # Sub-components (code blocks, thinking, tools, cards)
├── lib/
│ ├── api.ts # API client + SSE streaming
│ └── context/chat-context.tsx # React context (chat state + memory events)
└── types/chat.ts
| Tool | Purpose |
|---|---|
| `web_search` | DuckDuckGo search for current information |
| `collect_info` | Interactive form cards for structured input |
| `core_memory_append` | Append to always-in-context memory |
| `core_memory_replace` | Update or remove core memory content |
| `graph_memory_add` | Create/update knowledge graph entities |
| `graph_memory_search` | Search graph with 1-2 hop traversal |
| `graph_memory_delete` | Remove entities and their relationships |
| `archival_memory_insert` | Store facts in long-term memory |
| `archival_memory_search` | Semantic search over archival memory |
| `archival_memory_delete` | Remove incorrect memories |
| `memory_insights` | Query abstracted behavioral patterns |
| `decision_search` | Search the decision ledger by topic/domain |
| `conversation_search` | Search past conversations by content |
| `conversation_search_date` | Search conversations by date range |
- `POST /api/chat/send` — SSE streaming with tool use loop
- `GET /api/chat/conversations` — List conversations
- `GET /api/chat/conversations/:id/messages` — Get messages
- `DELETE /api/chat/conversations/:id` — Delete conversation

- `GET /api/memory/core` — Core memory sections
- `PUT /api/memory/core` — Update core memory
- `GET /api/memory/archival` — Archival memories (with layer, strength)
- `GET /api/memory/search?q=` — Semantic search
- `GET /api/memory/health` — Layer stats, schemas, decisions, priority timeline
- `GET /api/memory/labile` — Count of currently labile memories

- `GET /api/knowledge/entities` — List entities
- `GET /api/knowledge/entities/:id` — Entity with relationships
- `GET /api/knowledge/graph` — Full graph data for visualization
- `GET /api/knowledge/stats` — Graph statistics

- `POST /api/consolidate` — Manually trigger memory consolidation
- `GET /api/health` — Service health check
- Ebbinghaus, H. (1885). Über das Gedächtnis. Original forgetting curve.
- Pimsleur, P. (1967). "A memory schedule." Modern Language Journal, 51(2), 73–75. Graduated-interval recall.
- Rescorla, R.A. & Wagner, A.R. (1972). "A theory of Pavlovian conditioning." In Classical Conditioning II, pp. 64–99. Additive prediction-error model.
- Wickelgren, W.A. (1974). "Single-trace fragility theory of memory dynamics." Memory & Cognition, 2(4), 775–780. Power-law forgetting.
- Bjork, R.A. & Bjork, E.L. (1992). "A new theory of disuse." Storage strength vs retrieval strength.
- Wixted, J.T. & Ebbesen, E.B. (1991). "On the form of forgetting." Psychological Science, 2(6), 409–415. Hyperbolic/power-law forgetting curves.
- Anderson, M.C., Bjork, R.A. & Bjork, E.L. (1994). "Remembering can cause forgetting." Journal of Experimental Psychology: LMC, 20(5), 1063–1087. Retrieval-induced forgetting.
- McClelland, J.L., McNaughton, B.L. & O'Reilly, R.C. (1995). "Why there are complementary learning systems." Psychological Review, 102(3), 419–457.
- Wozniak, P.A. & Gorzelańczyk, E.J. (1995). "Two components of long-term memory." Acta Neurobiologiae Experimentalis, 55, 301–305. Two-component stability model.
- Rubin, D.C. & Wenzel, A.E. (1996). "One hundred years of forgetting." Psychological Review, 103(4), 734–760. Meta-analysis: power-law retention.
- Rao, R.P.N. & Ballard, D.H. (1999). "Predictive coding in the visual cortex." Nature Neuroscience, 2(1), 79–87.
- Nader, K., Schafe, G.E. & LeDoux, J.E. (2000). "Fear memories require protein synthesis for reconsolidation." Nature, 406, 722–726.
- Walker, M.P. et al. (2003). "Dissociable stages of memory consolidation and reconsolidation." Nature, 425, 616–620.
- Cepeda, N.J. et al. (2006). "Distributed practice in verbal recall tasks." Psychological Bulletin, 132(3), 354–380. Meta-analysis: spacing effect.
- Karpicke, J.D. & Roediger, H.L. (2008). "The critical importance of retrieval for learning." Science, 319, 966–968.
- Friston, K. (2010). "The free-energy principle." Nature Reviews Neuroscience, 11(2), 127–138. Bayesian brain hypothesis.
- Murayama, K. et al. (2014). "Forgetting as a consequence of retrieval." Psychological Bulletin, 140(5), 1383–1409. RIF meta-analysis.
- Packer et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560
- MemoryBank (2023). "Enhancing LLMs with Long-Term Memory." arXiv:2305.10250
- Zep (2025). "A Temporal Knowledge Graph Architecture for Agent Memory." arXiv:2501.13956
- FADEMEM (2026). "Biologically-Inspired Forgetting and Adaptive Memory." arXiv:2601.18642
- TiMem (2026). "Temporal-Hierarchical Memory Consolidation." arXiv:2601.02845
- TraceMem (2026). "Weaving Narrative Memory Schemata." arXiv:2602.09712
See docs/brain-inspired-memory-research.md for the full research document with LaTeX formulas, literature comparison, and novelty assessment.
MIT