Agenternal

A brain-inspired memory management system for AI agents that forgets, consolidates, reconsolidates, detects contradictions, suppresses competitors, and abstracts behavioral patterns — capabilities no existing AI memory system (MemGPT, Mem0, Zep, or A-Mem) has shipped in production.

Paper: Why AI Needs to Dream: A Sleep/Wake Memory Architecture for Conversational Agents (April 2026)

Key Results

	Result	Significance
Spacing effect	9.4x retention advantage at 90 days	First implementation in any agent memory system
Consolidation	126 semantic + 130 schema emerge from 920 episodic	Three-layer hierarchy forms autonomously with 72% storage reduction
LongMemEval-M	42% EM / 52.6% LLM-judge (500 questions, ~10K memories each)	Temporal reasoning (37.6%) is weakest — active area of improvement
Longitudinal (90 days)	Full system: 13% → 33% accuracy	No-decay degrades to 9% as 16K memories drown retrieval. 3.5x advantage.
Ablation	Exponential decay and linear stability produce 0% EM	Our design choices (hyperbolic + power-law) are the only ones that work
Unit tests	125/125 passing	Pure math verification of all formulas, no DB or LLM needed

The central finding

On LongMemEval-M (500 questions, ~10K memories per question), the full system scores 42% EM / 52.6% LLM-judge. Temporal reasoning (37.6%) and preferences (6.7%) are the weakest categories — active areas of improvement. Single-session recall is strong (75–79%).

On our 90-day longitudinal benchmark (~16,000 memories), the picture reverses: the full system reaches 33% while systems without forgetting degrade to 9%. This is the crossover effect — once enough memories accumulate, forgetting is not just helpful, it's essential.

No existing benchmark tests this. We derive analytically why the crossover exists and validate it empirically.

The Problem

Every AI assistant today has the same memory flaw: it stores everything and retrieves by similarity. This is a filing cabinet, not a brain. As conversations accumulate, the system drowns in redundant facts, conflicting information, and outdated priorities — with no mechanism to resolve any of it.

The human brain solves this with six mechanisms that run simultaneously:

Two-stage consolidation — fast capture in the hippocampus, slow distillation to the neocortex during sleep
Selective forgetting — memories decay unless reinforced through spaced retrieval
Reconsolidation — retrieved memories become temporarily unstable and can be refined by new context
Cognitive dissonance — contradicting beliefs trigger error signals proportional to the conflict
Retrieval-induced forgetting — recalling a memory suppresses similar competitors, sharpening recall
Hierarchical compression — raw experience is compressed 2,000,000:1 into schemas and generative models

We implemented all six.

Memory Architecture

Three-Layer Hierarchy

Raw conversation --> Episodic Memory --> Semantic Memory --> Schema
                    (fast capture)     (distilled facts)   (behavioral patterns)

Layer	Example	Created
Episodic	"On March 15, user said delay the launch because engineering is behind"	After each conversation turn
Semantic	"User delayed product launch due to engineering delays (March 2026)"	During consolidation (every 6h)
Schema	"User prioritizes engineering readiness over market timing in launch decisions"	When 3+ semantic memories cluster

Memory Strength (Hyperbolic Decay + Spacing Effect)

Every memory has a strength $s \in [0.05,; 2.0]$ governed by four forces:

1. Idempotent time decay — uses a hyperbolic retention function (Wixted & Ebbesen 1991; Rubin & Wenzel 1996) rather than the exponential originally proposed by Ebbinghaus (1885). The hyperbolic form has a heavier tail — memories linger longer before vanishing, matching 100+ years of empirical forgetting data:

$$S(m) = 1 + c \cdot (n_m^{\text{spaced}})^p, \quad c = 0.5, ; p = 1.5$$

$$R(m, t) = \frac{1}{1 + k ;\cdot; \dfrac{\Delta t_m}{S(m)}}, \quad k = \frac{1}{H}$$

$$s_m \leftarrow \max!\Big(0.05,;; s_m^{(0)} \cdot R(m, t)\Big)$$

where $s_m^{(0)}$ is the strength at last access (idempotent base), $H = 10$ days is the half-life at baseline stability, and $n_m^{\text{spaced}}$ is the spaced access count — only retrievals with a gap $\geq 12\text{h}$ increment it (Cepeda et al. 2006; Bjork & Bjork 1992). Stability grows as a power law of spaced retrievals (Pimsleur 1967), so each spaced retrieval builds proportionally more stability than the last — unlike linear growth where the 10th retrieval adds the same as the 1st.

2. Spacing-aware retrieval reinforcement — boost scales with time since last access and diminishes near the ceiling:

$$\alpha(m) = \alpha_{\max} \cdot \underbrace{\left(1 - \frac{s_m}{s_{\max}}\right)}_{\text{diminishing at ceiling}} \cdot \underbrace{\left(1 - e^{-\Delta t_m ,/, \tau}\right)}_{\text{spacing effect}}$$

$$s_m \leftarrow \min!\Big(s_{\max},;; s_m + \alpha(m)\Big)$$

where $\alpha_{\max} = 0.15$, $s_{\max} = 2.0$, $\tau = 24\text{h}$.

Scenario	Flat boost (old)	Spacing-aware (new)
Recalled 30 sec ago, $s=1.0$	$+5% = 1.05$	$+0.001$ (near zero)
Recalled 1 day ago, $s=1.0$	$+5% = 1.05$	$+0.059$
Recalled 7 days ago, $s=1.0$	$+5% = 1.05$	$+0.075$ (near max)
Recalled 7 days ago, $s=1.8$	$+5% = 1.89$	$+0.015$ (diminishing)

3. Evidence-weighted contradiction (Bayesian likelihood-ratio penalty) — penalty modulated by relative strength of new evidence vs old belief:

$$\beta(s_{\text{old}},, c_{\text{new}}) = \beta_{\min} + (\beta_{\max} - \beta_{\min}) \cdot e^{-c_{\text{new}} ,/, s_{\text{old}}}$$

$$s_{\text{old}} \leftarrow \max!\Big(0.05,;; s_{\text{old}} \cdot \beta\Big)$$

where $\beta_{\min} = 0.2$ (harshest), $\beta_{\max} = 0.85$ (mildest), and $c_{\text{new}}$ is derived from the memory's category (decisions=1.0, facts=0.9, preferences=0.75, opinions=0.6) rather than LLM self-assessed confidence. Strong beliefs resist weak contradictions; weak beliefs yield to strong evidence. Inspired by Bayesian belief revision (Friston 2010).

4. Retrieval-induced forgetting (Anderson, Bjork & Bjork 1994) — when a memory is retrieved, similar-but-non-retrieved competitors are mildly suppressed (3%), sharpening recall over time:

$$\forall, m_j \notin \text{retrieved}: \quad \text{if } \text{sim}(\mathbf{e}_{m_j}, \mathbf{q}) > 0.7, \quad s_{m_j} \leftarrow s_{m_j} \cdot 0.97$$

Schemas are exempt from suppression. Confirmed across 60+ studies by Murayama et al. (2014).

Memories below $\tau = 0.1$ become invisible to the AI but remain in the database for the user to inspect.

Consolidation ("Sleep Replay")

A scheduled background daemon runs every $T = 6$ hours:

Clustering: DBSCAN on cosine distance matrix $D_{ij} = 1 - \cos(\mathbf{e}_i, \mathbf{e}_j)$ with $\varepsilon = 0.35$, min_samples $= 3$
Distillation: Each cluster $C_k$ with $|C_k| \geq 3$ is distilled by Claude Haiku into one semantic memory
Centrality-weighted decay: Source episodics fade based on distance from cluster centroid — $\gamma_i = 0.5 + 0.4 \cdot (1 - \text{sim}(\mathbf{e}i, \bar{\mathbf{e}}{C_k}))$ — central memories fade more, peripheral ones retain unique details
Schema synthesis: Re-cluster semantics ($\varepsilon = 0.45$), synthesize behavioral patterns from clusters of $\geq 3$
Idempotent decay pass: Hyperbolic curve applied to all memories not accessed in 7+ days
Priority snapshots: Compares current priorities with 30-day-old snapshot, classifies as deliberate_pivot | gradual_drift | stable

Memory Reconsolidation (Lability Windows)

Based on Nader, Schafe & LeDoux (2000): when memory $m$ is retrieved at time $t_r$, it enters a labile state for $W = 6$ hours:

$$m.\text{labile} \leftarrow \text{true}, \quad m.t_{\text{recon}} \leftarrow t_r + W$$

If re-retrieved while already labile, the window extends: $m.t_{\text{recon}} \leftarrow \max(m.t_{\text{recon}},; t_r + W)$. During this window, new conversation context can passively refine the memory content without requiring an explicit contradiction. The labile set is capped at $|\mathcal{L}| \leq 10$.

This implements a dual belief-update architecture:

Pathway	Trigger	Behavior
Reconsolidation	Memory retrieved	Passive refinement: "I prefer async" $\rightarrow$ "I prefer async, except for urgent issues"
Contradiction detection	Explicit conflict	Evidence-weighted $\beta$ penalty + `superseded_by` link

Reconsolidation catches gradual belief drift that hard contradiction detection would miss. No other production agent memory system implements retrieval-triggered lability windows.

Contradiction Detection + Decision Ledger

Two-pass detection:

Real-time (during extraction): Every new decision or preference is checked against existing memories. Claude Haiku identifies semantic conflicts. Old memory receives evidence-weighted strength penalty.
Offline (during consolidation): Full audit across the memory store for subtle contradictions missed in real-time.

Decision ledger — decisions are first-class objects with:

decision_text + reasoning + domain + outcome
Explicit supersession chains: when a decision is reversed, the old one links to the new one
The agent can query the ledger by topic or domain to surface prior decisions with their reasoning

Context-Sensitive Retrieval

The same query produces different results depending on context. The composite query vector blends the current message with recent conversation state:

$$\mathbf{q}_{\text{composite}} = \frac{0.6 \cdot \mathbf{e}_{\text{query}} + 0.4 \cdot \sum_{i=1}^{n} w_i \cdot \mathbf{e}_{\text{recent}_i}}{\left| \cdots \right|}, \quad w_i = \frac{\lambda^{i-1}}{\sum_{j=1}^{n} \lambda^{j-1}}, ; \lambda = 0.5$$

Candidates are scored by a four-factor product:

$$\text{score}(m, q, I) = \text{sim}(\mathbf{e}_m, \mathbf{q}) ;\cdot; s_m ;\cdot; b_{\text{cat}}(m, I) ;\cdot; b_{\text{layer}}(m, I)$$

where $I \in {\texttt{decision}, \texttt{research}, \texttt{tasks}, \texttt{chat}}$ selects the intent-specific boost profile. Decision mode boosts decisions $1.5\times$ and schemas $1.5\times$; research mode boosts semantic and schema layers.

How It Differs

Capability	MemGPT/Letta	Mem0	Zep	A-Mem	MemoryBank	FADEMEM	Agenternal
Memory hierarchy	2 flat layers	Flat	3 tiers	Flat	Flat	Flat	Episodic --> Semantic --> Schema
Forgetting	None	None	Staleness	Activation	Ebbinghaus	Adaptive exp.	Idempotent hyperbolic
Spacing effect	None	None	None	None	None	None	Power-law stability + spacing-scaled boost
Contradiction model	None	None	None	None	None	Exp. suppression	Evidence-weighted (Bayesian likelihood-ratio)
Retrieval-induced forgetting	None	None	None	None	None	None	3% suppression of close competitors
Reconsolidation	None	None	None	None	None	None	6h lability windows
Pattern abstraction	None	None	None	None	None	None	DBSCAN --> behavioral schemas
Decision tracking	None	None	None	None	None	None	Ledger with supersession chains
Context-sensitive retrieval	None	None	User-aware	None	None	None	Intent + recency + layer weighted
Offline consolidation	None	None	None	None	None	None	Scheduled daemon with centrality-weighted decay
Priority drift detection	None	None	None	None	None	None	Snapshot comparison + drift classification

Tech Stack

Component	Technology
LLM	Claude Sonnet 4 (streaming) + Claude Haiku 4.5 (extraction, consolidation)
Frontend	Next.js 16, React 19, Tailwind CSS 4
Backend	FastAPI, Python 3.12
Database	PostgreSQL 17 + pgvector
Embeddings	fastembed BAAI/bge-small-en-v1.5 (384 dims, local ONNX)
Clustering	scikit-learn DBSCAN (cosine distance)
Search	Claude native web search (`web_search_20250305`)
Deployment	Docker Compose / Railway

Quick Start (Local)

Prerequisites

Docker & Docker Compose
Anthropic API key

1. Configure

echo "ANTHROPIC_API_KEY=your-key-here" > backend/.env

2. Start

docker compose up -d --build

3. Open

Chat: http://localhost:3001
Memory: http://localhost:3001/memory
Tasks: http://localhost:3001/tasks
API docs: http://localhost:8000/docs

4. Trigger consolidation manually

curl -X POST http://localhost:8000/api/consolidate

Deploy to Railway

1. Create services

In a new Railway project, create three services:

Service	How	Root directory	Port
PostgreSQL	"New" > "Database" > "PostgreSQL"	—	auto
backend	"New" > "GitHub Repo" > this repo	`/backend`	8000
frontend	"New" > "GitHub Repo" > this repo	`/frontend`	3000

2. Set environment variables

backend:

Variable	Value
`ANTHROPIC_API_KEY`	Your Anthropic API key
`DATABASE_URL`	Copy from Railway PostgreSQL service (auto-converts `postgresql://` to `postgresql+asyncpg://`)
`CORS_ORIGINS`	`https://<your-frontend>.up.railway.app`

frontend:

Variable	Value
`NEXT_PUBLIC_API_URL`	`https://<your-backend>.up.railway.app`

3. Enable pgvector

Railway's PostgreSQL supports pgvector. The backend automatically runs CREATE EXTENSION IF NOT EXISTS vector on startup.

Notes

The backend Dockerfile pre-downloads the ONNX embedding model at build time (~100MB) — no cold-start delay
The consolidation scheduler starts automatically with the backend (every 6h)
Health check: GET /api/health
Currently public (no auth) — add authentication before sharing widely

Project Structure

agenternal/
├── docker-compose.yml
├── docs/
│   └── brain-inspired-memory-research.md   # Full research document (formulas, literature review)
│
├── backend/
│   ├── main.py                             # FastAPI + consolidation scheduler
│   ├── config.py
│   ├── agent/
│   │   └── prompts.py                      # System prompts (response style, memory instructions)
│   ├── memory/
│   │   ├── archival_memory.py              # Spacing-aware search + retrieval reinforcement
│   │   ├── background_agent.py             # Post-turn extraction + evidence-weighted contradictions
│   │   ├── compression.py                  # Conversation rolling summaries
│   │   ├── consolidation.py                # Sleep replay: clustering, distillation, schema synthesis
│   │   ├── core_memory.py                  # Always-in-context user profile (4 blocks)
│   │   ├── decisions.py                    # Decision ledger with supersession chains
│   │   ├── embeddings.py                   # Local ONNX embedding model
│   │   ├── knowledge_graph.py              # Graph RAG with fuzzy entity dedup
│   │   ├── manager.py                      # Context-sensitive retrieval orchestration
│   │   ├── recall.py                       # Conversation history search
│   │   ├── reconsolidation.py              # Lability windows (Nader et al. 2000)
│   │   └── scheduler.py                    # Consolidation background task (6h interval)
│   ├── tools/
│   │   └── agent_tools.py                  # 14 agent tools (memory CRUD, search, delete, insights)
│   ├── api/
│   │   ├── chat.py                         # SSE streaming with tool use loop
│   │   ├── memory.py                       # Memory health API
│   │   ├── knowledge.py                    # Knowledge graph API
│   │   ├── tasks.py                        # Task management API
│   │   └── onboarding.py                   # First-time setup flow
│   └── db/
│       └── models.py                       # 9 tables (conversations, messages, core_memory,
│                                           #   archival_memory, entities, relationships,
│                                           #   tasks, memory_decisions, memory_schemas,
│                                           #   priority_snapshots)
│
└── frontend/
    └── src/
        ├── app/
        │   ├── page.tsx                    # Chat + sidebar + memory panel
        │   ├── memory/page.tsx             # Memory explorer (core, archival, graph)
        │   └── tasks/page.tsx              # Task manager
        ├── components/
        │   ├── ChatWindow.tsx              # Streaming chat with thinking + tool indicators
        │   ├── MessageBubble.tsx           # Message rendering with markdown
        │   ├── MemoryPanel.tsx             # Live memory activity + insights panel
        │   ├── Sidebar.tsx                 # Conversation list
        │   ├── KnowledgeGraph.tsx          # Force-directed graph visualization
        │   └── chat/                       # Sub-components (code blocks, thinking, tools, cards)
        ├── lib/
        │   ├── api.ts                      # API client + SSE streaming
        │   └── context/chat-context.tsx    # React context (chat state + memory events)
        └── types/chat.ts

Agent Tools (14)

Tool	Purpose
`web_search`	DuckDuckGo search for current information
`collect_info`	Interactive form cards for structured input
`core_memory_append`	Append to always-in-context memory
`core_memory_replace`	Update or remove core memory content
`graph_memory_add`	Create/update knowledge graph entities
`graph_memory_search`	Search graph with 1-2 hop traversal
`graph_memory_delete`	Remove entities and their relationships
`archival_memory_insert`	Store facts in long-term memory
`archival_memory_search`	Semantic search over archival memory
`archival_memory_delete`	Remove incorrect memories
`memory_insights`	Query abstracted behavioral patterns
`decision_search`	Search the decision ledger by topic/domain
`conversation_search`	Search past conversations by content
`conversation_search_date`	Search conversations by date range

API Endpoints

Chat

POST /api/chat/send — SSE streaming with tool use loop
GET /api/chat/conversations — List conversations
GET /api/chat/conversations/:id/messages — Get messages
DELETE /api/chat/conversations/:id — Delete conversation

Memory

GET /api/memory/core — Core memory sections
PUT /api/memory/core — Update core memory
GET /api/memory/archival — Archival memories (with layer, strength)
GET /api/memory/search?q= — Semantic search
GET /api/memory/health — Layer stats, schemas, decisions, priority timeline
GET /api/memory/labile — Count of currently labile memories

Knowledge Graph

GET /api/knowledge/entities — List entities
GET /api/knowledge/entities/:id — Entity with relationships
GET /api/knowledge/graph — Full graph data for visualization
GET /api/knowledge/stats — Graph statistics

System

POST /api/consolidate — Manually trigger memory consolidation
GET /api/health — Service health check

References

Brain Science & Cognitive Psychology

Ebbinghaus, H. (1885). Über das Gedächtnis. Original forgetting curve.
Pimsleur, P. (1967). "A memory schedule." Modern Language Journal, 51(2), 73–75. Graduated-interval recall.
Rescorla, R.A. & Wagner, A.R. (1972). "A theory of Pavlovian conditioning." In Classical Conditioning II, pp. 64–99. Additive prediction-error model.
Wickelgren, W.A. (1974). "Single-trace fragility theory of memory dynamics." Memory & Cognition, 2(4), 775–780. Power-law forgetting.
Bjork, R.A. & Bjork, E.L. (1992). "A new theory of disuse." Storage strength vs retrieval strength.
Wixted, J.T. & Ebbesen, E.B. (1991). "On the form of forgetting." Psychological Science, 2(6), 409–415. Hyperbolic/power-law forgetting curves.
Anderson, M.C., Bjork, R.A. & Bjork, E.L. (1994). "Remembering can cause forgetting." Journal of Experimental Psychology: LMC, 20(5), 1063–1087. Retrieval-induced forgetting.
McClelland, J.L., McNaughton, B.L. & O'Reilly, R.C. (1995). "Why there are complementary learning systems." Psychological Review, 102(3), 419–457.
Wozniak, P.A. & Gorzelańczyk, E.J. (1995). "Two components of long-term memory." Acta Neurobiologiae Experimentalis, 55, 301–305. Two-component stability model.
Rubin, D.C. & Wenzel, A.E. (1996). "One hundred years of forgetting." Psychological Review, 103(4), 734–760. Meta-analysis: power-law retention.
Rao, R.P.N. & Ballard, D.H. (1999). "Predictive coding in the visual cortex." Nature Neuroscience, 2(1), 79–87.
Nader, K., Schafe, G.E. & LeDoux, J.E. (2000). "Fear memories require protein synthesis for reconsolidation." Nature, 406, 722–726.
Walker, M.P. et al. (2003). "Dissociable stages of memory consolidation and reconsolidation." Nature, 425, 616–620.
Cepeda, N.J. et al. (2006). "Distributed practice in verbal recall tasks." Psychological Bulletin, 132(3), 354–380. Meta-analysis: spacing effect.
Karpicke, J.D. & Roediger, H.L. (2008). "The critical importance of retrieval for learning." Science, 319, 966–968.
Friston, K. (2010). "The free-energy principle." Nature Reviews Neuroscience, 11(2), 127–138. Bayesian brain hypothesis.
Murayama, K. et al. (2014). "Forgetting as a consequence of retrieval." Psychological Bulletin, 140(5), 1383–1409. RIF meta-analysis.

AI Memory Systems

Packer et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560
MemoryBank (2023). "Enhancing LLMs with Long-Term Memory." arXiv:2305.10250
Zep (2025). "A Temporal Knowledge Graph Architecture for Agent Memory." arXiv:2501.13956
FADEMEM (2026). "Biologically-Inspired Forgetting and Adaptive Memory." arXiv:2601.18642
TiMem (2026). "Temporal-Hierarchical Memory Consolidation." arXiv:2601.02845
TraceMem (2026). "Weaving Narrative Memory Schemata." arXiv:2602.09712

See docs/brain-inspired-memory-research.md for the full research document with LaTeX formulas, literature comparison, and novelty assessment.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 229 Commits
backend		backend
blog		blog
docs		docs
eval		eval
explain		explain
frontend		frontend
paper		paper
to_learn		to_learn
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
ISSUES.md		ISSUES.md
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml

Folders and files

Latest commit

History

Repository files navigation

Agenternal

Key Results

The Problem

Memory Architecture

Three-Layer Hierarchy

Memory Strength (Hyperbolic Decay + Spacing Effect)

Consolidation ("Sleep Replay")

Memory Reconsolidation (Lability Windows)

Contradiction Detection + Decision Ledger

Context-Sensitive Retrieval

How It Differs

Tech Stack

Quick Start (Local)

Prerequisites

1. Configure

2. Start

3. Open

4. Trigger consolidation manually

Deploy to Railway

1. Create services

2. Set environment variables

3. Enable pgvector

Notes

Project Structure

Agent Tools (14)

API Endpoints

Chat

Memory

Knowledge Graph

System

References

Brain Science & Cognitive Psychology

AI Memory Systems

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages