
research: evaluate RL-learned memory policies (Memory-R1, Mem-alpha) #2850

@forkwright

Description

Summary

Two papers introduce RL-trained memory management, arguably the biggest paradigm shift in the agent-memory field to date:

Memory-R1 (Aug 2025, arXiv:2508.19828)

  • RL-trained ADD/UPDATE/DELETE/NOOP operations via PPO/GRPO
  • Only 152 training examples needed
  • Memory Manager learns optimal operations from outcome-driven reward

Mem-alpha (Sep 2025, arXiv:2509.25911)

  • Trained on 30K token sequences, generalizes to 400K+ (13x training length)
  • Reward signal from downstream QA accuracy
  • Builds core/episodic/semantic memory with multiple tools
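Mem-alpha's three-way memory split could map onto our side roughly as in this minimal sketch (the class, field names, and example entries are illustrative, not the paper's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    # Three partitions in the style of Mem-alpha (names are illustrative):
    core: list = field(default_factory=list)      # stable facts about the user/task
    episodic: list = field(default_factory=list)  # time-stamped interaction events
    semantic: list = field(default_factory=list)  # distilled general knowledge

store = MemoryStore()
store.episodic.append(("t0", "user asked about decay policy"))
store.semantic.append("decay questions recur weekly")
print(len(store.episodic), len(store.semantic))  # 1 1
```

In the paper, a trained policy decides which partition (if any) each incoming observation lands in, using multiple tools; here the split is just a data layout.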

Why this matters

Aletheia currently uses hand-coded Datalog rules for memory management decisions (what to store, when to decay, what to consolidate). These papers suggest that RL-learned policies can outperform hand-coded rules and generalize to unseen scenarios.

Proposed evaluation

  1. Define Aletheia's memory management as an MDP: state = current knowledge graph, actions = store/update/decay/consolidate, reward = downstream task success
  2. Benchmark current hand-coded rules against this formulation
  3. Evaluate whether a small RL-trained policy (152-30K examples) could improve memory management decisions
  4. If promising, integrate as a learned decay/consolidation policy alongside existing rules
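Step 1 above could be stubbed out as follows; the action set mirrors Aletheia's store/update/decay/consolidate vocabulary, and the reward follows the papers' downstream-QA signal (all names here are hypothetical, not existing Aletheia code):

```python
from enum import Enum, auto

class MemOp(Enum):
    # Action space of the memory-management MDP (hypothetical names):
    STORE = auto()
    UPDATE = auto()
    DECAY = auto()
    CONSOLIDATE = auto()

def qa_reward(correct: int, total: int) -> float:
    # Outcome-driven reward: downstream QA accuracy after the episode,
    # as in Memory-R1 / Mem-alpha. Returns 0.0 for an empty eval set.
    return correct / total if total else 0.0

print(qa_reward(7, 10))  # 0.7
```

With this framing, benchmarking the hand-coded Datalog rules (step 2) just means logging which `MemOp` each rule fires and scoring the resulting episodes with `qa_reward`.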

Risk

Self-reinforcing error: the agent incorrectly learns to avoid a memory path and never corrects itself (identified by SAGE, arXiv:2409.00872). The Prosoche self-audit would need to detect this.
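One standard mitigation (an assumption on my part, not from either paper) is to force occasional revisits of down-weighted memory paths, epsilon-greedy style, so a wrongly penalized path still gets re-evaluated:

```python
import random

def choose_path(scores: dict, epsilon: float = 0.05) -> str:
    # With probability epsilon, pick a random path (exploration) so a path
    # the policy has wrongly learned to avoid is still re-evaluated;
    # otherwise pick the highest-scoring path (exploitation).
    if random.random() < epsilon:
        return random.choice(list(scores))
    return max(scores, key=scores.get)

print(choose_path({"a": 0.9, "b": -0.4}, epsilon=0.0))  # a
```

This does not replace a self-audit; it only guarantees the audit eventually sees fresh evidence about avoided paths.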

Source

Memory-R1 (arXiv:2508.19828), Mem-alpha (arXiv:2509.25911)
