Idea: Teaching ReMe How to Remember via RL
I came across AReaL — an asynchronous reinforcement learning training framework from Tsinghua IIIS & Ant Group — and think it could be a meaningful complement to ReMe.
The Problem with Heuristics
Right now ReMe uses rule-based/heuristic logic for the hardest memory decisions:
- What to compact vs. what to keep verbatim in a conversation
- What to write to long-term memory vs. what to discard
- How to score/rank retrieved memories for relevance
These are genuinely difficult judgement calls that humans don't agree on, and static heuristics will inevitably get them wrong in edge cases.
What AReaL Brings
AReaL is a flexible, scalable RL training framework supporting GRPO, PPO, DAPO, and other algorithms. It's designed specifically for training agentic and reasoning models and supports multi-domain tasks out of the box. Crucially, it's cheap and fast: the authors report up to a 2.77× speedup over synchronous RL training.
The Integration Idea
Use AReaL to train a small "memory policy" model that learns to make ReMe's key decisions through reinforcement learning:
| Decision | Reward Signal |
|---|---|
| What to compact | Downstream task success with/without the summary |
| What to store long-term | Whether the memory was usefully retrieved later |
| How to prioritize retrieval | User satisfaction / answer correctness |
Rather than hardcoding these heuristics, the model learns them from experience — the same way humans learn what's worth remembering.
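To make the table above concrete, here is a minimal sketch of how those reward signals might map onto scalars for RL training. All names here (`MemoryDecision`, `compute_reward`, the specific reward values) are illustrative assumptions, not actual ReMe or AReaL APIs, and real reward shaping would need tuning.

```python
# Hypothetical reward shaping for a learned memory policy.
# None of these names come from ReMe or AReaL; they are a sketch only.
from dataclasses import dataclass


@dataclass
class MemoryDecision:
    action: str   # one of "compact", "store", "discard"
    item_id: str


def compute_reward(decision: MemoryDecision,
                   task_succeeded: bool,
                   retrieved_later: bool) -> float:
    """Turn the table's reward signals into a scalar for the RL trainer."""
    if decision.action == "compact":
        # Reward compaction only if the downstream task still succeeds
        # with the summary in place of the verbatim conversation.
        return 1.0 if task_succeeded else -1.0
    if decision.action == "store":
        # Reward storage if the memory was usefully retrieved later;
        # small penalty for hoarding memories that were never used.
        return 1.0 if retrieved_later else -0.2
    # "discard": penalize only if the item turned out to be needed.
    return -1.0 if retrieved_later else 0.1
```

The key design point is that every reward is observable after the fact (task outcome, later retrieval), so trajectories can be scored offline and fed to an asynchronous trainer like AReaL without blocking the serving path.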
Why This Fits ReMe
ReMe's modular design (ReMeLight's ReAct-based memory writer, the hybrid retrieval layer, etc.) makes it well-suited to swapping in a learned policy at the decision points without restructuring the whole framework.
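One way to picture that swap is a shared interface at each decision point, with the heuristic and the learned policy as interchangeable implementations. This is a sketch under assumptions: `MemoryWriter`, `HeuristicWriter`, and `LearnedWriter` are hypothetical names, not ReMe's actual classes.

```python
# Hypothetical pluggable decision point; names are illustrative, not ReMe APIs.
from typing import List, Protocol, Callable


class MemoryWriter(Protocol):
    def should_store(self, memory: str, context: List[str]) -> bool: ...


class HeuristicWriter:
    """Stand-in for the current rule-based logic."""
    def should_store(self, memory: str, context: List[str]) -> bool:
        return len(memory) > 20  # toy rule for illustration


class LearnedWriter:
    """Wraps a trained policy (e.g. an AReaL-trained model) behind
    the same interface, so the rest of the pipeline is untouched."""
    def __init__(self, policy: Callable[[str, List[str]], float]):
        self.policy = policy

    def should_store(self, memory: str, context: List[str]) -> bool:
        return self.policy(memory, context) > 0.5
```

Because both writers satisfy the same `Protocol`, the learned policy can be A/B tested against the heuristic at a single decision point before touching anything else.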
References
- AReaL repo: https://github.com/inclusionAI/AReaL
- AReaL paper/docs cover multi-domain agentic RL, which maps well to the memory management domain
Happy to discuss further or help prototype something if there's interest!