
Enhancement: Use Reinforcement Learning (AReaL) to Train Smarter Memory Decisions #147

@Insider77Circle


Idea: Teaching ReMe How to Remember via RL

I came across AReaL — an asynchronous reinforcement learning training framework from Tsinghua IIIS & Ant Group — and think it could be a meaningful complement to ReMe.

The Problem with Heuristics

Right now ReMe uses rule-based/heuristic logic for the hardest memory decisions:

  • What to compact vs. what to keep verbatim in a conversation
  • What to write to long-term memory vs. what to discard
  • How to score/rank retrieved memories for relevance

These are genuinely difficult judgment calls that even humans disagree on, and static heuristics will inevitably get them wrong in edge cases.

What AReaL Brings

AReaL is a flexible, scalable RL training framework that supports GRPO, PPO, DAPO, and other algorithms. It's designed specifically for training agentic and reasoning models and supports multi-domain tasks out of the box. Crucially, it's cheap and fast: its authors report a 2.77× speedup over fully synchronous RL training.
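To make the GRPO mention concrete: GRPO's core trick is to drop the learned value critic and instead normalize each rollout's reward against the other rollouts of the same prompt. A minimal sketch of that group-relative advantage computation (function name and numbers are illustrative, not AReaL's actual API):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each rollout's reward against the
    mean/std of its group (rollouts of the same prompt), which removes the
    need for a separate learned value baseline."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts of one memory-compaction episode, scored 0/1 by
# downstream task success (made-up numbers for illustration):
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Successful rollouts get positive advantage, failed ones negative,
# and the group's advantages are centered near zero.
```

Because rewards here would be sparse binary outcomes ("did the downstream task succeed?"), this group-relative normalization is a natural fit for the memory decisions below.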

The Integration Idea

Use AReaL to train a small "memory policy" model that learns to make ReMe's key decisions via reinforcement learning:

| Decision | Reward signal |
| --- | --- |
| What to compact | Downstream task success with/without the summary |
| What to store long-term | Whether the memory was usefully retrieved later |
| How to prioritize retrieval | User satisfaction / answer correctness |
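For the "what to store long-term" row, one hedged way to turn "usefully retrieved later" into a scalar reward might look like this (the function, its arguments, and the constants are all hypothetical, purely to show the shape of the signal):

```python
def storage_reward(stored: bool, retrieved_later: bool, helped_answer: bool,
                   storage_cost: float = 0.05) -> float:
    """Hypothetical reward for a single write/skip decision.

    Rewards memories that were later retrieved AND helped the answer,
    charges a small cost for everything written (so the policy doesn't
    learn to hoard), and leaves skipped items neutral."""
    if not stored:
        return 0.0                  # skipping is neutral, not punished
    if retrieved_later and helped_answer:
        return 1.0 - storage_cost   # useful memory, minus storage cost
    if retrieved_later:
        return 0.2 - storage_cost   # retrieved but didn't help the answer
    return -storage_cost            # never retrieved: pure dead weight
```

The key design choice is the delayed credit assignment: the reward for a write decision only arrives once a later retrieval (or its absence) is observed, which is exactly the kind of delayed-outcome problem RL handles and static heuristics don't.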

Rather than hardcoding these heuristics, the model learns them from experience — the same way humans learn what's worth remembering.

Why This Fits ReMe

ReMe's modular design (ReMeLight's ReAct-based memory writer, the hybrid retrieval layer, etc.) makes it well-suited to swapping in a learned policy at the decision points without restructuring the whole framework.
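Concretely, the swap could be as small as a shared interface that both the current heuristic and an RL-trained model implement. This is a sketch of that idea, not ReMe's actual code; every class and method name here is an assumption:

```python
from typing import Protocol

class CompactionPolicy(Protocol):
    """Anything with should_compact() can sit at the decision point."""
    def should_compact(self, turn: str, context_tokens: int) -> bool: ...

class HeuristicPolicy:
    """Stand-in for today's rule-based logic: compact once the
    context exceeds a fixed token budget."""
    def __init__(self, budget: int = 4096):
        self.budget = budget

    def should_compact(self, turn: str, context_tokens: int) -> bool:
        return context_tokens > self.budget

class LearnedPolicy:
    """Wrapper around an RL-trained model (e.g. one trained with AReaL).
    The model and its predict() call are placeholders."""
    def __init__(self, model):
        self.model = model

    def should_compact(self, turn: str, context_tokens: int) -> bool:
        return self.model.predict(turn, context_tokens) > 0.5

def maybe_compact(policy: CompactionPolicy, turn: str, context_tokens: int) -> bool:
    # The framework calls the interface; heuristic and learned
    # policies are interchangeable without restructuring anything.
    return policy.should_compact(turn, context_tokens)
```

Because the interface is structural, the learned policy can be rolled out (or rolled back) per decision point, which also makes A/B comparison against the existing heuristics straightforward.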


Happy to discuss further or help prototype something if there's interest!
