
feat(openclaw-rl): add reward model feedback training #28

Open
chengzidl wants to merge 1 commit into Gen-Verse:main from chengzidl:feat/user-reward-model

Conversation

@chengzidl

Summary

  • add a persistent feedback store for implicit positives and thumbs-down negatives
  • add a learned reward model with background training on collected feedback
  • integrate reward-model scoring into openclaw_api_server.py
  • expose /v1/feedback for thumbs-down collection
  • document and enable reward-model env vars in the launch script
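To make the design above concrete, here is a minimal sketch of the kind of learned reward model the summary describes: a tiny scorer trained on collected `(text, label)` feedback pairs, which the server could query before returning a response. All names and the feature scheme (hashed bag-of-words with plain SGD logistic regression) are illustrative assumptions, not the PR's actual implementation.

```python
# Illustrative sketch only: a tiny learned reward model trained on
# (text, label) feedback pairs and used to score candidate responses.
# Hashed bag-of-words features + logistic regression via plain SGD;
# none of these names come from the PR.
import math

DIM = 256  # hashed feature dimension (assumption)

def featurize(text):
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[hash(tok) % DIM] += 1.0
    return vec

class RewardModel:
    def __init__(self):
        self.w = [0.0] * DIM

    def score(self, text):
        # Estimated probability that this response is received positively.
        z = sum(wi * xi for wi, xi in zip(self.w, featurize(text)))
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, pairs, epochs=50, lr=0.5):
        # The PR runs training in the background over persisted feedback;
        # here it is just a plain loop for illustration.
        for _ in range(epochs):
            for text, label in pairs:
                x = featurize(text)
                err = self.score(text) - label
                for i in range(DIM):
                    self.w[i] -= lr * err * x[i]

rm = RewardModel()
rm.train([("thanks that fixed it", 1),        # implicit positive
          ("wrong answer try again", 0)])     # thumbs-down negative
print(rm.score("thanks that fixed it") > rm.score("wrong answer try again"))
```

The point of the sketch is the data flow, not the model class: feedback collected via `/v1/feedback` becomes labeled training pairs, and the server consults `score()` when ranking outputs.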

Notes

  • thumbs-down now rewrites the stored label for the same turn instead of duplicating it
  • kept unrelated local files out of the commit
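The first note describes upsert semantics for feedback: a thumbs-down on a turn that was already stored as an implicit positive rewrites that record's label rather than appending a second record. A minimal sketch of a persistent store with that behavior (the class and method names here are assumptions, not the PR's API):

```python
# Illustrative sketch only: a persistent feedback store keyed by turn id.
# A turn is stored as an implicit positive; an explicit thumbs-down
# rewrites the same record's label instead of duplicating it.
import json
import os
import tempfile

class FeedbackStore:
    def __init__(self, path):
        self.path = path
        self.records = {}  # turn_id -> {"text": ..., "label": 1 or 0}
        if os.path.exists(path):
            with open(path) as f:
                self.records = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.records, f)

    def record_turn(self, turn_id, text):
        # Every completed turn lands as an implicit positive (label 1).
        self.records[turn_id] = {"text": text, "label": 1}
        self._save()

    def thumbs_down(self, turn_id):
        # Upsert: flip the existing record's label to 0 in place.
        if turn_id in self.records:
            self.records[turn_id]["label"] = 0
            self._save()

path = os.path.join(tempfile.mkdtemp(), "feedback.json")
store = FeedbackStore(path)
store.record_turn("t1", "how do I restart the service?")
store.thumbs_down("t1")
print(len(store.records), store.records["t1"]["label"])  # 1 0
```

Keying on the turn id is what guarantees one record per turn, so later negative feedback cannot inflate the dataset with contradictory duplicates.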

@yinjjiew
Member

Why do we need to add another reward model and keep training it here? Isn't using a PRM to judge response quality from next-state environment/user feedback enough in this setting? I think most users who deploy a personal agent don't have that much compute.

