
feat(openclaw-rl): add reward model feedback training #28

Open
chengzidl wants to merge 1 commit into Gen-Verse:main from chengzidl:feat/user-reward-model

Conversation

@chengzidl

Summary

  • add a persistent feedback store for implicit positives and thumbs-down negatives
  • add a learned reward model with background training on collected feedback
  • integrate reward-model scoring into openclaw_api_server.py
  • expose /v1/feedback for thumbs-down collection
  • document and enable reward-model env vars in the launch script
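To make the design above concrete, here is a minimal sketch of the kind of learned reward model the summary describes: a tiny scorer trained on collected `(text, label)` feedback pairs, which the server could query before returning a response. All names and the feature scheme (hashed bag-of-words with plain SGD logistic regression) are illustrative assumptions, not the PR's actual implementation.

```python
# Illustrative sketch only: a tiny learned reward model trained on
# (text, label) feedback pairs and used to score candidate responses.
# Hashed bag-of-words features + logistic regression via plain SGD;
# none of these names come from the PR.
import math

DIM = 256  # hashed feature dimension (assumption)

def featurize(text):
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[hash(tok) % DIM] += 1.0
    return vec

class RewardModel:
    def __init__(self):
        self.w = [0.0] * DIM

    def score(self, text):
        # Estimated probability that this response is received positively.
        z = sum(wi * xi for wi, xi in zip(self.w, featurize(text)))
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, pairs, epochs=50, lr=0.5):
        # The PR runs training in the background over persisted feedback;
        # here it is just a plain loop for illustration.
        for _ in range(epochs):
            for text, label in pairs:
                x = featurize(text)
                err = self.score(text) - label
                for i in range(DIM):
                    self.w[i] -= lr * err * x[i]

rm = RewardModel()
rm.train([("thanks that fixed it", 1),        # implicit positive
          ("wrong answer try again", 0)])     # thumbs-down negative
print(rm.score("thanks that fixed it") > rm.score("wrong answer try again"))
```

The point of the sketch is the data flow, not the model class: feedback collected via `/v1/feedback` becomes labeled training pairs, and the server consults `score()` when ranking outputs.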

Notes

  • thumbs-down now rewrites the stored label for the same turn instead of duplicating it
  • kept unrelated local files out of the commit
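The first note describes upsert semantics for feedback: a thumbs-down on a turn that was already stored as an implicit positive rewrites that record's label rather than appending a second record. A minimal sketch of a persistent store with that behavior (the class and method names here are assumptions, not the PR's API):

```python
# Illustrative sketch only: a persistent feedback store keyed by turn id.
# A turn is stored as an implicit positive; an explicit thumbs-down
# rewrites the same record's label instead of duplicating it.
import json
import os
import tempfile

class FeedbackStore:
    def __init__(self, path):
        self.path = path
        self.records = {}  # turn_id -> {"text": ..., "label": 1 or 0}
        if os.path.exists(path):
            with open(path) as f:
                self.records = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.records, f)

    def record_turn(self, turn_id, text):
        # Every completed turn lands as an implicit positive (label 1).
        self.records[turn_id] = {"text": text, "label": 1}
        self._save()

    def thumbs_down(self, turn_id):
        # Upsert: flip the existing record's label to 0 in place.
        if turn_id in self.records:
            self.records[turn_id]["label"] = 0
            self._save()

path = os.path.join(tempfile.mkdtemp(), "feedback.json")
store = FeedbackStore(path)
store.record_turn("t1", "how do I restart the service?")
store.thumbs_down("t1")
print(len(store.records), store.records["t1"]["label"])  # 1 0
```

Keying on the turn id is what guarantees one record per turn, so later negative feedback cannot inflate the dataset with contradictory duplicates.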

@yinjjiew
Member

Why do we need to add another reward model and keep training it here? Isn't using a PRM to judge response quality from next-state environment/user feedback enough in this setting? I think most users who deploy a personal agent don't have that much compute.

