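Based on the paper, the simulator (environment) policy is updated as follows:
- Build a weighted environment SFT set $\hat{D}^{t}_{\text{env}}$ from $T^{t}$, weighting each sample in proportion to its exponentiated environment reward (equation 3).
- Update the environment policy $\pi_{\text{env}}$ via reward-weighted regression (RWR) on $\hat{D}^{t}_{\text{env}}$ to maximize the expected environment reward (equation 3).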
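To make the question concrete, here is a minimal sketch of what that RWR step could look like. It assumes the weights are a softmax of per-sample environment rewards with a hypothetical temperature `beta`, and that each sample's log-likelihood under $\pi_{\text{env}}$ has already been computed. The function names are hypothetical and are not taken from the repo or the paper.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rwr_weights(rewards: Sequence[float], beta: float = 1.0) -> List[float]:
    """Normalized RWR weights: w_i = exp(r_i / beta) / sum_j exp(r_j / beta).

    `beta` is a hypothetical temperature; the exact scaling used in the
    paper's equation 3 may differ.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not rewards:
        return []
    scaled = [r / beta for r in rewards]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def build_weighted_env_set(
    samples: Sequence[T], rewards: Sequence[float], beta: float = 1.0
) -> List[Tuple[T, float]]:
    """Pair each environment sample with its RWR weight (hypothetical helper)."""
    if len(samples) != len(rewards):
        raise ValueError("samples and rewards must have the same length")
    return list(zip(samples, rwr_weights(rewards, beta)))


def weighted_nll(log_likelihoods: Sequence[float], weights: Sequence[float]) -> float:
    """RWR as weighted SFT: loss = -sum_i w_i * log pi_env(y_i | x_i)."""
    if len(log_likelihoods) != len(weights):
        raise ValueError("log_likelihoods and weights must have the same length")
    return -sum(w * ll for w, ll in zip(weights, log_likelihoods))
```

Minimizing `weighted_nll` over the parameters of $\pi_{\text{env}}$ is the usual way RWR is run as weighted supervised fine-tuning. Normalizing the weights over the batch keeps them bounded regardless of the reward scale.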
I may have missed it, but could you please confirm whether this environment-policy update is within the scope of this repo?
Thank you