Users are confused about goal conditioning #393

Open
krzentner opened this issue Mar 22, 2023 · 3 comments

Comments

@krzentner
Contributor

Meta-World was designed to be both a Meta-RL and a Multi-Task RL benchmark.
One awkward consequence is that goal conditioning in Meta-World is handled in a complicated way.
Specifically, all environments in Meta-World are goal conditioned, in every benchmark.
However, goals are hidden in Meta-RL, and visible in Multi-Task RL.
This is intended to make "goal inference" part of the Meta-RL objective.
This allows ML1 to be used in a very similar way to older Meta-RL benchmark tasks (like HalfCheetahVelEnv or Ant Direction).
However, Meta-RL requires each task to be a fully observable MDP. Each "goal" must therefore be treated as a distinct task, and the API reflects this: an ML1 benchmark object contains 50 train task objects, and ML10 contains 500 train task objects.

However, Meta-World uses the same API for both Meta-RL and Multi-Task RL. Consequently, using the Benchmark API, the goal is changed by passing one of the task objects to the set_task function.
In practice, many users don't use the Benchmark API, and don't set the seeded_rand_vec flag either (which randomizes the goals on reset, using the seed passed to the environment on init).
This leads users to believe the environments are not goal conditioned, even though they definitely are supposed to be (50 goals per task, set by the seed).
I don't know how many inconsistent results have been published because of this confusion, but at least a few.
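
A minimal sketch of the behavior described above, using a toy stand-in rather than the real Meta-World code (the class, goal shape, and sampling are hypothetical; only the role of the seeded_rand_vec flag follows the description in this issue):

```python
import random

class ToyGoalEnv:
    """Toy stand-in for a Meta-World-style env (not the actual API).

    With seeded_rand_vec=True, each reset() draws a fresh goal from an RNG
    seeded at construction time: goals vary across episodes but are
    reproducible. With the flag off (the default), the goal stays frozen at
    its initial value, which is why the env *looks* like it is not goal
    conditioned.
    """

    def __init__(self, seed=0, seeded_rand_vec=False):
        self.seeded_rand_vec = seeded_rand_vec
        self._rng = random.Random(seed)
        self._goal = self._sample_goal()

    def _sample_goal(self):
        # Hypothetical 3-D goal position in [-1, 1]^3.
        return tuple(self._rng.uniform(-1.0, 1.0) for _ in range(3))

    def reset(self):
        if self.seeded_rand_vec:
            # Re-randomize the goal on every reset (reproducible via seed).
            self._goal = self._sample_goal()
        return self._goal

frozen = ToyGoalEnv(seed=0)                        # default: goal frozen
varied = ToyGoalEnv(seed=0, seeded_rand_vec=True)  # goal changes per reset

assert frozen.reset() == frozen.reset()   # same goal every episode
assert varied.reset() != varied.reset()   # different goals per episode...
assert ToyGoalEnv(seed=0, seeded_rand_vec=True).reset() == \
       ToyGoalEnv(seed=0, seeded_rand_vec=True).reset()  # ...but seeded
```

A user who constructs an environment directly, without the flag, sees the `frozen` behavior above and reasonably concludes the environment has a single fixed goal.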

TL;DR: Meta-RL requires ML10 to have 500 tasks, Multi-Task RL wants MT10 to have 10 tasks with 50 goals. This confuses users.

We should make the documentation and API clearer and harder to misuse.
A good first step would be renaming the seeded_rand_vec flag and setting it to True by default in all of the environment constructors when not using the Benchmark API. Unfortunately, this is a breaking change, and we haven't published any versioned package yet, so we should make sure we have published at least one version of the package before doing this.

@krzentner krzentner mentioned this issue Mar 22, 2023
@pseudo-rnd-thoughts
Member

If I understand @krzentner correctly, the confusion is that metaworld treats Multi-Task RL and Meta-RL environments as identical and requires the user to differentiate between them.
Could we not separate these out into two env classes? I understand that users might wish to explore using multi-task environments for Meta-RL, but we could create converter classes to support that.
Thoughts? @reginald-mclean @krzentner

@krzentner
Contributor Author

krzentner commented Mar 23, 2023

The individual environment classes are fine. Having a flag which controls goal visibility really is simpler than adding another wrapper. The Benchmark API, however, is a poor fit for Multi-Task RL. We should probably just change the documentation to recommend using the metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE API for Multi-Task RL.

Having said that, the Benchmark API is basically necessary for Meta-RL to work at all, so we need to keep that too.
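
To illustrate why, here is a hypothetical sketch of the Benchmark/set_task shape described above (names and data layout are simplified stand-ins, not the actual metaworld implementation): the benchmark object enumerates one task object per (environment, goal) pair, which is exactly the task structure Meta-RL sampling needs.

```python
import random

class ToyBenchmark:
    """Toy stand-in for the Benchmark API shape described in this issue.

    An ML1-style benchmark holds one env class but 50 train *task* objects,
    one per hidden goal; an ML10-style benchmark holds 10 env classes and
    therefore 500 train tasks. Goals are changed by passing a task object
    to set_task(), not through the env constructor.
    """

    GOALS_PER_ENV = 50

    def __init__(self, env_names, seed=0):
        rng = random.Random(seed)
        # One task object per (env, goal) pair, as in ML1 / ML10.
        self.train_tasks = [
            {"env_name": name, "goal": rng.uniform(-1.0, 1.0)}
            for name in env_names
            for _ in range(self.GOALS_PER_ENV)
        ]

class ToyEnv:
    def set_task(self, task):
        # The goal lives in the task object, so Meta-RL code can sample
        # tasks without knowing which ones differ only by goal.
        self._goal = task["goal"]

# ML1-like: 1 env class -> 50 train tasks; ML10-like: 10 -> 500.
ml1 = ToyBenchmark(["pick-place"])
ml10 = ToyBenchmark([f"env-{i}" for i in range(10)])
assert len(ml1.train_tasks) == 50
assert len(ml10.train_tasks) == 500

env = ToyEnv()
env.set_task(random.choice(ml1.train_tasks))  # goals switch via set_task
```

For Multi-Task RL this task enumeration is redundant overhead, which is why pointing those users at the goal-observable environment classes directly is the simpler recommendation.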

@krzentner
Contributor Author

Example of a paper that assumes that MT10 isn't goal conditioned: https://arxiv.org/abs/2003.13661
