Roadmap #388
Comments
There's a lot to unpack in that roadmap, and I definitely think we should take things step by step.
I'm not so confident in the rest of the proposed roadmap. For example, some multi-task RL algorithms (e.g. task embeddings) require the one-hot vector to not be present, so adding it by default seems unwise. There's also the complication that the meaning of "task" in MT1 and the meaning of "task" in MT50 are quite different, and I don't understand how you're proposing to handle those differences. I very much think that it's a bit too early to be proposing entirely new ways of using the Meta-World benchmark, as the last part of this roadmap describes. There's already an evaluation protocol for Meta-World, and proposing a new one without clarity on how it improves on the previous one is likely to cause confusion.
Thanks for the feedback. At least in my head, I would have thought we could create a superclass for the Meta and Multi-Task envs.
KR, could you provide us with some insight into the correct evaluation protocol of Meta-World? For multi-task learning algorithms the evaluation should be on the training tasks, and for meta-learning algorithms the evaluation is on the held-out test tasks? Both of these evaluations would be repeated a number of times to generate a success rate. Could you also clarify the distinction between a task in MT1 vs a task in MT50? The way that I understood it was that each environment is a task, and each environment has 50 different initializations to simulate the initial state distribution of a single environment. Therefore MT1 is a single task, and MT50 contains 50 tasks, each with 50 different initializations. I agree that enabling the seeded_rand_vec flag by default is a good idea. It is better to have one environment per task where different initializations can be sampled by calling env.reset() rather than 50 environments per task where an initialization must be sampled from somewhere else.
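To make my current understanding concrete, here is a rough sketch of how I picture MT1 under the existing API (based on my reading of the README, so the exact names may differ between versions):

```python
import random
import metaworld

# MT1: a single task ("reach-v2") whose 50 different initializations
# are exposed as separate Task objects.
mt1 = metaworld.MT1('reach-v2')
env = mt1.train_classes['reach-v2']()
task = random.choice(mt1.train_tasks)  # sample one of the 50 initializations
env.set_task(task)
obs = env.reset()
```

Under that reading, MT50 repeats the same construction for 50 environments, each again with 50 initializations.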
I wrote a description about tasks, and how Meta-RL and Multi-Task RL differ, in #393 that hopefully is more clear than prior explanations. Feel free to ask any followup questions you have. The Meta-RL evaluation procedure is basically the following:

```python
for task in test_tasks:
    policy = meta_algo.get_policy()
    for adaptation_steps in range(n_adaptation_steps):
        data = rollout(policy, task)
        policy = meta_algo.adapt_policy(policy, data)
    task_performance = evaluate(policy)
```

The implementation used in the original Meta-World paper is here. However, it only supports one adaptation step, which is fine for RL^2 and PEARL, but means that MAML only runs one gradient step.
I think the easiest solution is to consider these changes to be for a 1.0.0 release:
- `MujocoEnv`
- `gymnasium.make`
Proposed idea for environments
These ideas might be wrong, but I would be interested in feedback on them.
For the multi-task environments, we can consider there to be two "types" of environments.
I would propose that users create the multi-task environment through make, i.e., `gymnasium.make("metaworld/multi-task-50")`. Additionally, I think we should include the one-hot task vector either inside the multi-task environment, where it can be enabled, or as a wrapper within metaworld, so that users don't need to copy the previous implementation (this also allows the wrapper to be updated as the API changes). This multi-task environment should be generic enough that custom task environments can be used with it. A rough sketch of such a wrapper is below.
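Purely to make the wrapper idea concrete, here is a minimal sketch built on `gymnasium.ObservationWrapper`; the name `OneHotTaskWrapper` and its `task_idx`/`num_tasks` parameters are hypothetical, not part of any existing metaworld API:

```python
import numpy as np
import gymnasium as gym
from gymnasium.spaces import Box


class OneHotTaskWrapper(gym.ObservationWrapper):
    """Appends a one-hot task indicator to the wrapped env's observation."""

    def __init__(self, env, task_idx, num_tasks):
        super().__init__(env)
        self.one_hot = np.zeros(num_tasks, dtype=np.float32)
        self.one_hot[task_idx] = 1.0
        # Extend the observation space to account for the appended one-hot vector.
        low = np.concatenate([env.observation_space.low, np.zeros(num_tasks)])
        high = np.concatenate([env.observation_space.high, np.ones(num_tasks)])
        self.observation_space = Box(low=low.astype(np.float32), high=high.astype(np.float32))

    def observation(self, observation):
        return np.concatenate([observation, self.one_hot]).astype(np.float32)
```

Keeping it as a wrapper also means algorithms that don't want the one-hot vector (e.g. the task-embedding methods mentioned above) can simply not apply it.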
For meta-environments, we can consider there to be two "modes" of environments, or even two different environments.
Similar to multi-task, I think we can use make again with a parameter to specify the environment's mode, e.g., `gymnasium.make("metaworld/meta-1", mode="training")`. This environment can then be passed to any training library, which returns the trained agent. We could then have an `evaluate_meta_agent` function that takes the environment and a function that evaluates the agent's policy given an observation and info. A rough sketch of what that could look like is below.
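To illustrate, here is a minimal sketch of what such an `evaluate_meta_agent` function could look like; the environment id, the `mode` argument, and the assumption that the env reports a `success` flag in `info` all come from this proposal rather than an existing interface:

```python
def evaluate_meta_agent(env, policy_fn, n_episodes=10):
    """Roll out policy_fn(obs, info) -> action and report the success rate."""
    successes = 0
    for _ in range(n_episodes):
        obs, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            action = policy_fn(obs, info)
            obs, reward, terminated, truncated, info = env.step(action)
        # Count the episode as a success if the assumed "success" flag is set.
        successes += int(info.get("success", 0))
    return successes / n_episodes


# Hypothetical usage under the proposed registration scheme:
# import gymnasium
# test_env = gymnasium.make("metaworld/meta-1", mode="testing")
# success_rate = evaluate_meta_agent(test_env, trained_agent.get_action)
```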