diff --git a/docs/benchmark/benchmark_descriptions.md b/docs/benchmark/benchmark_descriptions.md
index 307d6d14..73da50a0 100644
--- a/docs/benchmark/benchmark_descriptions.md
+++ b/docs/benchmark/benchmark_descriptions.md
@@ -40,7 +40,7 @@ positions are fixed in all tasks to focus solely on the skill acquisition.
 
 ### Multi-Task (MT50)
 
-The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
+The **MT50** evaluation uses all 50 Metaworld tasks. This is the most
 challenging multi-task setting and involves no evaluation on test tasks. As
 with **MT10**, the policy is provided with a one-hot vector indicating the
 current task, and object and goal positions are fixed.
@@ -58,10 +58,10 @@ ability to adapt to or learn new tasks.
 ### Meta-RL (ML1)
 
 The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
-variation within one task. ML1 uses single Meta-World Tasks, with the
+variation within one task. ML1 uses single Metaworld tasks, with the
 meta-training "tasks" corresponding to 50 random initial object and goal
 positions, and meta-testing on 10 held-out positions. We evaluate algorithms
-on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
+on three individual tasks from Metaworld: *reaching*, *pushing*, and *pick and
 place*, where the variation is over reaching position or goal object position.
 The goal positions are not provided in the observation, forcing meta-RL
 algorithms to adapt to the goal through trial-and-error.
@@ -86,7 +86,7 @@ input, requiring a meta-RL algorithm to identify the tasks from experience.
 
 ### Meta-RL (ML45)
 
-The most difficult environment setting of Meta-World, **ML45**, challenges the
+The most difficult environment setting of Metaworld, **ML45**, challenges the
 agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
 Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
 tasks. Object and goal positions are randomized, and training tasks are
diff --git a/docs/benchmark/reward_functions.md b/docs/benchmark/reward_functions.md
new file mode 100644
index 00000000..f71ed541
--- /dev/null
+++ b/docs/benchmark/reward_functions.md
@@ -0,0 +1,43 @@
+---
+layout: "contents"
+title: Reward Functions
+firstpage:
+---
+
+# Reward Functions
+
+Metaworld currently implements two versions of its reward function, which can
+be selected by passing the `reward_func_version` keyword argument to the
+`gym.make(...)` call; a usage sketch is given at the end of this page.
+
+The two currently supported versions are described below.
+
+## Version 1
+
+Passing `reward_func_version="v1"` configures the benchmark with the primary
+reward function of Metaworld, a version of the `pick-place-wall` reward
+function that has been modified to also work for the other tasks.
+
+## Version 2
+
+TBA
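+
+## Usage example
+
+A minimal sketch of selecting a reward function version, assuming that
+importing `metaworld` registers its tasks with Gymnasium; the environment ID
+below is a hypothetical placeholder, so substitute a task ID that your
+installed Metaworld version actually registers.
+
+```python
+import gymnasium as gym
+import metaworld  # noqa: F401 -- assumed to register the Metaworld environments
+
+# "Meta-World/pick-place-v2" is a placeholder ID used for illustration only.
+env = gym.make("Meta-World/pick-place-v2", reward_func_version="v1")
+
+obs, info = env.reset(seed=42)
+obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
+print(reward)  # reward computed by the version 1 reward function
+env.close()
+```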