Add task descriptions from Yu et al.

- update benchmark descriptions
Farama-Foundation · Sep 6, 2024 · ed3ea35
1 parent 9c8a992
commit ed3ea35

Showing 3 changed files with 193 additions and 19 deletions.
diff --git a/docs/benchmark/benchmark_descriptions.md b/docs/benchmark/benchmark_descriptions.md
@@ -17,7 +17,7 @@ Below, different levels of difficulty are described.
 
 ### Multi-Task (MT1)
 
-In the easiest setting, **MT1**, a single task needs to be learned where the agent must *reach*, *push*, or *pick and place* a goal object.
+In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g., *reach*, *push*, or *pick and place* a goal object.
 There is no testing of generalization involved in this setting.
 
 ```{figure} _static/mt1.gif
@@ -27,10 +27,11 @@ There is no testing of generalization involved in this setting.
 
 ### Multi-Task (MT10)
 
-The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below.
-There is no testing of generalization involved in this setting.
-
-
+The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*,
+*open door*, *open drawer*, *close drawer*, *press button top-down*,
+*insert peg side*, *open window*, and *open box*. The policy is provided with a
+one-hot vector indicating the current task. The positions of objects and goal
+positions are fixed in all tasks to focus solely on skill acquisition.
 
 ```{figure} _static/mt10.gif
    :alt: Multi-Task 10 
@@ -39,32 +40,44 @@ There is no testing of generalization involved in this setting.
 
 ### Multi-Task (MT50)
 
-In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in metaworld.
-This is the most challenging multi-task setting and involves no evaluation on test tasks.
+The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
+challenging multi-task setting and involves no evaluation on test tasks.
+As with **MT10**, the policy is provided with a one-hot vector indicating
+the current task, and object and goal positions are fixed.
 
+See [Task Descriptions](#benchmark/task_descriptions) for more details.
 
 ## Meta-Learning Problems
 
-Meta-RL attempts to evaluate the [transfer learning](https://en.
-wikipedia.org/wiki/Transfer_learning) capabilities of agents learning skills based on a predefined set of training tasks, by evaluating generalization using a hold-out set of test tasks.
-In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks.
+Meta-RL attempts to evaluate the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)
+capabilities of agents learning skills based on a predefined set of training
+tasks, by evaluating generalization using a hold-out set of test tasks.
+In other words, this setting allows for benchmarking an algorithm's
+ability to adapt to or learn new tasks.
 
 ### Meta-RL (ML1)
 
-The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location.
-For the test evaluation, unseen goal locations are used to measure generalization capabilities.
-
-
+The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
+variation within one task. ML1 uses a single Meta-World task, with the
+meta-training "tasks" corresponding to 50 random initial object and goal
+positions, and meta-testing on 10 held-out positions. We evaluate algorithms
+on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
+place*, where the variation is over reaching position or goal object position.
+The goal positions are not provided in the observation, forcing meta-RL
+algorithms to adapt to the goal through trial-and-error.
 
 ```{figure} _static/ml1.gif
    :alt: Meta-RL 1 
    :width: 500
 ```
 
-
 ### Meta-RL (ML10)
 
-The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase.
+The **ML10** evaluation involves few-shot adaptation to new test tasks with 10
+meta-training tasks. We hold out 5 tasks and meta-train policies on 10 tasks.
+We randomize object and goal positions and intentionally select training tasks
+with structural similarity to the test tasks. Task IDs are not provided as
+input, requiring a meta-RL algorithm to identify the tasks from experience.
 
 ```{figure} _static/ml10.gif
    :alt: Meta-RL 10 
@@ -73,10 +86,14 @@ The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manip
 
 ### Meta-RL (ML45)
 
-The most difficult environment setting of metaworld, **ML45**, challenges the agent to be trained on 45 distinct manipulation tasks and evaluated on 5 test tasks.
-
+The most difficult environment setting of Meta-World, **ML45**, challenges the
+agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
+Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
+tasks. Object and goal positions are randomized, and training tasks are
+selected for structural similarity to test tasks. As with ML10, task IDs are
+not provided, requiring the meta-RL algorithm to identify tasks from experience.
 
 ```{figure} _static/ml45.gif
-   :alt: Meta-RL 10 
+   :alt: Meta-RL 45 
    :width: 500
 ```
diff --git a/docs/benchmark/task_descriptions.md b/docs/benchmark/task_descriptions.md
@@ -0,0 +1,156 @@
+---
+layout: "contents"
+title: Task Descriptions
+firstpage:
+---
+
+# Task Descriptions
+## Turn on faucet
+Rotate the faucet counter-clockwise. Randomize faucet positions
+
+## Sweep
+Sweep a puck off the table. Randomize puck positions
+
+## Assemble nut
+Pick up a nut and place it onto a peg. Randomize nut and peg positions
+
+## Turn off faucet
+Rotate the faucet clockwise. Randomize faucet positions
+
+## Push
+Push the puck to a goal. Randomize puck and goal positions
+
+## Pull lever
+Pull a lever down 90 degrees. Randomize lever positions
+
+## Turn dial
+Rotate a dial 180 degrees. Randomize dial positions
+
+## Push with stick
+Grasp a stick and push a box using the stick. Randomize stick positions.
+
+## Get coffee
+Push a button on the coffee machine. Randomize the position of the coffee machine
+
+## Pull handle side
+Pull a handle up sideways. Randomize the handle positions
+
+## Basketball
+Dunk the basketball into the basket. Randomize basketball and basket positions
+
+## Pull with stick
+Grasp a stick and pull a box with the stick. Randomize stick positions
+
+## Sweep into hole
+Sweep a puck into a hole. Randomize puck positions
+
+## Disassemble nut
+Pick a nut out of a peg. Randomize the nut positions
+
+## Place onto shelf
+Pick and place a puck onto a shelf. Randomize puck and shelf positions
+
+## Push mug
+Push a mug under a coffee machine. Randomize the mug and the machine positions
+
+## Press handle side
+Press a handle down sideways. Randomize the handle positions
+
+## Hammer
+Hammer a screw on the wall. Randomize the hammer and the screw positions
+
+## Slide plate
+Slide a plate into a cabinet. Randomize the plate and cabinet positions
+
+## Slide plate side
+Slide a plate into a cabinet sideways. Randomize the plate and cabinet positions
+
+## Press button wall
+Bypass a wall and press a button. Randomize the button positions
+
+## Press handle
+Press a handle down. Randomize the handle positions
+
+## Pull handle
+Pull a handle up. Randomize the handle positions
+
+## Soccer
+Kick a soccer ball into the goal. Randomize the soccer ball and goal positions
+
+## Retrieve plate side
+Get a plate from the cabinet sideways. Randomize plate and cabinet positions
+
+## Retrieve plate
+Get a plate from the cabinet. Randomize plate and cabinet positions
+
+## Close drawer
+Push and close a drawer. Randomize the drawer positions
+
+## Press button top
+Press a button from the top. Randomize button positions
+
+## Reach
+Reach a goal position. Randomize the goal positions
+
+## Press button top wall
+Bypass a wall and press a button from the top. Randomize button positions
+
+## Reach with wall
+Bypass a wall and reach a goal. Randomize goal positions
+
+## Insert peg side
+Insert a peg sideways. Randomize peg and goal positions
+
+## Pull
+Pull a puck to a goal. Randomize puck and goal positions
+
+## Push with wall
+Bypass a wall and push a puck to a goal. Randomize puck and goal positions
+
+## Pick out of hole
+Pick up a puck from a hole. Randomize puck and goal positions
+
+## Pick&place w/ wall
+Pick a puck, bypass a wall and place the puck. Randomize puck and goal positions
+
+## Press button
+Press a button. Randomize button positions
+
+## Pick&place
+Pick and place a puck to a goal. Randomize puck and goal positions
+
+## Pull mug
+Pull a mug from a coffee machine. Randomize the mug and the machine positions
+
+## Unplug peg
+Unplug a peg sideways. Randomize peg positions
+
+## Close window
+Push and close a window. Randomize window positions
+
+## Open window
+Push and open a window. Randomize window positions
+
+## Open door
+Open a door with a revolving joint. Randomize door positions
+
+## Close door
+Close a door with a revolving joint. Randomize door positions
+
+## Open drawer
+Open a drawer. Randomize drawer positions
+
+## Insert hand
+Insert the gripper into a hole.
+
+## Close box
+Grasp the cover and close the box with it. Randomize the cover and box positions
+
+## Lock door
+Lock the door by rotating the lock clockwise. Randomize door positions
+
+## Unlock door
+Unlock the door by rotating the lock counter-clockwise. Randomize door positions
+
+## Pick bin
+Grasp the puck from one bin and place it into another bin. Randomize puck positions
diff --git a/docs/index.md b/docs/index.md
@@ -53,6 +53,7 @@ usage/basic_usage
 benchmark/state_space
 benchmark/action_space
 benchmark/benchmark_descriptions
+benchmark/task_descriptions
 benchmark/env_tasks_vs_task_init
 benchmark/reward_functions
 ```
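
Note on the updated benchmark descriptions: the MT10/MT50 text says the policy receives a one-hot vector indicating the current task, and the ML1 text describes 50 meta-training positions and 10 held-out positions. The sketch below is one way to picture both ideas in plain Python. It does not use the Meta-World API. The helper names (`one_hot_task_id`, `augment_observation`, `split_goal_positions`), the flat-list observation layout, and the sampling ranges are all assumptions made for illustration only.

```python
import random

# Task names exactly as listed in the MT10 description in this commit.
MT10_TASKS = [
    "reach", "push", "pick and place", "open door", "open drawer",
    "close drawer", "press button top-down", "insert peg side",
    "open window", "open box",
]


def one_hot_task_id(task_name, tasks=MT10_TASKS):
    """Return a one-hot list with 1.0 at the index of task_name."""
    index = tasks.index(task_name)  # raises ValueError for unknown tasks
    return [1.0 if i == index else 0.0 for i in range(len(tasks))]


def augment_observation(observation, task_name, tasks=MT10_TASKS):
    """Append the one-hot task vector to a flat observation.

    Assumption: observations are flat sequences of floats; the real
    environment layout may differ.
    """
    return list(observation) + one_hot_task_id(task_name, tasks)


def split_goal_positions(num_train=50, num_test=10, seed=0):
    """Sample goal positions and split them into train and held-out sets.

    The sampling box below is a made-up placeholder, not Meta-World's
    actual goal range.
    """
    rng = random.Random(seed)
    goals = [
        (rng.uniform(-0.1, 0.1), rng.uniform(0.8, 0.9), rng.uniform(0.05, 0.3))
        for _ in range(num_train + num_test)
    ]
    return goals[:num_train], goals[num_train:]
```

In the multi-task settings the one-hot vector tells the policy which task it is solving. In the meta-RL settings no such vector (and no goal position) is given, so only a split like the one above separates meta-training from meta-testing.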