From ed3ea35b8d853ed441492bf39f60879fdbba24b6 Mon Sep 17 00:00:00 2001
From: Frank Roeder
Date: Fri, 6 Sep 2024 13:32:53 +0200
Subject: [PATCH] Add tasks descriptions from Yu et al. - update benchmark descriptions

---
 docs/benchmark/benchmark_descriptions.md |  55 +++++---
 docs/benchmark/task_descriptions.md      | 156 +++++++++++++++++++++++
 docs/index.md                            |   1 +
 3 files changed, 193 insertions(+), 19 deletions(-)
 create mode 100644 docs/benchmark/task_descriptions.md

diff --git a/docs/benchmark/benchmark_descriptions.md b/docs/benchmark/benchmark_descriptions.md
index 467285545..307d6d149 100644
--- a/docs/benchmark/benchmark_descriptions.md
+++ b/docs/benchmark/benchmark_descriptions.md
@@ -17,7 +17,7 @@ Below, different levels of difficulty are described.
 
 ### Multi-Task (MT1)
 
-In the easiest setting, **MT1**, a single task needs to be learned where the agent must *reach*, *push*, or *pick and place* a goal object.
+In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g., *reach*, *push*, or *pick and place* a goal object.
 There is no testing of generalization involved in this setting.
 
 ```{figure} _static/mt1.gif
@@ -27,10 +27,11 @@ There is no testing of generalization involved in this setting.
 
 ### Multi-Task (MT10)
 
-The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below.
-There is no testing of generalization involved in this setting.
-
-
+The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*,
+*open door*, *open drawer*, *close drawer*, *press button top-down*,
+*insert peg side*, *open window*, and *open box*. The policy is provided with a
+one-hot vector indicating the current task. The positions of objects and goal
+positions are fixed in all tasks to focus solely on skill acquisition.
 
 ```{figure} _static/mt10.gif
    :alt: Multi-Task 10
@@ -39,32 +40,44 @@ There is no testing of generalization involved in this setting.
 
 ### Multi-Task (MT50)
 
-In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in metaworld.
-This is the most challenging multi-task setting and involves no evaluation on test tasks.
+The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
+challenging multi-task setting and involves no evaluation on test tasks.
+As with **MT10**, the policy is provided with a one-hot vector indicating
+the current task, and object and goal positions are fixed.
+See [Task Descriptions](#benchmark/task_descriptions) for more details.
 
 ## Meta-Learning Problems
 
-Meta-RL attempts to evaluate the [transfer learning](https://en.
-wikipedia.org/wiki/Transfer_learning) capabilities of agents learning skills based on a predefined set of training tasks, by evaluating generalization using a hold-out set of test tasks.
-In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks.
+Meta-RL attempts to evaluate the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)
+capabilities of agents learning skills based on a predefined set of training
+tasks, by evaluating generalization using a hold-out set of test tasks.
+In other words, this setting allows for benchmarking an algorithm's
+ability to adapt to or learn new tasks.
 
 ### Meta-RL (ML1)
 
-The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location.
-For the test evaluation, unseen goal locations are used to measure generalization capabilities.
-
-
+The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
+variation within one task. ML1 uses single Meta-World tasks, with the
+meta-training "tasks" corresponding to 50 random initial object and goal
+positions, and meta-testing on 10 held-out positions. We evaluate algorithms
+on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
+place*, where the variation is over reaching position or goal object position.
+The goal positions are not provided in the observation, forcing meta-RL
+algorithms to adapt to the goal through trial-and-error.
 
 ```{figure} _static/ml1.gif
    :alt: Meta-RL 1
    :width: 500
 ```
 
 
-
 ### Meta-RL (ML10)
 
-The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase.
+The **ML10** evaluation involves few-shot adaptation to new test tasks with 10
+meta-training tasks. We hold out 5 tasks and meta-train policies on 10 tasks.
+We randomize object and goal positions and intentionally select training tasks
+with structural similarity to the test tasks. Task IDs are not provided as
+input, requiring a meta-RL algorithm to identify the tasks from experience.
 
 ```{figure} _static/ml10.gif
    :alt: Meta-RL 10
@@ -73,10 +86,14 @@ The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manip
 
 ### Meta-RL (ML45)
 
-The most difficult environment setting of metaworld, **ML45**, challenges the agent to be trained on 45 distinct manipulation tasks and evaluated on 5 test tasks.
-
+The most difficult environment setting of Meta-World, **ML45**, challenges the
+agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
+Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
+tasks. Object and goal positions are randomized, and training tasks are
+selected for structural similarity to test tasks. As with ML10, task IDs are
+not provided, requiring the meta-RL algorithm to identify tasks from experience.
 ```{figure} _static/ml45.gif
-   :alt: Meta-RL 10
+   :alt: Meta-RL 45
    :width: 500
 ```
 
diff --git a/docs/benchmark/task_descriptions.md b/docs/benchmark/task_descriptions.md
new file mode 100644
index 000000000..0826328f0
--- /dev/null
+++ b/docs/benchmark/task_descriptions.md
@@ -0,0 +1,156 @@
+---
+layout: "contents"
+title: Task Descriptions
+firstpage:
+---
+
+# Task Descriptions
+## Turn on faucet
+Rotate the faucet counter-clockwise. Randomize faucet positions.
+
+## Sweep
+Sweep a puck off the table. Randomize puck positions.
+
+## Assemble nut
+Pick up a nut and place it onto a peg. Randomize nut and peg positions.
+
+## Turn off faucet
+Rotate the faucet clockwise. Randomize faucet positions.
+
+## Push
+Push the puck to a goal. Randomize puck and goal positions.
+
+## Pull lever
+Pull a lever down 90 degrees. Randomize lever positions.
+
+## Turn dial
+Rotate a dial 180 degrees. Randomize dial positions.
+
+## Push with stick
+Grasp a stick and push a box using the stick. Randomize stick positions.
+
+## Get coffee
+Push a button on the coffee machine. Randomize the position of the coffee machine.
+
+## Pull handle side
+Pull a handle up sideways. Randomize the handle positions.
+
+## Basketball
+Dunk the basketball into the basket. Randomize basketball and basket positions.
+
+## Pull with stick
+Grasp a stick and pull a box with the stick. Randomize stick positions.
+
+## Sweep into hole
+Sweep a puck into a hole. Randomize puck positions.
+
+## Disassemble nut
+Pick a nut out of a peg. Randomize the nut positions.
+
+## Place onto shelf
+Pick and place a puck onto a shelf. Randomize puck and shelf positions.
+
+## Push mug
+Push a mug under a coffee machine. Randomize the mug and the machine positions.
+
+## Press handle side
+Press a handle down sideways. Randomize the handle positions.
+
+## Hammer
+Hammer a screw on the wall. Randomize the hammer and the screw positions.
+
+## Slide plate
+Slide a plate into a cabinet. Randomize the plate and cabinet positions.
+
+## Slide plate side
+Slide a plate into a cabinet sideways. Randomize the plate and cabinet positions.
+
+## Press button wall
+Bypass a wall and press a button. Randomize the button positions.
+
+## Press handle
+Press a handle down. Randomize the handle positions.
+
+## Pull handle
+Pull a handle up. Randomize the handle positions.
+
+## Soccer
+Kick a soccer ball into the goal. Randomize the soccer ball and goal positions.
+
+## Retrieve plate side
+Get a plate from the cabinet sideways. Randomize plate and cabinet positions.
+
+## Retrieve plate
+Get a plate from the cabinet. Randomize plate and cabinet positions.
+
+## Close drawer
+Push and close a drawer. Randomize the drawer positions.
+
+## Press button top
+Press a button from the top. Randomize button positions.
+
+## Reach
+Reach a goal position. Randomize the goal positions.
+
+## Press button top wall
+Bypass a wall and press a button from the top. Randomize button positions.
+
+## Reach with wall
+Bypass a wall and reach a goal. Randomize goal positions.
+
+## Insert peg side
+Insert a peg sideways. Randomize peg and goal positions.
+
+## Pull
+Pull a puck to a goal. Randomize puck and goal positions.
+
+## Push with wall
+Bypass a wall and push a puck to a goal. Randomize puck and goal positions.
+
+## Pick out of hole
+Pick up a puck from a hole. Randomize puck and goal positions.
+
+## Pick&place w/ wall
+Pick a puck, bypass a wall and place the puck. Randomize puck and goal positions.
+
+## Press button
+Press a button. Randomize button positions.
+
+## Pick&place
+Pick and place a puck to a goal. Randomize puck and goal positions.
+
+## Pull mug
+Pull a mug from a coffee machine. Randomize the mug and the machine positions.
+
+## Unplug peg
+Unplug a peg sideways. Randomize peg positions.
+
+## Close window
+Push and close a window. Randomize window positions.
+
+## Open window
+Push and open a window. Randomize window positions.
+
+## Open door
+Open a door with a revolving joint. Randomize door positions.
+
+## Close door
+Close a door with a revolving joint. Randomize door positions.
+
+## Open drawer
+Open a drawer. Randomize drawer positions.
+
+## Insert hand
+Insert the gripper into a hole.
+
+## Close box
+Grasp the cover and close the box with it. Randomize the cover and box positions.
+
+## Lock door
+Lock the door by rotating the lock clockwise. Randomize door positions.
+
+## Unlock door
+Unlock the door by rotating the lock counter-clockwise. Randomize door positions.
+
+## Pick bin
+Grasp the puck from one bin and place it into another bin. Randomize puck positions.
diff --git a/docs/index.md b/docs/index.md
index ea3b46094..2b30519ed 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -53,6 +53,7 @@ usage/basic_usage
 benchmark/state_space
 benchmark/action_space
 benchmark/benchmark_descriptions
+benchmark/task_descriptions.md
 benchmark/env_tasks_vs_task_init
 benchmark/reward_functions
 ```