diff --git a/docs/_static/ml1-1.gif b/docs/_static/ml1-1.gif
new file mode 100644
index 00000000..7a3b6f4d
Binary files /dev/null and b/docs/_static/ml1-1.gif differ
diff --git a/docs/_static/ml1.gif b/docs/_static/ml1.gif
new file mode 100644
index 00000000..7a3b6f4d
Binary files /dev/null and b/docs/_static/ml1.gif differ
diff --git a/docs/_static/ml10-1.gif b/docs/_static/ml10-1.gif
new file mode 100644
index 00000000..90de3510
Binary files /dev/null and b/docs/_static/ml10-1.gif differ
diff --git a/docs/_static/ml10.gif b/docs/_static/ml10.gif
new file mode 100644
index 00000000..90de3510
Binary files /dev/null and b/docs/_static/ml10.gif differ
diff --git a/docs/_static/ml45-1.gif b/docs/_static/ml45-1.gif
new file mode 100644
index 00000000..d549ca0c
Binary files /dev/null and b/docs/_static/ml45-1.gif differ
diff --git a/docs/_static/ml45.gif b/docs/_static/ml45.gif
new file mode 100644
index 00000000..d549ca0c
Binary files /dev/null and b/docs/_static/ml45.gif differ
diff --git a/docs/_static/mt1-1.gif b/docs/_static/mt1-1.gif
new file mode 100644
index 00000000..cb0f9939
Binary files /dev/null and b/docs/_static/mt1-1.gif differ
diff --git a/docs/_static/mt1.gif b/docs/_static/mt1.gif
new file mode 100644
index 00000000..cb0f9939
Binary files /dev/null and b/docs/_static/mt1.gif differ
diff --git a/docs/_static/mt10-1.gif b/docs/_static/mt10-1.gif
new file mode 100644
index 00000000..bea6ce71
Binary files /dev/null and b/docs/_static/mt10-1.gif differ
diff --git a/docs/benchmark/action_space.md b/docs/benchmark/action_space.md
new file mode 100644
index 00000000..b67e30b2
--- /dev/null
+++ b/docs/benchmark/action_space.md
@@ -0,0 +1,17 @@
+---
+layout: "contents"
+title: Action Space
+firstpage:
+---
+
+# Action Space
+
+The action space of the Sawyer robot is a ```Box(-1.0, 1.0, (4,), float32)```.
+An action represents the Cartesian displacement dx, dy, and dz of the end effector, and an additional action for gripper control.
+
+| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
+|-----|--------|-------------|-------------|---------------------|-------|------|
+| 0 | Displacement of the end effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
+| 1 | Displacement of the end effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
+| 2 | Displacement of the end effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
+| 3 | Gripper adjustment (closing/opening) | -1 | 1 | rightclaw, leftclaw | r_close, l_close | position (normalized) |
diff --git a/docs/benchmark/benchmark_descriptions.md b/docs/benchmark/benchmark_descriptions.md
new file mode 100644
index 00000000..46728554
--- /dev/null
+++ b/docs/benchmark/benchmark_descriptions.md
@@ -0,0 +1,82 @@
+---
+layout: "contents"
+title: Benchmark Descriptions
+firstpage:
+---
+
+# Benchmark Descriptions
+
+The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
+Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
+Unlike typical RL benchmarks, the agent's learning process is strictly split into a training phase and a testing phase.
+
+## Multi-Task Problems
+
+The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
+The difficulty levels are described below.
+
+### Multi-Task (MT1)
+
+In the easiest setting, **MT1**, the agent learns a single task in which it must *reach*, *push*, or *pick and place* a goal object.
+This setting does not test generalization.
+
+```{figure} _static/mt1.gif
+ :alt: Multi-Task 1
+ :width: 500
+```
+
+### Multi-Task (MT10)
+
+The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below.
+This setting does not test generalization.
+
+
+
+```{figure} _static/mt10.gif
+ :alt: Multi-Task 10
+ :width: 500
+```
+
+### Multi-Task (MT50)
+
+In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in metaworld.
+This is the most challenging multi-task setting and involves no evaluation on test tasks.
+
+
+## Meta-Learning Problems
+
+Meta-RL evaluates the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning) capabilities of agents that learn skills on a predefined set of training tasks,
+by measuring generalization on a held-out set of test tasks.
+In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks.
+
+### Meta-RL (ML1)
+
+The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location.
+For the test evaluation, unseen goal locations are used to measure generalization capabilities.
+
+
+
+```{figure} _static/ml1.gif
+ :alt: Meta-RL 1
+ :width: 500
+```
+
+
+### Meta-RL (ML10)
+
+The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase.
+
+```{figure} _static/ml10.gif
+ :alt: Meta-RL 10
+ :width: 500
+```
+
+### Meta-RL (ML45)
+
+The most difficult setting of metaworld, **ML45**, trains the agent on 45 distinct manipulation tasks and evaluates it on 5 held-out test tasks.
+
+
+```{figure} _static/ml45.gif
+ :alt: Meta-RL 45
+ :width: 500
+```
diff --git a/docs/benchmark/env_task_vs_task_init.md b/docs/benchmark/env_task_vs_task_init.md
new file mode 100644
index 00000000..e69de29b
diff --git a/docs/benchmark/state_space.md b/docs/benchmark/state_space.md
new file mode 100644
index 00000000..f648cdbd
--- /dev/null
+++ b/docs/benchmark/state_space.md
@@ -0,0 +1,37 @@
+---
+layout: "contents"
+title: State Space
+firstpage:
+---
+
+# State Space
+
+The observation array consists of the end effector's position and gripper state, the position and orientation of the object of interest, the same quantities from the previous step, and the goal position. The table below details each component:
+
+| Num | Observation Description | Min | Max | Site Name (XML) | Joint Name (XML) | Joint Type | Unit |
+|-----|-------------------------|-----|-----|-----------------|------------------|------------|------|
+| 0 | End effector x position in global coordinates | -Inf | Inf | hand | - | - | position (m) |
+| 1 | End effector y position in global coordinates | -Inf | Inf | hand | - | - | position (m) |
+| 2 | End effector z position in global coordinates | -Inf | Inf | hand | - | - | position (m) |
+| 3 | Gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless |
+| 4 | Object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m) |
+| 5 | Object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m) |
+| 6 | Object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m) |
+| 7 | Object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 8 | Object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 9 | Object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 10 | Object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 11 | Previous end effector x position | -Inf | Inf | hand | - | - | position (m) |
+| 12 | Previous end effector y position | -Inf | Inf | hand | - | - | position (m) |
+| 13 | Previous end effector z position | -Inf | Inf | hand | - | - | position (m) |
+| 14 | Previous gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless |
+| 15 | Previous object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m) |
+| 16 | Previous object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m) |
+| 17 | Previous object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m) |
+| 18 | Previous object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 19 | Previous object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 20 | Previous object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 21 | Previous object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 22 | Goal x position | -Inf | Inf | goal (derived) | - | - | position (m) |
+| 23 | Goal y position | -Inf | Inf | goal (derived) | - | - | position (m) |
+| 24 | Goal z position | -Inf | Inf | goal (derived) | - | - | position (m) |
diff --git a/docs/index.md b/docs/index.md
index 5bb47f0b..ea3b4609 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -47,6 +47,16 @@ rendering/rendering
 usage/basic_usage
 ```
 
+```{toctree}
+:hidden:
+:caption: Benchmark Information
+benchmark/state_space
+benchmark/action_space
+benchmark/benchmark_descriptions
+benchmark/env_task_vs_task_init
+benchmark/reward_functions
+```
+
 ```{toctree}
 :hidden: