doc: add benchmark section
- add state and action space descriptions
- add benchmark details
frankroeder committed Sep 3, 2024
1 parent 16480c0 commit 9c8a992
Showing 14 changed files with 146 additions and 0 deletions.
Binary file added docs/_static/ml1-1.gif
Binary file added docs/_static/ml1.gif
Binary file added docs/_static/ml10-1.gif
Binary file added docs/_static/ml10.gif
Binary file added docs/_static/ml45-1.gif
Binary file added docs/_static/ml45.gif
Binary file added docs/_static/mt1-1.gif
Binary file added docs/_static/mt1.gif
Binary file added docs/_static/mt10-1.gif
17 changes: 17 additions & 0 deletions docs/benchmark/action_space.md
@@ -0,0 +1,17 @@
---
layout: "contents"
title: Action Space
firstpage:
---

# Action Space

The action space of the Sawyer robot is a `Box(-1.0, 1.0, (4,), float32)`.
An action consists of the Cartesian displacement (dx, dy, dz) of the end effector and one additional component for gripper control.

| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
|-----|--------|-------------|-------------|---------------------|-------|------|
| 0 | Displacement of the end effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
| 1 | Displacement of the end effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
| 2 | Displacement of the end effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
| 3 | Gripper adjustment (closing/opening) | -1 | 1 | rightclaw, leftclaw | r_close, l_close | position (normalized) |
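
As a minimal sketch of how such an action could be assembled, the snippet below builds a 4-element action and clips each component to the `[-1, 1]` bounds listed in the table. The helper name `make_action` is hypothetical and not part of the library, and plain Python lists stand in for the `Box` space.

```python
def make_action(dx, dy, dz, gripper):
    """Return [dx, dy, dz, gripper] with each entry clipped to [-1.0, 1.0].

    Hypothetical helper for illustration only; not a library function.
    """
    low, high = -1.0, 1.0
    return [min(max(float(v), low), high) for v in (dx, dy, dz, gripper)]


# Move along +x, slightly down in z, with a positive gripper command.
action = make_action(0.5, 0.0, -0.2, 1.0)

# Out-of-range inputs are clipped to the bounds of the action space.
clipped = make_action(2.0, -3.0, 0.25, 1.0)
```

Clipping before stepping the environment keeps user-side code within the documented bounds regardless of how the raw values were produced.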
82 changes: 82 additions & 0 deletions docs/benchmark/benchmark_descriptions.md
@@ -0,0 +1,82 @@
---
layout: "contents"
title: Benchmark Descriptions
firstpage:
---

# Benchmark Descriptions

The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
Unlike typical RL benchmarks, the agent's evaluation is strictly split into a training phase and a testing phase.

## Multi-Task Problems

The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
Below, different levels of difficulty are described.

### Multi-Task (MT1)

In the easiest setting, **MT1**, a single task needs to be learned where the agent must *reach*, *push*, or *pick and place* a goal object.
There is no testing of generalization involved in this setting.

```{figure} _static/mt1.gif
:alt: Multi-Task 1
:width: 500
```

### Multi-Task (MT10)

The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below.
There is no testing of generalization involved in this setting.



```{figure} _static/mt10.gif
:alt: Multi-Task 10
:width: 500
```

### Multi-Task (MT50)

In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in Meta-World.
This is the most challenging multi-task setting and involves no evaluation on test tasks.


## Meta-Learning Problems

Meta-RL evaluates the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning) capabilities of agents that learn skills from a predefined set of training tasks, by measuring generalization on a held-out set of test tasks.
In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks.

### Meta-RL (ML1)

The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location.
For the test evaluation, unseen goal locations are used to measure generalization capabilities.
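
The idea of testing on unseen goal locations can be sketched as follows. This is only an illustration under stated assumptions: the function `sample_goal_split` is hypothetical, and the coordinate ranges are arbitrary placeholders, not the ranges used by the benchmark.

```python
import random


def sample_goal_split(n_train, n_test, seed=0):
    """Sample disjoint sets of train and test goal positions (x, y, z).

    Hypothetical helper; the sampling ranges below are placeholders.
    """
    rng = random.Random(seed)
    goals = set()
    # A set guarantees that no goal position is drawn twice.
    while len(goals) < n_train + n_test:
        goals.add((
            round(rng.uniform(-0.1, 0.1), 3),
            round(rng.uniform(0.8, 0.9), 3),
            round(rng.uniform(0.05, 0.3), 3),
        ))
    ordered = sorted(goals)
    rng.shuffle(ordered)
    # The first n_train goals are used for training, the rest for testing.
    return ordered[:n_train], ordered[n_train:]


train_goals, test_goals = sample_goal_split(n_train=10, n_test=5)
```

Because both sets are drawn from one pool of unique positions, no test goal can coincide with a training goal.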



```{figure} _static/ml1.gif
:alt: Meta-RL 1
:width: 500
```


### Meta-RL (ML10)

The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase.

```{figure} _static/ml10.gif
:alt: Meta-RL 10
:width: 500
```

### Meta-RL (ML45)

The most difficult setting in Meta-World, **ML45**, trains the agent on 45 distinct manipulation tasks and evaluates it on 5 held-out test tasks.


```{figure} _static/ml45.gif
:alt: Meta-RL 45
:width: 500
```
Empty file.
37 changes: 37 additions & 0 deletions docs/benchmark/state_space.md
@@ -0,0 +1,37 @@
---
layout: "contents"
title: State Space
firstpage:
---

# State Space

The observation array consists of the gripper's (end effector's) position and state, alongside the position and orientation of the object of interest. The table below details each component:

| Num | Observation Description | Min | Max | Site Name (XML) | Joint Name (XML) | Joint Type | Unit |
|-----|-----------------------------------------------|---------|---------|------------------------|-------------------|------------|-------------|
| 0 | End effector x position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 1 | End effector y position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 2 | End effector z position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 3 | Gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless|
| 4 | Object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 5 | Object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 6 | Object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 7 | Object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 8 | Object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 9 | Object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 10 | Object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 11 | Previous end effector x position | -Inf | Inf | hand | - | - | position (m)|
| 12 | Previous end effector y position | -Inf | Inf | hand | - | - | position (m)|
| 13 | Previous end effector z position | -Inf | Inf | hand | - | - | position (m)|
| 14 | Previous gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless|
| 15 | Previous object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 16 | Previous object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 17 | Previous object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 18 | Previous object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 19 | Previous object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 20 | Previous object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 21 | Previous object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 22 | Goal x position | -Inf | Inf | goal (derived) | - | - | position (m)|
| 23 | Goal y position | -Inf | Inf | goal (derived) | - | - | position (m)|
| 24 | Goal z position | -Inf | Inf | goal (derived) | - | - | position (m)|
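
To make the layout concrete, the following sketch slices a flat observation into named components using the index ranges from the table. It assumes the observation has exactly the 25 entries listed here; the names `OBSERVATION_LAYOUT` and `split_observation` are hypothetical and not part of the library.

```python
# Index ranges taken from the table above (end index is exclusive).
OBSERVATION_LAYOUT = {
    "hand_pos": (0, 3),
    "gripper_distance": (3, 4),
    "object_pos": (4, 7),
    "object_quat": (7, 11),
    "prev_hand_pos": (11, 14),
    "prev_gripper_distance": (14, 15),
    "prev_object_pos": (15, 18),
    "prev_object_quat": (18, 22),
    "goal_pos": (22, 25),
}


def split_observation(obs):
    """Split a flat 25-element observation into named components."""
    if len(obs) != 25:
        raise ValueError(f"expected 25 entries, got {len(obs)}")
    return {
        name: list(obs[start:stop])
        for name, (start, stop) in OBSERVATION_LAYOUT.items()
    }


# Dummy observation whose entries equal their own index, for illustration.
parts = split_observation([float(i) for i in range(25)])
```

With the dummy input, each component contains its own index range, which makes it easy to check the slices against the table.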
10 changes: 10 additions & 0 deletions docs/index.md
@@ -47,6 +47,16 @@ rendering/rendering
usage/basic_usage
```

```{toctree}
:hidden:
:caption: Benchmark Information
benchmark/state_space
benchmark/action_space
benchmark/benchmark_descriptions
benchmark/env_tasks_vs_task_init
benchmark/reward_functions
```


```{toctree}
:hidden: