Docs work for V3 #503

Open
wants to merge 16 commits into master

4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,5 +1,5 @@
# Metaworld documentation
# Meta-World documentation

This directory contains the documentation for Metaworld.
This directory contains the documentation for Meta-World.

For more information about how to contribute to the documentation go to our [CONTRIBUTING.md](https://github.com/Farama-Foundation/Celshast/blob/main/CONTRIBUTING.md)
4 changes: 4 additions & 0 deletions docs/_static/img/metaworld-text.svg
Binary file added docs/_static/img/ml1-1.gif
Binary file added docs/_static/img/ml1.gif
Binary file added docs/_static/img/ml10-1.gif
Binary file added docs/_static/img/ml10.gif
Binary file added docs/_static/img/ml45-1.gif
Binary file added docs/_static/img/ml45.gif
Binary file added docs/_static/img/mt1-1.gif
Binary file added docs/_static/img/mt1.gif
Binary file added docs/_static/img/mt10-1.gif
Binary file added docs/_static/img/mt10.gif
Binary file added docs/_static/ml1.gif
Binary file added docs/_static/ml10.gif
Binary file added docs/_static/ml45.gif
Binary file added docs/_static/mt1.gif
22 changes: 22 additions & 0 deletions docs/benchmark/action_space.md
@@ -0,0 +1,22 @@
---
layout: "contents"
title: Action Space
firstpage:
---

# Action Space

In the Meta-World benchmark, the agent must simultaneously solve multiple tasks, each of which could be defined by its own Markov decision process.
Because current approaches solve this with a single policy/model, the action space must have the same size for all tasks and therefore shares a common structure.

The action space of the Sawyer robot is a ```Box(-1.0, 1.0, (4,), float32)```.
An action represents the Cartesian displacement `dx`, `dy`, and `dz` of the end-effector, and an additional action for gripper control.

For tasks that do not require the gripper, the gripper action can be masked or ignored, e.g., set to a constant value that keeps the fingers permanently closed.

| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
|-----|--------|-------------|-------------|---------------------|-------|------|
| 0 | Displacement of the end-effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
| 1 | Displacement of the end-effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
| 2 | Displacement of the end-effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
| 3 | Gripper adjustment (closing/opening) | -1 | 1 | rightclaw, leftclaw | r_close, l_close | position (normalized) |
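
An action is therefore a 4-dimensional array with values in `[-1, 1]`. The following is a minimal sketch (it assumes the Gymnasium-style `gym.make('MetaWorld/reach-v3')` entry point used elsewhere in these docs):

```python
import gymnasium as gym
import numpy as np

import metaworld  # registers the MetaWorld/* environments with Gymnasium

env = gym.make('MetaWorld/reach-v3')
obs, info = env.reset(seed=42)

# dx, dy, dz displacement of the end-effector plus one gripper dimension
action = np.array([0.1, 0.0, -0.05, 1.0], dtype=np.float32)
assert env.action_space.contains(action)

obs, reward, terminated, truncated, info = env.step(action)
```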
112 changes: 112 additions & 0 deletions docs/benchmark/benchmark_descriptions.md
@@ -0,0 +1,112 @@
---
layout: "contents"
title: Benchmark Descriptions
firstpage:
---

# Benchmark Descriptions

The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
Unlike typical RL benchmarks, the agent's evaluation is strictly split into training and testing phases.

## Task Configuration

Meta-World distinguishes between parametric and non-parametric variations.
Parametric variations concern the configuration of the goal or object position, such as changing the location of the puck in the `push` task.

```
TODO: Add code snippets
```
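
As a rough sketch of what such a snippet might look like (using the classic `MT1` interface that appears in the pre-V3 examples of these docs; exact environment names may differ in V3), parametric variations can be enumerated by iterating over a benchmark's `train_tasks`:

```python
import metaworld

# Each entry in train_tasks encodes one parametric variation
# (a different object/goal configuration) of the same skill.
mt1 = metaworld.MT1('push-v2', seed=42)
env = mt1.train_classes['push-v2']()

for task in mt1.train_tasks[:5]:
    env.set_task(task)       # select one parametric variation
    obs, info = env.reset()  # the puck and goal positions differ per task
```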

Non-parametric variations arise in the settings that contain multiple tasks, where the agent faces challenges such as `push` and `open window` that require a different set of skills.


## Multi-Task Problems

The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
Below, different levels of difficulty are described.


### Multi-Task (MT1)

In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g., *reach*, *push*, or *pick place* a goal object.
There is no testing of generalization involved in this setting.

```{figure} ../_static/mt1.gif
:alt: Multi-Task 1
:width: 500
```

### Multi-Task (MT10)

The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*, *open door*, *open drawer*, *close drawer*, *press button top-down*, *insert peg side*, *open window*, and *open box*.
The policy should be provided with a one-hot vector indicating the current task.
The positions of objects and goal positions are fixed in all tasks to focus solely on skill acquisition. <!-- TODO: check this -->


```{figure} ../_static/mt10.gif
:alt: Multi-Task 10
:width: 500
```
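
A minimal sketch of how such a task indicator might be constructed by hand (assuming the classic `MT10` interface with `train_classes`; the one-hot helper below is illustrative, not a library API):

```python
import numpy as np

import metaworld

mt10 = metaworld.MT10(seed=42)
env_names = list(mt10.train_classes.keys())  # the 10 task names


def one_hot(task_index: int, num_tasks: int = 10) -> np.ndarray:
    """Hypothetical helper: one-hot task indicator appended to the observation."""
    vec = np.zeros(num_tasks, dtype=np.float32)
    vec[task_index] = 1.0
    return vec


# e.g. the policy input for task 3 would be np.concatenate([obs, one_hot(3)])
```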

### Multi-Task (MT50)

The **MT50** evaluation uses all 50 Meta-World tasks.
This is the most challenging multi-task setting and involves no evaluation on test tasks.
As with **MT10**, the policy is provided with a one-hot vector indicating the current task, and object and goal positions are fixed.

See [Task Descriptions](task_descriptions) for more details.

## Meta-Learning Problems

Meta-RL evaluates the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)
capabilities of agents that learn skills from a predefined set of training
tasks, measuring generalization on a held-out set of test tasks.
In other words, this setting benchmarks an algorithm's
ability to adapt to or learn new tasks.

### Meta-RL (ML1)

The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
variation within one task. ML1 uses a single Meta-World task, with the
meta-training "tasks" corresponding to 50 random initial object and goal
positions, and meta-testing on 10 held-out positions. We evaluate algorithms
on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
place*, where the variation is over reaching position or goal object position.
The goal positions are not provided in the observation, forcing meta-RL
algorithms to adapt to the goal through trial-and-error.

```{figure} ../_static/ml1.gif
:alt: Meta-RL 1
:width: 500
```
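
As a sketch (again assuming the classic `ML1` interface), the meta-training and meta-testing variations are exposed as separate task lists:

```python
import metaworld

ml1 = metaworld.ML1('reach-v2', seed=42)

train_env = ml1.train_classes['reach-v2']()
test_env = ml1.test_classes['reach-v2']()

print(len(ml1.train_tasks))  # 50 meta-training goal/object configurations
print(len(ml1.test_tasks))   # held-out configurations for meta-testing

train_env.set_task(ml1.train_tasks[0])
obs, info = train_env.reset()
```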

### Meta-RL (ML10)

The **ML10** evaluation involves few-shot adaptation to new test tasks with 10
meta-training tasks. We hold out 5 tasks and meta-train policies on 10 tasks.
We randomize object and goal positions and intentionally select training tasks
with structural similarity to the test tasks. Task IDs are not provided as
input, requiring a meta-RL algorithm to identify the tasks from experience.

```{figure} ../_static/ml10.gif
:alt: Meta-RL 10
:width: 500
```

### Meta-RL (ML45)

The most difficult environment setting of Meta-World, **ML45**, challenges the
agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
tasks. Object and goal positions are randomized, and training tasks are
selected for structural similarity to test tasks. As with ML10, task IDs are
not provided, requiring the meta-RL algorithm to identify tasks from experience.


```{figure} ../_static/ml45.gif
:alt: Meta-RL 45
:width: 500
```
@@ -1,10 +1,10 @@
---
layout: "contents"
title: Generate data with expert policies
title: Expert Trajectories
firstpage:
---

# Generate data with expert policies
# Expert Trajectories

## Expert Policies
For each individual environment in Meta-World (e.g., reach, basketball, sweep) there are expert policies that solve the task. These policies can be used to generate expert data for imitation learning tasks.
@@ -14,13 +14,12 @@ The below example provides sample code for the reach environment. This code can


```diff
-from metaworld import MT1
-from metaworld.policies.sawyer_reach_v2_policy import SawyerReachV2Policy as p
+import gymnasium as gym
+import metaworld
+from metaworld.policies.sawyer_reach_v3_policy import SawyerReachV3Policy as p

-mt1 = MT1('reach-v2', seed=42)
-env = mt1.train_classes['reach-v2']()
-env.set_task(mt1.train_tasks[0])
+env = gym.make('MetaWorld/reach-v3')

 obs, info = env.reset()

 policy = p()
```
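
A rough sketch of how such a scripted policy might be rolled out to collect expert data, continuing from the new (V3) version of the snippet above (this assumes the scripted policies expose a `get_action(obs)` method):

```python
# Roll out the scripted expert and record (observation, action) pairs
# for imitation learning.
trajectory = []
obs, info = env.reset()

for _ in range(500):
    action = policy.get_action(obs)
    next_obs, reward, terminated, truncated, info = env.step(action)
    trajectory.append((obs, action))
    obs = next_obs
    if terminated or truncated or info.get('success', 0.0) > 0.0:
        break
```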
7 changes: 7 additions & 0 deletions docs/benchmark/resetting.md
@@ -0,0 +1,7 @@
---
layout: "contents"
title: Resetting to a Specific State
firstpage:
---

# Resetting to a Specific State
27 changes: 27 additions & 0 deletions docs/benchmark/reward_functions.md
@@ -0,0 +1,27 @@
---
layout: "contents"
title: Reward Functions
firstpage:
---

# Reward Functions

As with the [action space](action_space) and [state space](state_space), the reward functions share a common structure across tasks.
Meta-World provides well-shaped reward functions for the individual tasks that are solvable by current single-task reinforcement learning approaches.
To ensure comparable learning across the multi-task settings, all task rewards have the same magnitude.

## Options

Meta-World currently implements two types of reward functions that can be selected
by passing the `reward_func_version` keyword argument to `gym.make(...)`.
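
For example (a minimal sketch; it assumes the version is passed as the string `'v1'` and reuses the `reach-v3` environment ID from the other examples):

```python
import gymnasium as gym

import metaworld  # registers the MetaWorld/* environments

# Select the primary (v1) reward function for this task.
env = gym.make('MetaWorld/reach-v3', reward_func_version='v1')
obs, info = env.reset()
```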

### Version 1

Passing `reward_func_version=v1` configures the benchmark with the primary
reward function of Meta-World, which is a version of the `pick-place-wall`
task's reward that has been modified to also work for the other tasks.


### Version 2

TBA
17 changes: 17 additions & 0 deletions docs/benchmark/state_space.md
@@ -0,0 +1,17 @@
---
layout: "contents"
title: State Space
firstpage:
---

# State Space


Like the [action space](action_space), the state space must keep the same structure across tasks so that current approaches can employ a single policy/model.
Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal position (e.g., `reach`, `push`, `pick place`) or two objects with a fixed goal position (e.g., `hammer`, `soccer`, `shelf place`).
To account for this variability, large parts of the observation space are kept as placeholders, e.g., for the second object, if only one object is present.

The observation array consists of the end-effector's 3D Cartesian position, together with either the position of a single object and its goal coordinates, or the positions of the first and second object.
This always results in a 9D state vector.

TODO: Provide table
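
In the meantime, the observation layout can be inspected directly from the environment (a minimal sketch reusing the `reach-v3` ID from the other examples):

```python
import gymnasium as gym

import metaworld  # registers the MetaWorld/* environments

env = gym.make('MetaWorld/reach-v3')
obs, info = env.reset()

print(env.observation_space)  # bounds and dimensionality of the state vector
print(obs.shape)              # shape of a single observation
```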