
Commit

Add reward function description
frankroeder committed Sep 6, 2024
1 parent ed3ea35 commit 5b61842
Showing 2 changed files with 27 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/benchmark/benchmark_descriptions.md
@@ -40,7 +40,7 @@ positions are fixed in all tasks to focus solely on the skill acquisition.

### Multi-Task (MT50)

-The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
+The **MT50** evaluation uses all 50 Metaworld tasks. This is the most
challenging multi-task setting and involves no evaluation on test tasks.
As with **MT10**, the policy is provided with a one-hot vector indicating
the current task, and object and goal positions are fixed.
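
As a brief illustration of the MT50 setting described above, here is a minimal sketch using the classic Metaworld benchmark API (`metaworld.MT50`, `train_classes`, `train_tasks`, and `set_task` as in the upstream README; none of this is part of this diff):

```python
import metaworld

# Sketch of the MT50 setting: all 50 training environments.
mt50 = metaworld.MT50(seed=42)  # seed assumed to fix object/goal positions

for name, env_cls in mt50.train_classes.items():
    env = env_cls()
    # Each environment must be assigned one of its associated tasks before use.
    task = next(t for t in mt50.train_tasks if t.env_name == name)
    env.set_task(task)
```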
@@ -58,10 +58,10 @@ ability to adapt to or learn new tasks.
### Meta-RL (ML1)

The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
-variation within one task. ML1 uses single Meta-World Tasks, with the
+variation within one task. ML1 uses single Metaworld Tasks, with the
meta-training "tasks" corresponding to 50 random initial object and goal
positions, and meta-testing on 10 held-out positions. We evaluate algorithms
-on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
+on three individual tasks from Metaworld: *reaching*, *pushing*, and *pick and
place*, where the variation is over reaching position or goal object position.
The goal positions are not provided in the observation, forcing meta-RL
algorithms to adapt to the goal through trial-and-error.
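
A minimal sketch of this meta-train/meta-test split, again assuming the classic Metaworld API and using `reach-v2` as an illustrative task name:

```python
import random

import metaworld

ml1 = metaworld.ML1('reach-v2', seed=42)  # 'reach-v2' is an assumed task name
env = ml1.train_classes['reach-v2']()

# Meta-training: 50 tasks, each a random initial object/goal position.
env.set_task(random.choice(ml1.train_tasks))

# Meta-testing: 10 held-out positions, exposed as test tasks.
env.set_task(random.choice(ml1.test_tasks))
```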
@@ -86,7 +86,7 @@ input, requiring a meta-RL algorithm to identify the tasks from experience.

### Meta-RL (ML45)

-The most difficult environment setting of Meta-World, **ML45**, challenges the
+The most difficult environment setting of Metaworld, **ML45**, challenges the
agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
tasks. Object and goal positions are randomized, and training tasks are
23 changes: 23 additions & 0 deletions docs/benchmark/reward_functions.md
@@ -0,0 +1,23 @@
---
layout: "contents"
title: Reward Functions
firstpage:
---

# Reward Functions

Metaworld currently implements two versions of its reward function; the version
can be selected by passing the `reward_func_version` keyword argument to the
`gym.make(...)` call.
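
As a sketch of how this selection might look in practice (the environment id `"Meta-World/pick-place-v2"` and the registration of Metaworld environments via `import metaworld` are assumptions, not part of this commit; only the `reward_func_version` keyword is documented above):

```python
import gymnasium as gym
import metaworld  # assumed to register the Metaworld environments with gymnasium

# Hypothetical environment id; the string value "v1" is also an assumption.
env = gym.make("Meta-World/pick-place-v2", reward_func_version="v1")

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```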

## Version 1

Passing `reward_func_version=v1` configures the benchmark with the primary
reward function of Metaworld: a modified version of the `pick-place-wall`
task's reward that has been adapted to also work for the other tasks.


## Version 2

TBA
