
Commit

Add reward function description
frankroeder committed Sep 6, 2024
1 parent ed3ea35 commit 5b61842
Showing 2 changed files with 27 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/benchmark/benchmark_descriptions.md
@@ -40,7 +40,7 @@ positions are fixed in all tasks to focus solely on the skill acquisition.

### Multi-Task (MT50)

-The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
+The **MT50** evaluation uses all 50 Metaworld tasks. This is the most
challenging multi-task setting and involves no evaluation on test tasks.
As with **MT10**, the policy is provided with a one-hot vector indicating
the current task, and object and goal positions are fixed.
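
As a brief illustration of the MT50 setting described above, here is a minimal sketch using the classic Metaworld benchmark API (`metaworld.MT50`, `train_classes`, `train_tasks`, and `set_task` as in the upstream README; none of this is part of this diff):

```python
import metaworld

# Sketch of the MT50 setting: all 50 training environments.
mt50 = metaworld.MT50(seed=42)  # seed assumed to fix object/goal positions

for name, env_cls in mt50.train_classes.items():
    env = env_cls()
    # Each environment must be assigned one of its associated tasks before use.
    task = next(t for t in mt50.train_tasks if t.env_name == name)
    env.set_task(task)
```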
@@ -58,10 +58,10 @@ ability to adapt to or learn new tasks.
### Meta-RL (ML1)

The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
-variation within one task. ML1 uses single Meta-World Tasks, with the
+variation within one task. ML1 uses single Metaworld Tasks, with the
meta-training "tasks" corresponding to 50 random initial object and goal
positions, and meta-testing on 10 held-out positions. We evaluate algorithms
-on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
+on three individual tasks from Metaworld: *reaching*, *pushing*, and *pick and
place*, where the variation is over reaching position or goal object position.
The goal positions are not provided in the observation, forcing meta-RL
algorithms to adapt to the goal through trial-and-error.
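
A minimal sketch of this meta-train/meta-test split, again assuming the classic Metaworld API and using `reach-v2` as an illustrative task name:

```python
import random

import metaworld

ml1 = metaworld.ML1('reach-v2', seed=42)  # 'reach-v2' is an assumed task name
env = ml1.train_classes['reach-v2']()

# Meta-training: 50 tasks, each a random initial object/goal position.
env.set_task(random.choice(ml1.train_tasks))

# Meta-testing: 10 held-out positions, exposed as test tasks.
env.set_task(random.choice(ml1.test_tasks))
```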
@@ -86,7 +86,7 @@ input, requiring a meta-RL algorithm to identify the tasks from experience.

### Meta-RL (ML45)

-The most difficult environment setting of Meta-World, **ML45**, challenges the
+The most difficult environment setting of Metaworld, **ML45**, challenges the
agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
tasks. Object and goal positions are randomized, and training tasks are
23 changes: 23 additions & 0 deletions docs/benchmark/reward_functions.md
@@ -0,0 +1,23 @@
---
layout: "contents"
title: Reward Functions
firstpage:
---

# Reward Functions

Metaworld currently implements two versions of its reward function; the version
can be selected by passing the `reward_func_version` keyword argument to the
`gym.make(...)` call.
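
As a sketch of how this selection might look in practice (the environment id `"Meta-World/pick-place-v2"` and the registration of Metaworld environments via `import metaworld` are assumptions, not part of this commit; only the `reward_func_version` keyword is documented above):

```python
import gymnasium as gym
import metaworld  # assumed to register the Metaworld environments with gymnasium

# Hypothetical environment id; the string value "v1" is also an assumption.
env = gym.make("Meta-World/pick-place-v2", reward_func_version="v1")

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```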

## Version 1

Passing `reward_func_version=v1` configures the benchmark with the primary
reward function of Metaworld: a modified version of the `pick-place-wall`
task's reward that has been adapted to also work for the other tasks.


## Version 2

TBA
