Update AEC & Parallel docs to include examples, fix all docs links to use sphinx links (#1055)
elliottower authored Aug 11, 2023
1 parent a83b98c commit a3c096d
Showing 20 changed files with 35 additions and 28 deletions.
11 changes: 7 additions & 4 deletions docs/api/aec.md
@@ -8,13 +8,16 @@ title: AEC

By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows PettingZoo to represent any type of game that multi-agent RL can consider.

For more information, see [About AEC](#about-aec) or [*PettingZoo: A Standard API for Multi-Agent Reinforcement Learning*](https://arxiv.org/pdf/2009.14471.pdf).

[PettingZoo Wrappers](/api/wrappers/pz_wrappers/) can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).

## Examples
[PettingZoo Classic](/environments/classic/) provides standard examples of AEC environments for turn-based games, many of which implement [Illegal Action Masking](#action-masking).

We provide a [tutorial](/content/environment_creation/) for creating a simple Rock-Paper-Scissors AEC environment, showing how games with simultaneous actions can also be represented with AEC environments.

[PettingZoo Wrappers](/api/wrappers/pz_wrappers/) can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).

For more information, see [About AEC](#about-aec) or [*PettingZoo: A Standard API for Multi-Agent Reinforcement Learning*](https://arxiv.org/pdf/2009.14471.pdf).

## Usage

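The usage code itself is collapsed in this diff view. For reference, a minimal AEC interaction loop looks roughly like the sketch below; the choice of the Rock-Paper-Scissors environment is illustrative only and not part of this commit.

```python
from pettingzoo.classic import rps_v2

env = rps_v2.env()
env.reset(seed=42)

# AEC environments are stepped one agent at a time.
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # A terminated or truncated agent must receive a None action.
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()
```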
@@ -69,12 +72,12 @@ env.close()

Note: action masking is optional, and can be implemented using either `observation` or `info`.

* [PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) environments store action masks in the `observation` dict:
* [PettingZoo Classic](/environments/classic/) environments store action masks in the `observation` dict:
* `mask = observation["action_mask"]`
* [Shimmy](https://shimmy.farama.org/)'s [OpenSpiel environments](https://shimmy.farama.org/environments/open_spiel/) store action masks in the `info` dict:
* `mask = info["action_mask"]`
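
Either way, the retrieved mask is typically passed to the action space's `sample()` call so that only legal actions are drawn. A minimal sketch of the body of the `agent_iter()` loop shown under Usage, assuming a discrete action space and an `int8` mask as provided by the Classic environments:

```python
observation, reward, termination, truncation, info = env.last()
if termination or truncation:
    action = None
else:
    mask = observation["action_mask"]              # or info["action_mask"] for OpenSpiel envs
    action = env.action_space(agent).sample(mask)  # sample only among legal actions
env.step(action)
```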

To implement action masking in a custom environment, see [Environment Creation: Action Masking](https://pettingzoo.farama.org/tutorials/environmentcreation/3-action-masking/)
To implement action masking in a custom environment, see [Environment Creation: Action Masking](/tutorials/environmentcreation/3-action-masking/)

For more information on action masking, see [A Closer Look at Invalid Action Masking in Policy Gradient Algorithms](https://arxiv.org/abs/2006.14171) (Huang, 2022)

6 changes: 5 additions & 1 deletion docs/api/parallel.md
@@ -11,7 +11,11 @@ For a comparison with the AEC API, see [About AEC](https://pettingzoo.farama.org

[PettingZoo Wrappers](/api/wrappers/pz_wrappers/) can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).

We provide tutorials for creating two custom Parallel environments: [Rock-Paper-Scissors](https://pettingzoo.farama.org/content/environment_creation/#example-custom-parallel-environment), and a simple [gridworld environment](https://pettingzoo.farama.org/tutorials/environmentcreation/2-environment-logic/)
## Examples

[PettingZoo Butterfly](/environments/butterfly/) provides standard examples of Parallel environments, such as [Pistonball](/environments/butterfly/pistonball).

We provide tutorials for creating two custom Parallel environments: [Rock-Paper-Scissors (Parallel)](https://pettingzoo.farama.org/content/environment_creation/#example-custom-parallel-environment), and a simple [gridworld environment](/tutorials/environmentcreation/2-environment-logic/)

## Usage

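As with the AEC docs, the usage example is collapsed in this diff view; a minimal Parallel-API loop (sketched here with Pistonball purely for illustration) looks like:

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env()
observations, infos = env.reset(seed=42)

# Parallel environments step all live agents at once with a dict of actions.
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```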
2 changes: 1 addition & 1 deletion docs/api/wrappers/supersuit_wrappers.md
@@ -6,7 +6,7 @@ title: Supersuit Wrappers

The [SuperSuit](https://github.com/Farama-Foundation/SuperSuit) companion package (`pip install supersuit`) includes a collection of pre-processing functions which can be applied to both [AEC](/api/aec/) and [Parallel](/api/parallel/) environments.

To convert [space invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) to a greyscale observation space and stack the last 4 frames:
To convert [space invaders](/environments/atari/space_invaders/) to a greyscale observation space and stack the last 4 frames:

``` python
from pettingzoo.atari import space_invaders_v2
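
# The rest of this snippet is collapsed in the diff view. A plausible completion,
# assuming SuperSuit's color_reduction_v0 and frame_stack_v1 wrappers:
from supersuit import color_reduction_v0, frame_stack_v1

env = space_invaders_v2.env()
env = color_reduction_v0(env, mode="full")  # greyscale observations
env = frame_stack_v1(env, 4)                # stack the 4 most recent frames
```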
2 changes: 1 addition & 1 deletion docs/environments/atari.md
@@ -52,7 +52,7 @@ Install ROMs using [AutoROM](https://github.com/Farama-Foundation/AutoROM), or s

### Usage

To launch a [Space Invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) environment with random agents:
To launch a [Space Invaders](/environments/atari/space_invaders/) environment with random agents:
```python
from pettingzoo.atari import space_invaders_v2

10 changes: 5 additions & 5 deletions docs/environments/butterfly.md
@@ -21,9 +21,9 @@ Butterfly environments are challenging scenarios created by Farama, using Pygame
All environments require a high degree of coordination and the learning of emergent behaviors to achieve an optimal policy. As such, these environments are currently very challenging to learn.

Environments are highly configurable via arguments specified in their respective documentation:
[Cooperative Pong](https://pettingzoo.farama.org/environments/butterfly/cooperative_pong/),
[Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/),
[Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/).
[Cooperative Pong](/environments/butterfly/cooperative_pong/),
[Knights Archers Zombies](/environments/butterfly/knights_archers_zombies/),
[Pistonball](/environments/butterfly/pistonball/).

### Installation
The unique dependencies for this set of environments can be installed via:
@@ -34,7 +34,7 @@ pip install pettingzoo[butterfly]

### Usage

To launch a [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment with random agents:
To launch a [Pistonball](/environments/butterfly/pistonball/) environment with random agents:
```python
from pettingzoo.butterfly import pistonball_v6

@@ -49,7 +49,7 @@ while env.agents:
env.close()
```

To launch a [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py)):
To launch a [Knights Archers Zombies](/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py)):
```python
import pygame
from pettingzoo.butterfly import knights_archers_zombies_v10
2 changes: 1 addition & 1 deletion docs/environments/classic.md
@@ -36,7 +36,7 @@ pip install pettingzoo[classic]

### Usage

To launch a [Texas Holdem](https://pettingzoo.farama.org/environments/classic/texas_holdem/) environment with random agents:
To launch a [Texas Holdem](/environments/classic/texas_holdem/) environment with random agents:
``` python
from pettingzoo.classic import texas_holdem_v4

2 changes: 1 addition & 1 deletion docs/environments/mpe.md
@@ -34,7 +34,7 @@ pip install pettingzoo[mpe]
````

### Usage
To launch a [Simple Tag](https://pettingzoo.farama.org/environments/mpe/simple_tag/) environment with random agents:
To launch a [Simple Tag](/environments/mpe/simple_tag/) environment with random agents:

``` python
from pettingzoo.mpe import simple_tag_v3
2 changes: 1 addition & 1 deletion docs/environments/sisl.md
@@ -27,7 +27,7 @@ pip install pettingzoo[sisl]
````

### Usage
To launch a [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment with random agents:
To launch a [Waterworld](/environments/sisl/waterworld/) environment with random agents:

```python
from pettingzoo.sisl import waterworld_v4
2 changes: 1 addition & 1 deletion docs/environments/third_party_envs.md
@@ -106,7 +106,7 @@ Interactive PettingZoo implementation of the [Cathedral](https://en.wikipedia.or
[![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.22.4-blue)]()
[![HuggingFace likes](https://img.shields.io/badge/stars-_2-blue)]()

Play [Connect Four](https://pettingzoo.farama.org/environments/classic/connect_four/) in real-time against an [RLlib](https://docs.ray.io/en/latest/rllib/index.html) agent trained via self-play and PPO.
Play [Connect Four](/environments/classic/connect_four/) in real-time against an [RLlib](https://docs.ray.io/en/latest/rllib/index.html) agent trained via self-play and PPO.
* Online game demo (using [Gradio](https://www.gradio.app/) and [HuggingFace Spaces](https://huggingface.co/docs/hub/spaces-overview)): [link](https://huggingface.co/spaces/ClementBM/connectfour), [tutorial](https://clementbm.github.io/project/2023/03/29/reinforcement-learning-connect-four-rllib.html)


2 changes: 1 addition & 1 deletion docs/index.md
@@ -72,7 +72,7 @@ An API standard for multi-agent reinforcement learning.
**PettingZoo is a simple, pythonic interface capable of representing general multi-agent reinforcement learning (MARL) problems.**
PettingZoo includes a wide variety of reference environments, helpful utilities, and tools for creating your own custom environments.

The [AEC API](https://pettingzoo.farama.org/api/aec/) supports sequential turn based environments, while the [Parallel API](https://pettingzoo.farama.org/api/parallel/) supports environments with simultaneous actions.
The [AEC API](/api/aec/) supports sequential turn based environments, while the [Parallel API](/api/parallel/) supports environments with simultaneous actions.

Environments can be interacted with using a similar interface to [Gymnasium](https://gymnasium.farama.org):

2 changes: 1 addition & 1 deletion docs/tutorials/cleanrl/advanced_PPO.md
@@ -4,7 +4,7 @@ title: "CleanRL: Advanced PPO"

# CleanRL: Advanced PPO

This tutorial shows how to train [PPO](https://docs.cleanrl.dev/rl-algorithms/ppo/) agents on [Atari](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environments ([Parallel](https://pettingzoo.farama.org/api/parallel/)).
This tutorial shows how to train [PPO](https://docs.cleanrl.dev/rl-algorithms/ppo/) agents on [Atari](/environments/atari/) environments ([Parallel](/api/parallel/)).
This is a full training script including CLI, logging and integration with [TensorBoard](https://www.tensorflow.org/tensorboard) and [WandB](https://wandb.ai/) for experiment tracking.

This tutorial is mirrored from [CleanRL](https://github.com/vwxyzjn/cleanrl)'s examples. Full documentation and experiment results can be found at [https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy)
2 changes: 1 addition & 1 deletion docs/tutorials/cleanrl/implementing_PPO.md
@@ -4,7 +4,7 @@ title: "CleanRL: Implementing PPO"

# CleanRL: Implementing PPO

This tutorial shows how to train [PPO](https://docs.cleanrl.dev/rl-algorithms/ppo/) agents on the [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment ([Parallel](https://pettingzoo.farama.org/api/parallel/)).
This tutorial shows how to train [PPO](https://docs.cleanrl.dev/rl-algorithms/ppo/) agents on the [Pistonball](/environments/butterfly/pistonball/) environment ([Parallel](/api/parallel/)).

## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
2 changes: 1 addition & 1 deletion docs/tutorials/rllib/holdem.md
@@ -4,7 +4,7 @@ title: "RLlib: DQN for Simple Poker"

# RLlib: DQN for Simple Poker

This tutorial shows how to train a [Deep Q-Network](https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#deep-q-networks-dqn-rainbow-parametric-dqn) (DQN) agent on the [Leduc Hold'em](https://pettingzoo.farama.org/environments/classic/leduc_holdem/) environment ([AEC](https://pettingzoo.farama.org/api/aec/)).
This tutorial shows how to train a [Deep Q-Network](https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#deep-q-networks-dqn-rainbow-parametric-dqn) (DQN) agent on the [Leduc Hold'em](/environments/classic/leduc_holdem/) environment ([AEC](/api/aec/)).

After training, run the provided code to watch your trained agent play vs itself. See the [documentation](https://docs.ray.io/en/latest/rllib/rllib-saving-and-loading-algos-and-policies.html) for more information.

2 changes: 1 addition & 1 deletion docs/tutorials/rllib/pistonball.md
@@ -4,7 +4,7 @@ title: "RLlib: PPO for Pistonball (Parallel)"

# RLlib: PPO for Pistonball

This tutorial shows how to train [Proximal Policy Optimization](https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#ppo) (PPO) agents on the [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment ([Parallel](https://pettingzoo.farama.org/api/parallel/)).
This tutorial shows how to train [Proximal Policy Optimization](https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#ppo) (PPO) agents on the [Pistonball](/environments/butterfly/pistonball/) environment ([Parallel](/api/parallel/)).

After training, run the provided code to watch your trained agent play vs itself. See the [documentation](https://docs.ray.io/en/latest/rllib/rllib-saving-and-loading-algos-and-policies.html) for more information.

2 changes: 1 addition & 1 deletion docs/tutorials/sb3/connect_four.md
@@ -4,7 +4,7 @@ title: "SB3: Action Masked PPO for Connect Four"

# SB3: Action Masked PPO for Connect Four

This tutorial shows how to train a agents using Maskable [Proximal Policy Optimization](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html) (PPO) on the [Connect Four](https://pettingzoo.farama.org/environments/classic/chess/) environment ([AEC](https://pettingzoo.farama.org/api/aec/)).
This tutorial shows how to train agents using Maskable [Proximal Policy Optimization](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html) (PPO) on the [Connect Four](/environments/classic/connect_four/) environment ([AEC](/api/aec/)).

It creates a custom Wrapper to convert to a [Gymnasium](https://gymnasium.farama.org/)-like environment which is compatible with [SB3 action masking](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html).

2 changes: 1 addition & 1 deletion docs/tutorials/sb3/kaz.md
@@ -4,7 +4,7 @@ title: "SB3: PPO for Knights-Archers-Zombies"

# SB3: PPO for Knights-Archers-Zombies

This tutorial shows how to train agents using [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) on the [Knights-Archers-Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment ([AEC](https://pettingzoo.farama.org/api/aec/)).
This tutorial shows how to train agents using [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) on the [Knights-Archers-Zombies](/environments/butterfly/knights_archers_zombies/) environment ([AEC](/api/aec/)).

We use SuperSuit to create vectorized environments, leveraging multithreading to speed up training (see SB3's [vector environments documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html)).

2 changes: 1 addition & 1 deletion docs/tutorials/sb3/waterworld.md
@@ -4,7 +4,7 @@ title: "SB3: PPO for Waterworld (Parallel)"

# SB3: PPO for Waterworld

This tutorial shows how to train agents using [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) on the [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment ([Parallel](https://pettingzoo.farama.org/api/parallel/)).
This tutorial shows how to train agents using [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) on the [Waterworld](/environments/sisl/waterworld/) environment ([Parallel](/api/parallel/)).

We use SuperSuit to create vectorized environments, leveraging multithreading to speed up training (see SB3's [vector environments documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html)).

4 changes: 2 additions & 2 deletions docs/tutorials/tianshou/advanced.md
@@ -4,9 +4,9 @@ title: "Tianshou: CLI and Logging"

# Tianshou: CLI and Logging

This tutorial is a full example using Tianshou to train a [Deep Q-Network](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) (DQN) agent on the [Tic-Tac-Toe](https://pettingzoo.farama.org/environments/classic/tictactoe/) environment.
This tutorial is a full example using Tianshou to train a [Deep Q-Network](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) (DQN) agent on the [Tic-Tac-Toe](/environments/classic/tictactoe/) environment.

It extends the code from [Training Agents](https://pettingzoo.farama.org/tutorials/tianshou/intermediate/) to add CLI (using [argparse](https://docs.python.org/3/library/argparse.html)) and logging (using Tianshou's [Logger](https://tianshou.readthedocs.io/en/master/tutorials/logger.html)).
It extends the code from [Training Agents](/tutorials/tianshou/intermediate/) to add CLI (using [argparse](https://docs.python.org/3/library/argparse.html)) and logging (using Tianshou's [Logger](https://tianshou.readthedocs.io/en/master/tutorials/logger.html)).


## Environment Setup
2 changes: 1 addition & 1 deletion docs/tutorials/tianshou/beginner.md
@@ -6,7 +6,7 @@ title: "Tianshou: Basic API Usage"

This tutorial is a simple example of how to use [Tianshou](https://github.com/thu-ml/tianshou) with a PettingZoo environment.

It demonstrates a game betwenen two [random policy](https://tianshou.readthedocs.io/en/master/_modules/tianshou/policy/random.html) agents in the [rock-paper-scissors](https://pettingzoo.farama.org/environments/classic/rps/) environment.
It demonstrates a game between two [random policy](https://tianshou.readthedocs.io/en/master/_modules/tianshou/policy/random.html) agents in the [rock-paper-scissors](/environments/classic/rps/) environment.

## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
2 changes: 1 addition & 1 deletion docs/tutorials/tianshou/intermediate.md
@@ -4,7 +4,7 @@ title: "Tianshou: Training Agents"

# Tianshou: Training Agents

This tutorial shows how to use [Tianshou](https://github.com/thu-ml/tianshou) to train a [Deep Q-Network](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) (DQN) agent to play vs a [random policy](https://tianshou.readthedocs.io/en/master/_modules/tianshou/policy/random.html) agent in the [Tic-Tac-Toe](https://pettingzoo.farama.org/environments/classic/tictactoe/) environment.
This tutorial shows how to use [Tianshou](https://github.com/thu-ml/tianshou) to train a [Deep Q-Network](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) (DQN) agent to play vs a [random policy](https://tianshou.readthedocs.io/en/master/_modules/tianshou/policy/random.html) agent in the [Tic-Tac-Toe](/environments/classic/tictactoe/) environment.

## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.