
Commit 9beb324

Alex Rutherford authored and committed

docs updates

1 parent 4d36509 commit 9beb324

10 files changed: +77 -37 lines changed

docs/Environments/mpe.md (21 additions & 22 deletions)
@@ -1,32 +1,27 @@
# MPE

- Multi Particle Environments (MPE) are a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks. We implement all of the [PettingZoo MPE Environments](https://pettingzoo.farama.org/environments/mpe/).
+ Multi Particle Environments (MPE) are a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks.

+ ![MPE](https://github.com/FLAIROx/JaxMARL/blob/main/docs/imgs/qmix_MPE_simple_tag_v3.gif?raw=true){ width=300px }
+ /// caption
+ MPE Simple Tag
+ ///

- <div class="collage">
- <div class="row" align="left">
- <img src="docs/qmix_MPE_simple_tag_v3.gif" alt="MPE Simple Tag" width="30%"/>
- <img src="docs/vdn_MPE_simple_spread_v3.gif" alt="MPE Simple Spread" width="30%"/>
- <img src="docs/qmix_MPE_simple_speaker_listener_v4.gif" alt="MPE Speaker Listener" width="30%">
- </div>
- </div>
-
+ ## Environments

+ We implement all of the [PettingZoo MPE Environments](https://pettingzoo.farama.org/environments/mpe/):

| Environment | JaxMARL Registry Name |
|---|---|
- | Simple | `MPE_simple_v3` |
- | Simple Push | `MPE_simple_push_v3` |
- | Simple Spread | `MPE_simple_spread_v3` |
- | Simple Crypto | `MPE_simple_crypto_v3` |
- | Simple Speaker Listener | `MPE_simple_speaker_listener_v4` |
- | Simple Tag | `MPE_simple_tag_v3` |
- | Simple World Comm | `MPE_simple_world_comm_v3` |
- | Simple Reference | `MPE_simple_reference_v3` |
- | Simple Adversary | `MPE_simple_adversary_v3` |
-
-
- The implementations follow the PettingZoo code as closely as possible, including sharing variable names and version numbers. There are occasional discrepancies between the PettingZoo code and docs; where this occurs we have followed the code. As our implementation closely follows the PettingZoo code, please refer to their documentation for further information on the environments.
+ | Simple | `MPE_simple_v3` |
+ | Simple Push | `MPE_simple_push_v3` |
+ | Simple Spread | `MPE_simple_spread_v3` |
+ | Simple Crypto | `MPE_simple_crypto_v3` |
+ | Simple Speaker Listener | `MPE_simple_speaker_listener_v4` |
+ | Simple Tag | `MPE_simple_tag_v3` |
+ | Simple World Comm | `MPE_simple_world_comm_v3` |
+ | Simple Reference | `MPE_simple_reference_v3` |
+ | Simple Adversary | `MPE_simple_adversary_v3` |

We additionally include a fully cooperative variant of Simple Tag, first used to evaluate FACMAC. In this environment, a number of agents attempt to tag a number of prey, where the prey are controlled by a heuristic AI.

@@ -36,6 +31,10 @@ We additionally include a fully cooperative variant of Simple Tag, first used to
| 6 agents, 2 prey | `MPE_simple_facmac_6a_v1` |
| 9 agents, 3 prey | `MPE_simple_facmac_9a_v1` |

+ ## Implementation notes
+
+ The implementations follow the PettingZoo code as closely as possible, including sharing variable names and version numbers. There are occasional discrepancies between the PettingZoo code and docs; where this occurs we have followed the code. As our implementation closely follows the PettingZoo code, please refer to their documentation for further information on the environments.
+
## Action Space
Following the PettingZoo implementation, we allow for both discrete and continuous action spaces in all MPE environments. The environments use discrete actions by default.

@@ -53,7 +52,7 @@ The exact observation varies for each environment, but in general it is a vector
## Visualisation
Check the example `mpe_introduction.py` file in the tutorials folder for an introduction to our implementation of the MPE environments, including an example visualisation. We animate the environment after the state transitions have been collected as follows:

- ```python
+ ``` python
import jax
from jaxmarl import make
from jaxmarl.environments.mpe import MPEVisualizer
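
The hunk above ends inside the snippet; in the `mpe_introduction.py` tutorial the flow continues by stepping the environment, storing each state, and only then animating. A minimal sketch of that flow, assuming the env.reset/env.step API shown in the docs/index.md hunk below; the MPEVisualizer constructor and the animate argument names are assumptions, so check the tutorial for the exact signatures:

``` python
import jax
from jaxmarl import make
from jaxmarl.environments.mpe import MPEVisualizer

key = jax.random.PRNGKey(0)
env = make("MPE_simple_tag_v3")

key, key_reset = jax.random.split(key)
obs, state = env.reset(key_reset)

# Collect the state transitions first; animation happens afterwards.
state_seq = [state]
for _ in range(25):  # MPE episodes default to 25 steps
    key, key_act, key_step = jax.random.split(key, 3)
    keys_act = jax.random.split(key_act, env.num_agents)
    actions = {agent: env.action_space(agent).sample(keys_act[i])
               for i, agent in enumerate(env.agents)}
    obs, state, reward, done, infos = env.step(key_step, state, actions)
    state_seq.append(state)

# Assumed call pattern: pass the env and collected states, then render a gif.
viz = MPEVisualizer(env, state_seq)
viz.animate(save_fname="mpe_simple_tag.gif")  # argument name is an assumption
```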

docs/Environments/smax.md (9 additions & 2 deletions)
@@ -1,6 +1,13 @@
# SMAX
- ## Description
- SMAX is a purely JAX SMAC-like environment. It, like SMAC, focuses on decentralised unit micromanagement across a range of scenarios. Each scenario features fixed teams.
+
+ **SMAX is a purely JAX SMAC-like environment**. Like SMAC, it focuses on decentralised unit micromanagement across a range of scenarios. Each scenario features fixed teams.
+
+ ![SMAX](https://github.com/FLAIROx/JaxMARL/blob/main/docs/imgs/smax.gif?raw=true){ width=300px }
+ /// caption
+ 2s3z Scenario
+ ///
+
+

## Scenarios

docs/Environments/storm.md (7 additions & 1 deletion)
@@ -1,15 +1,21 @@
# STORM

+
+ ![STORM](https://github.com/FLAIROx/JaxMARL/blob/main/docs/imgs/storm.gif?raw=true){ width=250px }
+
Spatial-Temporal Representations of Matrix Games (STORM) is inspired by the "in the Matrix" games in [Melting Pot 2.0](https://arxiv.org/abs/2211.13746); the [STORM](https://openreview.net/forum?id=54F8woU8vhq) environment expands on matrix games by representing them as grid-world scenarios. Agents collect resources which define their strategy during interactions and are rewarded based on a pre-specified payoff matrix. This allows for the embedding of fully cooperative, competitive or general-sum games, such as the prisoner's dilemma.

Thus, STORM can be used for studying paradigms such as *opponent shaping*, where agents act with the intent to change other agents' learning dynamics. Compared to the Coin Game or matrix games, the grid-world setting presents a variety of new challenges, such as partial observability, multi-step agent interactions, temporally-extended actions, and longer time horizons. Unlike the "in the Matrix" games from Melting Pot, STORM features stochasticity, increasing the difficulty.

+ ## Environment explanation
+
+

## Visualisation

We render each timestep and then create a gif from the collection of images. Further examples are provided [here](https://github.com/FLAIROx/JaxMARL/tree/main/jaxmarl/tutorials).

- ```python
+ ``` python
import jax
import jax.numpy as jnp
from PIL import Image

docs/Installation.md (3 additions & 3 deletions)
@@ -4,7 +4,7 @@

Before installing, ensure you have the correct [JAX installation](https://github.com/google/jax#installation) for your hardware accelerator. We have tested up to JAX version 0.4.25. The JaxMARL environments can be installed directly from PyPI:

- ``` sh { .yaml .copy }
+ ``` sh
pip install jaxmarl
```

@@ -13,11 +13,11 @@ pip install jaxmarl
If you would like to also run the algorithms, install the source code as follows:

1. Clone the repository:
- ``` sh { .yaml .copy }
+ ``` sh
git clone https://github.com/FLAIROx/JaxMARL.git && cd JaxMARL
```
2. Install requirements:
- ``` sh { .yaml .copy }
+ ``` sh
pip install -e .[algs] && export PYTHONPATH=./JaxMARL:$PYTHONPATH
```
3. For the fastest start, we recommend using our Dockerfile, the usage of which is outlined below.
24.7 KB

docs/imgs/mpe_qlearning_speed-1.png (93.6 KB)

docs/imgs/mpe_speedup-1.png (38.4 KB)

docs/imgs/sc2_speedup-1.png (25 KB)

docs/index.md (33 additions & 8 deletions)
@@ -58,21 +58,46 @@ actions = {agent: env.action_space(agent).sample(key_act[i]) for i, agent in enu
obs, state, reward, done, infos = env.step(key_step, state, actions)
```

- ## Performance Examples
- *coming soon*
+ ## JaxMARL's performance
+
+ ![MPE](imgs/mpe_speedup-1.png){ width=300px }
+ /// caption
+ Speed of JaxMARL's training pipeline compared to two popular MARL libraries when training an RNN agent using IPPO on an MPE task.
+ ///
+
+ Our paper contains further results, but the plot above illustrates the speed-ups made possible by JIT-compiling the entire training loop. JaxMARL is much faster than traditional approaches, while also producing results consistent with existing implementations.

## Related Works
- This work is heavily related to and builds on many other works. We would like to highlight some of the works that we believe would be relevant to readers:
+ This work is heavily related to and builds on many other works; PureJaxRL provides a [list of projects](https://github.com/luchris429/purejaxrl/blob/main/RESOURCES.md) within the JaxRL ecosystem. Those particularly relevant to multi-agent work are:
+
+ JAX-native algorithms:
+
+ - [Mava](https://github.com/instadeepai/Mava): JAX implementations of IPPO and MAPPO, two popular MARL algorithms.
+ - [PureJaxRL](https://github.com/luchris429/purejaxrl): JAX implementation of PPO, and demonstration of end-to-end JAX-based RL training.
+
+ JAX-native environments:
+
+ - [Gymnax](https://github.com/RobertTLange/gymnax): Implementations of classic RL tasks including classic control, bsuite and MinAtar.
+ - [Jumanji](https://github.com/instadeepai/jumanji): A diverse set of environments ranging from simple games to NP-hard combinatorial problems.
+ - [Pgx](https://github.com/sotetsuk/pgx): JAX implementations of classic board games, such as Chess, Go and Shogi.
+ - [Brax](https://github.com/google/brax): A fully differentiable physics engine written in JAX, featuring continuous control tasks. We use this as the base for MABrax (as the name suggests!).
+ - [XLand-MiniGrid](https://github.com/corl-team/xland-minigrid): Meta-RL gridworld environments inspired by XLand and MiniGrid.
+
+ Other great JAX-related works from our lab:
+
+ - [JaxIRL](https://github.com/FLAIROx/jaxirl?tab=readme-ov-file): JAX implementation of algorithms for inverse reinforcement learning.
+ - [Craftax](https://github.com/MichaelTMatthews/Craftax): (Crafter + NetHack) in JAX.
+ - [JaxUED](https://github.com/DramaCow/jaxued?tab=readme-ov-file): JAX implementations of autocurricula baselines for RL.
+ - [Kinetix](https://kinetix-env.github.io/): Large-scale training of RL agents in a vast and diverse space of simulated tasks, enabled by JAX.
+
+ Other tools that could help:

- * [Jumanji](https://github.com/instadeepai/jumanji). A suite of JAX-based RL environments. It includes some multi-agent ones such as RobotWarehouse.
- * [VectorizedMultiAgentSimulator (VMAS)](https://github.com/proroklab/VectorizedMultiAgentSimulator). It performs similar vectorization for some MARL environments, but is done in PyTorch.
- * More to be added soon :)
+ - [Benchmarl](https://github.com/facebookresearch/BenchMARL): A collection of MARL benchmarks based on TorchRL.

- More documentation to follow soon!

## Citing JaxMARL
If you use JaxMARL in your work, please cite us as follows:
- ```bibtex
+ ``` bibtex
@article{flair2023jaxmarl,
title={JaxMARL: Multi-Agent RL Environments in JAX},
author={Alexander Rutherford and Benjamin Ellis and Matteo Gallici and Jonathan Cook and Andrei Lupu and Gardar Ingvarsson and Timon Willi and Akbir Khan and Christian Schroeder de Witt and Alexandra Souly and Saptarashmi Bandyopadhyay and Mikayel Samvelyan and Minqi Jiang and Robert Tjarko Lange and Shimon Whiteson and Bruno Lacerda and Nick Hawes and Tim Rocktaschel and Chris Lu and Jakob Nicolaus Foerster},
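
To make the JIT-compilation claim above concrete: because env.reset and env.step are pure JAX functions, an entire rollout can be written with jax.lax.scan, vectorised over seeds with jax.vmap, and compiled end to end with jax.jit. A minimal random-policy sketch built only from the API shown in this page's hunks (the scenario, batch size, and episode length are illustrative, not the paper's benchmark setup):

``` python
import jax
from jaxmarl import make

env = make("MPE_simple_spread_v3")
NUM_STEPS = 25  # illustrative episode length

def rollout(key):
    """One random-policy episode, written as a pure function of the RNG key."""
    key, key_reset = jax.random.split(key)
    obs, state = env.reset(key_reset)

    def step_fn(carry, _):
        key, state = carry
        key, key_act, key_step = jax.random.split(key, 3)
        keys_act = jax.random.split(key_act, env.num_agents)
        actions = {agent: env.action_space(agent).sample(keys_act[i])
                   for i, agent in enumerate(env.agents)}
        obs, state, reward, done, infos = env.step(key_step, state, actions)
        return (key, state), reward

    _, rewards = jax.lax.scan(step_fn, (key, state), None, length=NUM_STEPS)
    return rewards  # pytree of per-agent reward sequences

# Run 128 rollouts in parallel and compile the whole batch as one program.
keys = jax.random.split(jax.random.PRNGKey(0), 128)
rewards = jax.jit(jax.vmap(rollout))(keys)  # each agent: shape (128, NUM_STEPS)
```

Replacing the random actions with a policy and folding the learner update into the scanned step is, in essence, the end-to-end compiled training loop that the speed-up plot refers to.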

mkdocs.yml (4 additions & 1 deletion)
@@ -5,6 +5,7 @@ theme:
  name: material
  features:
    - navigation.sections
+     - content.code.copy
  palette:
    # Dark Mode
    - scheme: slate
@@ -40,4 +41,6 @@ markdown_extensions:
      pygments_lang_class: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
-   - pymdownx.superfences
+   - pymdownx.superfences
+   - pymdownx.blocks.caption
+   - attr_list

0 commit comments