**docs/Environments/mpe.md** (21 additions, 22 deletions)

# MPE

Multi Particle Environments (MPE) are a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks.

We additionally include a fully cooperative variant of Simple Tag, first used to evaluate FACMAC. In this environment, a number of agents attempt to tag a number of prey, where the prey are controlled by a heuristic AI.

| 6 agents, 2 prey |`MPE_simple_facmac_6a_v1`|
| 9 agents, 3 prey |`MPE_simple_facmac_9a_v1`|
## Implementation notes

The implementations follow the PettingZoo code as closely as possible, including sharing variable names and version numbers. There are occasional discrepancies between the PettingZoo code and docs; where this occurs, we have followed the code. As our implementation closely follows the PettingZoo code, please refer to their documentation for further information on the environments.
## Action Space

Following the PettingZoo implementation, we allow for both discrete and continuous action spaces in all MPE environments. The environments use discrete actions by default.
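
To make the switch concrete, a minimal sketch is below. The registered name `MPE_simple_spread_v3` and the `action_type="Continuous"` keyword are assumptions and may not match the actual constructor arguments, so check the MPE environment signatures before relying on them.

```python
import jax
from jaxmarl import make

# Assumption: MPE constructors accept an `action_type` keyword and the string
# "Continuous" selects continuous actions (discrete is the default).
env = make("MPE_simple_spread_v3", action_type="Continuous")

key = jax.random.PRNGKey(0)
obs, state = env.reset(key)

# Sample one action per agent from its (now continuous) action space.
key, key_act = jax.random.split(key)
actions = {agent: env.action_space(agent).sample(key_act) for agent in env.agents}
```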
## Visualisation
Check the example `mpe_introduction.py` file in the tutorials folder for an introduction to our implementation of the MPE environments, including an example visualisation. We animate the environment after the state transitions have been collected as follows:

```python
import jax
from jaxmarl import make
from jaxmarl.environments.mpe import MPEVisualizer
```
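
The rest of this example is cut off in the diff. A minimal end-to-end sketch of the pattern described above (roll the environment out, collect the states, then animate them) is given below; the scenario name and the exact `MPEVisualizer`/`animate` call signatures are assumptions, so refer to `mpe_introduction.py` for the authoritative version.

```python
import jax
from jaxmarl import make
from jaxmarl.environments.mpe import MPEVisualizer

# Assumptions: the registered scenario name and the visualiser calls below
# follow the tutorial's pattern but are not copied from it verbatim.
env = make("MPE_simple_spread_v3")
key = jax.random.PRNGKey(0)
obs, state = env.reset(key)

# Roll the environment forward with random actions, storing every state.
state_seq = []
for _ in range(25):
    key, key_act, key_step = jax.random.split(key, 3)
    actions = {a: env.action_space(a).sample(key_act) for a in env.agents}
    state_seq.append(state)
    obs, state, rewards, dones, infos = env.step(key_step, state, actions)

# Animate only after the full state sequence has been collected.
viz = MPEVisualizer(env, state_seq)
viz.animate(save_fname="mpe_simple_spread.gif")
```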

**docs/Environments/smax.md** (9 additions, 2 deletions)

# SMAX

**SMAX is a purely JAX SMAC-like environment**. Like SMAC, it focuses on decentralised unit micromanagement across a range of scenarios, each of which features fixed teams.
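
For orientation, a minimal sketch of creating a SMAX scenario follows. The registered name `HeuristicEnemySMAX`, the `map_name_to_scenario` helper and the `"3m"` map name are assumptions based on the JaxMARL examples, so check the SMAX tutorial for the exact entry points.

```python
import jax
from jaxmarl import make
from jaxmarl.environments.smax import map_name_to_scenario  # assumed helper

# Assumption: "HeuristicEnemySMAX" pits the learned team against a scripted
# opponent, and the scenario object fixes both teams' unit compositions.
scenario = map_name_to_scenario("3m")
env = make("HeuristicEnemySMAX", scenario=scenario)

key = jax.random.PRNGKey(0)
obs, state = env.reset(key)
```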

# STORM

Spatial-Temporal Representations of Matrix Games (STORM) is inspired by the "in the Matrix" games in [Melting Pot 2.0](https://arxiv.org/abs/2211.13746); the [STORM](https://openreview.net/forum?id=54F8woU8vhq) environment expands on matrix games by representing them as grid-world scenarios. Agents collect resources which define their strategy during interactions and are rewarded based on a pre-specified payoff matrix. This allows for the embedding of fully cooperative, competitive or general-sum games, such as the prisoner's dilemma.
Thus, STORM can be used for studying paradigms such as *opponent shaping*, where agents act with the intent to change other agents' learning dynamics. Compared to the Coin Game or matrix games, the grid-world setting presents a variety of new challenges such as partial observability, multi-step agent interactions, temporally-extended actions, and longer time horizons. Unlike the "in the Matrix" games from Melting Pot, STORM features stochasticity, increasing the difficulty.
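
To make the payoff-matrix idea concrete, the sketch below shows a standard prisoner's-dilemma payoff table of the kind such a grid-world game could embed; the numbers are a textbook example, not the matrix STORM actually uses.

```python
import jax.numpy as jnp

# Rows: our action (0 = cooperate, 1 = defect); columns: the opponent's action.
# Entry [i, j] is our reward. This is a textbook prisoner's dilemma, shown only
# to illustrate the kind of payoff matrix a game like this can embed.
payoff = jnp.array([
    [3.0, 0.0],  # we cooperate: mutual cooperation vs. being exploited
    [5.0, 1.0],  # we defect:    temptation payoff vs. mutual defection
])

our_action, their_action = 0, 1            # we cooperate, they defect
reward = payoff[our_action, their_action]  # -> 0.0
```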
## Environment explanation
## Visualisation
We render each timestep and then create a gif from the collection of images. Further examples are provided [here](https://github.com/FLAIROx/JaxMARL/tree/main/jaxmarl/tutorials).

**docs/Installation.md** (3 additions, 3 deletions)

Before installing, ensure you have the correct [JAX installation](https://github.com/google/jax#installation) for your hardware accelerator. We have tested up to JAX version 0.4.25. The JaxMARL environments can be installed directly from PyPI:
```sh
pip install jaxmarl
```
If you would like to also run the algorithms, install the source code as follows:
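
The commands themselves are elided from this diff. A typical from-source install would look like the sketch below; the editable-install flag and any extras are assumptions, so follow the actual Installation page for the exact commands.

```sh
# Assumed commands: clone the repository and install it in editable mode.
git clone https://github.com/FLAIROx/JaxMARL.git
cd JaxMARL
pip install -e .
```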

*Figure: Speed of JaxMARL's training pipeline compared to two popular MARL libraries when training an RNN agent using IPPO on an MPE task.*

Our paper contains further results, but the plot above illustrates the speed-ups made possible by JIT-compiling the entire training loop. JaxMARL is much, much faster than traditional approaches, while also producing results consistent with existing implementations.
## Related Works

This work is heavily related to and builds on many other works. PureJaxRL provides a [list of projects](https://github.com/luchris429/purejaxrl/blob/main/RESOURCES.md) within the JaxRL ecosystem; those particularly relevant to multi-agent work are:

JAX-native algorithms:

- [Mava](https://github.com/instadeepai/Mava): JAX implementations of IPPO and MAPPO, two popular MARL algorithms.
- [PureJaxRL](https://github.com/luchris429/purejaxrl): JAX implementation of PPO, and demonstration of end-to-end JAX-based RL training.

JAX-native environments:

- [Gymnax](https://github.com/RobertTLange/gymnax): Implementations of classic RL tasks including classic control, bsuite and MinAtar.
- [Jumanji](https://github.com/instadeepai/jumanji): A diverse set of environments ranging from simple games to NP-hard combinatorial problems.
- [Pgx](https://github.com/sotetsuk/pgx): JAX implementations of classic board games, such as Chess, Go and Shogi.
- [Brax](https://github.com/google/brax): A fully differentiable physics engine written in JAX, featuring continuous control tasks. We use this as the base for MABrax (as the name suggests!).
- [XLand-MiniGrid](https://github.com/corl-team/xland-minigrid): Meta-RL gridworld environments inspired by XLand and MiniGrid.

Other great JAX-related works from our lab are below:

- [JaxIRL](https://github.com/FLAIROx/jaxirl?tab=readme-ov-file): JAX implementation of algorithms for inverse reinforcement learning.
- [Craftax](https://github.com/MichaelTMatthews/Craftax): (Crafter + NetHack) in JAX.
- [JaxUED](https://github.com/DramaCow/jaxued?tab=readme-ov-file): JAX implementations of autocurricula baselines for RL.
- [Kinetix](https://kinetix-env.github.io/): Large-scale training of RL agents in a vast and diverse space of simulated tasks, enabled by JAX.

Other things that could help:

- [BenchMARL](https://github.com/facebookresearch/BenchMARL): A collection of MARL benchmarks based on TorchRL.

## Citing JaxMARL
If you use JaxMARL in your work, please cite us as follows:

```bibtex
@article{flair2023jaxmarl,
title={JaxMARL: Multi-Agent RL Environments in JAX},
author={Alexander Rutherford and Benjamin Ellis and Matteo Gallici and Jonathan Cook and Andrei Lupu and Gardar Ingvarsson and Timon Willi and Akbir Khan and Christian Schroeder de Witt and Alexandra Souly and Saptarashmi Bandyopadhyay and Mikayel Samvelyan and Minqi Jiang and Robert Tjarko Lange and Shimon Whiteson and Bruno Lacerda and Nick Hawes and Tim Rocktaschel and Chris Lu and Jakob Nicolaus Foerster},