This is a collection of RL environments that are frequently used in academic research. The repository will be continuously updated.
Welcome to follow and star!
Env Name
- Description Table
- Overview
- Spaces
- Observation Space
- Action Space
- Reward Range
- Useful Links
- Env Repo
- Blog/Doc
- Public Agent
- (optional) Special Subenv
- Scale. Time cost to train a reasonable policy, measured with 1 NVIDIA V100 GPU + a 32-core CPU.
Micro | Small | Middle | Large |
---|---|---|---|
< 30 minutes | 1-4 hours | 8-24 hours | > 1 day |
Pendulum, CartPole, Gym hybrid | MPE, Slimevolley, MuJoCo | Procgen, D4rl, Atari, SMAC | MineRL, CARLA, GRF |
- State/Observation.
Vector | Image | Nested |
---|---|---|
A list of numbers. | Often a 3-channel RGB image. | Like a struct in the C language: it contains multiple members, and each member can be a Vector or an Image (see the `gym.spaces` sketch after these tables). |
MPE, MuJoCo | Atari, DMControl | MineRL, CARLA |
- Action.
Discrete | Continuous | Hybrid |
---|---|---|
Integer | Float | Contains both |
Atari, SMAC | MuJoCo, DMControl | Gym hybrid, CARLA |
- Reward.
Many orders of magnitude (Magnitude) | Sparse reward (Sparse) | Multi-reward mixture (Multi) |
---|---|---|
Magnitudes and frequencies of rewards vary wildly between different games or episodes. You can refer to Learning values across many orders of magnitude. | Rewards extrinsic to the agent are extremely sparse, or absent altogether. You can refer to ICM paper and RND paper. | More than one type of reward is measured. You can refer to Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis. |
Centipede of Atari | Minigrid, SMAC | CARLA
- Termination.
Finite | Infinite |
---|---|
An episode will end at some point. | An episode will not end until you terminate it. |
Atari, SMAC | HalfCheetah of MuJoCo |
- Others.
Procedural Content Generation (PCG) | Large Difference among sub-envs (LD) | Multi Agent (MA) |
---|---|---|
Sub-environments are randomly created, encouraging the agent to robustly learn a relevant skill rather than memorize specific trajectories. You can refer to the Procgen paper. | Different sub-envs vary a lot. You can refer to the radar plot in bsuite. | You must control more than one agent at a time. You can refer to An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective. |
Procgen | MuJoCo, MPE, DMControl | MPE, SMAC, GRF |
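As a concrete illustration of the Nested observation and Hybrid action categories above, here is a minimal sketch built with `gym.spaces`. The keys and shapes below are invented for illustration and do not come from any specific environment.

```python
# Illustrative only: a nested (struct-like) observation space and a hybrid action space.
# The concrete keys/shapes are made up for this example.
import numpy as np
from gym import spaces

# Nested observation: a dict whose members are themselves Vector or Image spaces.
observation_space = spaces.Dict({
    "camera": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),        # Image
    "speed": spaces.Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32),     # Vector
})

# Hybrid action: a discrete choice plus continuous parameters (cf. Gym hybrid).
action_space = spaces.Tuple((
    spaces.Discrete(3),                                            # which action type
    spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),  # its parameters
))

print(observation_space.sample())
print(action_space.sample())
```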
Atari

Pong | Qbert | SpaceInvaders | MontezumaRevenge |
---|---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Middle | Image | Discrete | Fluctuate | Finite | LD |
- Overview: Atari 2600 has been the standard environment for testing new Reinforcement Learning algorithms since Deep Q-Networks were introduced by Mnih et al. in 2013. Atari 2600 is a challenging testbed due to its high-dimensional video input (size 210 x 160, frequency 60 Hz) and the discrepancy of tasks between games. OpenAI Gym wraps Atari 2600 with a more standardized interface and provides 59 Atari 2600 games as RL environments.
- Spaces (take `Pong` for example)
  - Observation space: `Box(0, 255, (210, 160, 3), uint8)`
  - Action space: `Discrete(6)`
  - Reward range: `(-inf, inf)`
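A minimal usage sketch is shown below. It assumes the classic Gym step API (`gym` < 0.26) and that `ale-py` with the Atari ROMs is installed; the exact env id may differ across versions.

```python
# Minimal sketch of the Pong interface; assumes classic Gym API (gym < 0.26)
# and that ale-py / Atari ROMs are installed.
import gym

env = gym.make("PongNoFrameskip-v4")
print(env.observation_space)  # Box(0, 255, (210, 160, 3), uint8)
print(env.action_space)       # Discrete(6)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random policy, for demonstration only
    obs, reward, done, info = env.step(action)  # newer gym/gymnasium returns a 5-tuple instead
env.close()
```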
- Useful Links
- Env Repo: The Arcade Learning Environment (ALE)
- Blog/Doc: Official Gym Documentation
- Blog/Doc: DI-engine doc (English | 中文)
- Public Agent: Stable Baselines 3, RL Baselines3 Zoo DQN | PPO | A2C
- Public Agent: DI-engine
- Public Agent: tianshou
- Special Subenv: `MontezumaRevenge`
- Key: Sparse reward
- Useful links:
- Blog/Doc: Prediction-Based Rewards (By OpenAI)
- Public Agent: Random Network Distillation (RND)
- Public Agent: Go-Explore
MuJoCo

Hopper | HalfCheetah | Ant | Walker2D |
---|---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Small | Vector | Continuous | Fluctuate, Multi | Finite | LD |
- Overview: MuJoCo is a physics engine for robotics, biomechanics, graphics, animation, and other domains that require fast and accurate simulation. It is often used as a benchmarking environment for continuous-control Reinforcement Learning algorithms. It is a collection of 20 sub-environments; commonly used ones are `Ant`, `HalfCheetah`, `Hopper`, `Humanoid`, `Walker2D`, etc.
- Spaces (take `Hopper` for example)
  - Observation space: `Box(-inf, inf, (11, ), float32)`
  - Action space: `Box(-1.0, 1.0, (3,), float32)`
  - Reward range: `(-inf, inf)`
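A minimal usage sketch is shown below. It assumes a working MuJoCo installation (mujoco-py or the mujoco bindings) and the classic Gym step API; the env id version suffix (e.g. `Hopper-v3`) depends on the installed Gym release.

```python
# Minimal sketch of the Hopper interface; assumes classic Gym API (gym < 0.26)
# and a working MuJoCo installation.
import gym

env = gym.make("Hopper-v3")
print(env.observation_space)  # Box(-inf, inf, (11,), float32)
print(env.action_space)       # Box(-1.0, 1.0, (3,), float32)

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random torques in [-1, 1]
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```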
- Useful Links
- Env Repo: mujoco (by DeepMind), mujoco-py (by OpenAI)
- Blog/Doc: Official Gym Documentation
- Blog/Doc: DI-engine doc (English | 中文)
- Public Agent: Stable Baselines 3, RL Baselines3 Zoo TD3 | PPO | SAC
- Public Agent: DI-engine
- Public Agent: ChainerRL
MPE (PettingZoo version)
Simple Adversary | Simple Speaker Listener | Simple Spread | Simple World Comm |
---|---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Small | Vector | Discrete/Continuous | Fluctuate, Multi | Finite | LD, MA |
- Overview: PettingZoo is a library of diverse multi-agent environments under a single elegant Python API, similar to the OpenAI Gym library. PettingZoo targets Multi-Agent Reinforcement Learning, while Gym targets the single-agent setting. The Multi Particle Environments (MPE) are also integrated into PettingZoo. MPE is a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks.
- Spaces (take `SimpleSpread` for example)
  - Agent number: 3
  - Observation space: `Box(-inf, inf, (18, ), float32)`
  - Action space: `Discrete(5)` (Discrete) / `Box(0.0, 1.0, (5,), float32)` (Continuous)
  - Reward range: `(-inf, inf)`
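A minimal usage sketch with PettingZoo's parallel API is shown below. The module version suffix (`simple_spread_v2` here) and the exact reset/step return signatures depend on the installed PettingZoo release.

```python
# Minimal sketch of Simple Spread via PettingZoo's parallel API.
# The version suffix (simple_spread_v2) and return signatures vary across
# PettingZoo releases; newer releases return (obs, info) from reset and a
# 5-tuple from step.
from pettingzoo.mpe import simple_spread_v2

env = simple_spread_v2.parallel_env(N=3, continuous_actions=False)
observations = env.reset()
for _ in range(25):
    # one random action per agent (Discrete(5) in the discrete setting)
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, dones, infos = env.step(actions)
env.close()
```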
- Useful Links
- Env Repo: PettingZoo
- Blog/Doc: Official Documentation
- Blog/Doc: DI-engine doc (中文)
- Public Agent: DI-engine
- Public Agent: MAPPO
SMAC

3s_vs_5z | 2c_vs_64zg | corridor |
---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Middle | Vector | Discrete | Sparse | Finite | MA |
- Overview: SMAC is an environment for Multi-Agent collaborative Reinforcement Learning (MARL) on Blizzard StarCraft II; the name is short for "StarCraft Multi-Agent Challenge". SMAC uses Blizzard StarCraft II's machine learning API and DeepMind's PySC2 to provide a friendly interface for the interaction between agents and StarCraft II. Compared to PySC2, SMAC focuses on a decentralized micromanagement scheme, where each unit of the game is controlled by a separate RL agent.
- Spaces (take `3s_vs_5z` for example)
  - Agent number: 3
  - Observation space: `Box(-inf, inf, (48, ), float32)` (obs) & `Box(-inf, inf, (68, ), float32)` (state)
  - Action space: `Discrete(11)`
  - Reward range: `(-inf, inf)`
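A minimal usage sketch with the SMAC API is shown below. It assumes StarCraft II and the SMAC map files are installed.

```python
# Minimal sketch of the SMAC interface for 3s_vs_5z; assumes StarCraft II
# and the SMAC maps are installed.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3s_vs_5z")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]          # 3 controllable units

env.reset()
terminated = False
while not terminated:
    obs = env.get_obs()                  # per-agent observations
    state = env.get_state()              # global state (for centralized training)
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)
        # pick a random available action for each agent
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, info = env.step(actions)
env.close()
```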
- Useful Links
Our purpose is to make this repo even better. If you are interested in contributing, please refer to HERE for contribution instructions.
This repository is released under the Apache 2.0 license.