This is a collection of RL environments that are frequently used in academic research. The repository will be continuously updated.
Welcome to follow and star!
Env Name
- Description Table
- Overview
- Spaces
- Observation Space
- Action Space
- Reward Range
- Useful Links
- Env Repo
- Blog/Doc
- Public Agent
- (optional) Special Subenv
- Scale. Time cost to train a reasonable policy, measured with 1 NVIDIA V100 GPU + a 32-core CPU.
Micro | Small | Middle | Large |
---|---|---|---|
< 30 minutes | 1-4 hours | 8-24 hours | > 1 day |
Pendulum, CartPole, Gym hybrid | MPE, Slimevolley, MuJoCo | Procgen, D4rl, Atari, SMAC | MineRL, CARLA, GRF |
- State/Observation.
Vector | Image | Nested |
---|---|---|
A list of numbers. | Often a 3-channel RGB image. | Like a struct in the C language: it contains multiple members, and each member can be a Vector or an Image (see the `gym.spaces` sketch after these tables). |
MPE, MuJoCo | Atari, DMControl | MineRL, CARLA |
- Action.
Discrete | Continuous | Hybrid |
---|---|---|
Integer | Float | Contains both |
Atari, SMAC | MuJoCo, DMControl | Gym hybrid, CARLA |
- Reward.
Many orders of magnitude (Magnitude) | Sparse reward (Sparse) | Multi-reward mixture (Multi) |
---|---|---|
Magnitudes and frequencies of rewards vary wildly between different games or episodes. You can refer to Learning values across many orders of magnitude. | Rewards extrinsic to the agent are extremely sparse, or absent altogether. You can refer to ICM paper and RND paper. | More than one type of reward is measured. You can refer to Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis. |
Centipede of Atari | Minigrid, SMAC | CARLA
- Termination.
Finite | Infinite |
---|---|
An episode will end at some point. | An episode will not end until you terminate it. |
Atari, SMAC | HalfCheetah of MuJoCo |
- Others.
Procedural Content Generation (PCG) | Large Difference among sub-envs (LD) | Multi Agent (MA) |
---|---|---|
Sub-environments are randomly created, encouraging the agent to robustly learn a relevant skill rather than memorize specific trajectories. You can refer to the Procgen paper. | Different sub-envs vary a lot. You can refer to the radar plot in bsuite. | You must control more than one agent at a time. You can refer to An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective. |
Procgen | MuJoCo, MPE, DMControl | MPE, SMAC, GRF |
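As a concrete illustration of the Nested observation and Hybrid action categories above, here is a minimal sketch built with `gym.spaces`. The keys and shapes below are invented for illustration and do not come from any specific environment.

```python
# Illustrative only: a nested (struct-like) observation space and a hybrid action space.
# The concrete keys/shapes are made up for this example.
import numpy as np
from gym import spaces

# Nested observation: a dict whose members are themselves Vector or Image spaces.
observation_space = spaces.Dict({
    "camera": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),        # Image
    "speed": spaces.Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32),     # Vector
})

# Hybrid action: a discrete choice plus continuous parameters (cf. Gym hybrid).
action_space = spaces.Tuple((
    spaces.Discrete(3),                                            # which action type
    spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),  # its parameters
))

print(observation_space.sample())
print(action_space.sample())
```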
Atari

Pong | Qbert | SpaceInvaders | MontezumaRevenge |
---|---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Middle | Image | Discrete | Fluctuate | Finite | LD |
- Overview: Atari 2600 has been the standard environment for testing new Reinforcement Learning algorithms since Deep Q-Networks were introduced by Mnih et al. in 2013. Atari 2600 is a challenging testbed due to its high-dimensional video input (size 210 x 160, frequency 60 Hz) and the discrepancy of tasks between games. OpenAI Gym wraps Atari 2600 with a more standardized interface and provides 59 Atari 2600 games as RL environments.
- Spaces (take `Pong` for example)
  - Observation space: `Box(0, 255, (210, 160, 3), uint8)`
  - Action space: `Discrete(6)`
  - Reward range: `(-inf, inf)`
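A minimal usage sketch is shown below. It assumes the classic Gym step API (`gym` < 0.26) and that `ale-py` with the Atari ROMs is installed; the exact env id may differ across versions.

```python
# Minimal sketch of the Pong interface; assumes classic Gym API (gym < 0.26)
# and that ale-py / Atari ROMs are installed.
import gym

env = gym.make("PongNoFrameskip-v4")
print(env.observation_space)  # Box(0, 255, (210, 160, 3), uint8)
print(env.action_space)       # Discrete(6)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random policy, for demonstration only
    obs, reward, done, info = env.step(action)  # newer gym/gymnasium returns a 5-tuple instead
env.close()
```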
- Useful Links
- Env Repo: The Arcade Learning Environment (ALE)
- Blog/Doc: Official Gym Documentation
- Blog/Doc: DI-engine doc (English | 中文)
- Public Agent: Stable Baselines 3, RL Baselines3 Zoo DQN | PPO | A2C
- Public Agent: DI-engine
- Public Agent: tianshou
- Special Subenv: `MontezumaRevenge`
- Key: Sparse reward
- Useful links:
- Blog/Doc: Prediction-Based Rewards (By OpenAI)
- Public Agent: Random Network Distillation (RND)
- Public Agent: Go-Explore
MuJoCo

Hopper | HalfCheetah | Ant | Walker2D |
---|---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Small | Vector | Continuous | Fluctuate, Multi | Finite | LD |
- Overview: MuJoCo is a physics engine for robotics, biomechanics, graphics, animation, and other domains that require fast and accurate simulation. It is often used as a benchmarking environment for continuous-control Reinforcement Learning algorithms. It is a collection of 20 sub-environments; commonly used ones are `Ant`, `HalfCheetah`, `Hopper`, `Humanoid`, `Walker2D`, etc.
- Spaces (take `Hopper` for example)
  - Observation space: `Box(-inf, inf, (11, ), float32)`
  - Action space: `Box(-1.0, 1.0, (3,), float32)`
  - Reward range: `(-inf, inf)`
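A minimal usage sketch is shown below. It assumes a working MuJoCo installation (mujoco-py or the mujoco bindings) and the classic Gym step API; the env id version suffix (e.g. `Hopper-v3`) depends on the installed Gym release.

```python
# Minimal sketch of the Hopper interface; assumes classic Gym API (gym < 0.26)
# and a working MuJoCo installation.
import gym

env = gym.make("Hopper-v3")
print(env.observation_space)  # Box(-inf, inf, (11,), float32)
print(env.action_space)       # Box(-1.0, 1.0, (3,), float32)

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random torques in [-1, 1]
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```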
- Useful Links
- Env Repo: mujoco (by DeepMind), mujoco-py (by OpenAI)
- Blog/Doc: Official Gym Documentation
- Blog/Doc: DI-engine doc (English | 中文)
- Public Agent: Stable Baselines 3, RL Baselines3 Zoo TD3 | PPO | SAC
- Public Agent: DI-engine
- Public Agent: ChainerRL
MPE (PettingZoo version)
Simple Adversary | Simple Speaker Listener | Simple Spread | Simple World Comm |
---|---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Small | Vector | Discrete/Continuous | Fluctuate, Multi | Finite | LD, MA |
- Overview: PettingZoo is a library of diverse multi-agent environments under a single elegant Python API, similar to the OpenAI Gym library. PettingZoo targets Multi-Agent Reinforcement Learning, while Gym targets the single-agent setting. The Multi Particle Environments (MPE) are also integrated into PettingZoo. MPE is a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks.
- Spaces (take `SimpleSpread` for example)
  - Agent number: 3
  - Observation space: `Box(-inf, inf, (18, ), float32)`
  - Action space: `Discrete(5)` (Discrete) / `Box(0.0, 1.0, (5,), float32)` (Continuous)
  - Reward range: `(-inf, inf)`
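A minimal usage sketch with PettingZoo's parallel API is shown below. The module version suffix (`simple_spread_v2` here) and the exact reset/step return signatures depend on the installed PettingZoo release.

```python
# Minimal sketch of Simple Spread via PettingZoo's parallel API.
# The version suffix (simple_spread_v2) and return signatures vary across
# PettingZoo releases; newer releases return (obs, info) from reset and a
# 5-tuple from step.
from pettingzoo.mpe import simple_spread_v2

env = simple_spread_v2.parallel_env(N=3, continuous_actions=False)
observations = env.reset()
for _ in range(25):
    # one random action per agent (Discrete(5) in the discrete setting)
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, dones, infos = env.step(actions)
env.close()
```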
- Useful Links
- Env Repo: PettingZoo
- Blog/Doc: Official Documentation
- Blog/Doc: DI-engine doc (中文)
- Public Agent: DI-engine
- Public Agent: MAPPO
SMAC

3s_vs_5z | 2c_vs_64zg | corridor |
---|---|---|
Scale | Observation | Action | Reward | Termination | Others |
---|---|---|---|---|---|
Middle | Vector | Discrete | Sparse | Finite | MA |
- Overview: SMAC is an environment for Multi-Agent collaborative Reinforcement Learning (MARL) on Blizzard StarCraft II; the name is short for "StarCraft Multi-Agent Challenge". SMAC uses Blizzard StarCraft II's machine learning API and DeepMind's PySC2 to provide a friendly interface for the interaction between agents and StarCraft II. Compared to PySC2, SMAC focuses on a decentralized micromanagement scheme, where each unit of the game is controlled by a separate RL agent.
- Spaces (take `3s_vs_5z` for example)
  - Agent number: 3
  - Observation space: `Box(-inf, inf, (48, ), float32)` (obs) & `Box(-inf, inf, (68, ), float32)` (state)
  - Action space: `Discrete(11)`
  - Reward range: `(-inf, inf)`
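A minimal usage sketch with the SMAC API is shown below. It assumes StarCraft II and the SMAC map files are installed.

```python
# Minimal sketch of the SMAC interface for 3s_vs_5z; assumes StarCraft II
# and the SMAC maps are installed.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3s_vs_5z")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]          # 3 controllable units

env.reset()
terminated = False
while not terminated:
    obs = env.get_obs()                  # per-agent observations
    state = env.get_state()              # global state (for centralized training)
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)
        # pick a random available action for each agent
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, info = env.step(actions)
env.close()
```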
- Useful Links
Our purpose is to make this repo even better. If you are interested in contributing, please refer to HERE for contribution instructions.
This repository is released under the Apache 2.0 license.