# Awesome RL Env Zoo

This is a collection of RL environments that are frequently used in academic research. The repository will be continuously updated.

Welcome to follow and star!

## Table of Contents

- [Format and Terminology](#format-and-terminology)
- [Envs](#envs)
- [Contributing](#contributing)
- [License](#license)

## Format and Terminology

### Format

#### Env Name

- Description Table
- Overview
- Spaces
  - Observation Space
  - Action Space
  - Reward Range
- Useful Links
  - Env Repo
  - Blog/Doc
  - Public Agent
- (optional) Special Subenv

### Terminology in the description table

1. **Scale**: time cost to train a fair policy, with 1 NVIDIA V100 + a 32-core CPU.

   | Micro | Small | Middle | Large |
   | --- | --- | --- | --- |
   | < 30 minutes | 1-4 hours | 8-24 hours | > 1 day |
   | Pendulum, CartPole, Gym hybrid | MPE, Slimevolley, MuJoCo | Procgen, D4RL, Atari, SMAC | MineRL, CARLA, GRF |
2. **State/Observation**

   | Vector | Image | Nested |
   | --- | --- | --- |
   | A list of numbers. | Often a 3-channel RGB image. | Like a `struct` in C: contains multiple members, each of which can be a Vector or an Image. |
   | MPE, MuJoCo | Atari, DMControl | MineRL, CARLA |
3. **Action**

   | Discrete | Continuous | Hybrid |
   | --- | --- | --- |
   | Integer | Float | Contains both |
   | Atari, SMAC | MuJoCo, DMControl | Gym hybrid, CARLA |
4. **Reward**

   | Many orders of magnitude (Magnitude) | Sparse reward (Sparse) | Multi-reward mixture (Multi) |
   | --- | --- | --- |
   | Magnitudes and frequencies of rewards vary wildly between different games or episodes; see *Learning values across many orders of magnitude*. | Rewards extrinsic to the agent are extremely sparse, or absent altogether; see the ICM and RND papers. | More than one type of reward is measured; see *Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis*. |
   | Centipede (Atari) | MiniGrid, SMAC | CARLA |
5. **Termination**

   | Finite | Infinite |
   | --- | --- |
   | An episode ends at some point. | An episode does not end until you terminate it. |
   | Atari, SMAC | HalfCheetah (MuJoCo) |
6. **Others**

   | Procedural Content Generation (PCG) | Large Difference among sub-envs (LD) | Multi Agent (MA) |
   | --- | --- | --- |
   | Sub-environments are randomly generated, encouraging the agent to robustly learn a skill rather than memorize specific trajectories; see the Procgen paper. | Different sub-envs vary a lot; see the radar plot in bsuite. | You must control more than one agent at a time; see *An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective*. |
   | Procgen | MuJoCo, MPE, DMControl | MPE, SMAC, GRF |
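The "Magnitude" issue above is commonly handled by rescaling rewards with running statistics, which is the core idea behind Pop-Art in *Learning values across many orders of magnitude*. Below is a minimal, simplified numpy sketch of that idea (not the actual Pop-Art algorithm; the class and method names are our own):

```python
import numpy as np

class RunningRewardNormalizer:
    """Track a running mean/std of observed rewards (Welford's algorithm)
    and rescale them to roughly unit magnitude. A simplified stand-in for
    Pop-Art-style adaptive normalization; names here are hypothetical."""

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.eps = eps

    def update(self, reward: float) -> None:
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward: float) -> float:
        std = np.sqrt(self.m2 / max(self.count - 1, 1)) + self.eps
        return (reward - self.mean) / std


norm = RunningRewardNormalizer()
# Rewards spanning several orders of magnitude, as across different Atari games.
for r in [0.1, 0.2, 100.0, 0.1, 500.0, 0.3]:
    norm.update(r)
print(norm.normalize(500.0))  # large raw reward mapped near unit scale
```

This keeps the value targets in a stable range without hand-tuning per-game reward clipping.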

## Envs

### Atari

*Pong / Qbert / SpaceInvaders / MontezumaRevenge*

| Scale | Observation | Action | Reward | Termination | Others |
| --- | --- | --- | --- | --- | --- |
| Middle | Image | Discrete | Magnitude | Finite | LD |
- **Overview**: Atari 2600 has been the standard environment for testing new reinforcement learning algorithms since Deep Q-Networks were introduced by Mnih et al. in 2013. It remains a challenging testbed due to its high-dimensional video input (size 210 x 160, frequency 60 Hz) and the discrepancy of tasks between games. OpenAI Gym wraps Atari 2600 with a more standardized interface and provides 59 Atari 2600 games as RL environments.
- **Spaces** (take Pong for example)
  - Observation space: `Box(0, 255, (210, 160, 3), uint8)`
  - Action space: `Discrete(6)`
  - Reward range: `(-inf, inf)`
- **Useful Links**
- **Special Subenv**: MontezumaRevenge
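The raw 210 x 160 x 3 frames above are usually preprocessed (grayscale, downsampling, frame stacking) before being fed to a network, in the spirit of the DQN pipeline. A minimal numpy sketch, using fake frames instead of a live env (the function names are our own, not a library API):

```python
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Convert one 210x160x3 uint8 Atari frame to 105x80 grayscale in [0, 1]."""
    gray = frame.astype(np.float32).mean(axis=2)  # crude channel-average grayscale
    return gray[::2, ::2] / 255.0                 # 2x spatial downsample + rescale

def stack_frames(frames: list) -> np.ndarray:
    """Stack the last 4 preprocessed frames into a (4, 105, 80) observation."""
    return np.stack([preprocess_frame(f) for f in frames[-4:]])

# Fake frames shaped like Pong's observation space Box(0, 255, (210, 160, 3), uint8).
frames = [np.random.randint(0, 256, (210, 160, 3), dtype=np.uint8) for _ in range(4)]
obs = stack_frames(frames)
print(obs.shape)  # (4, 105, 80)
```

Stacking several frames gives the agent velocity information that a single still frame cannot provide.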

### MuJoCo

*Hopper / HalfCheetah / Ant / Walker2D*

| Scale | Observation | Action | Reward | Termination | Others |
| --- | --- | --- | --- | --- | --- |
| Small | Vector | Continuous | Magnitude, Multi | Finite | LD |
- **Overview**: MuJoCo is a physics engine for robotics, biomechanics, graphics, animation, and other areas that require fast and accurate simulation. It is often used as a benchmark for continuous-control reinforcement learning algorithms. The benchmark is a collection of 20 sub-environments; commonly used ones are Ant, HalfCheetah, Hopper, Humanoid, Walker2D, etc.
- **Spaces** (take Hopper for example)
  - Observation space: `Box(-inf, inf, (11,), float32)`
  - Action space: `Box(-1.0, 1.0, (3,), float32)`
  - Reward range: `(-inf, inf)`
- **Useful Links**
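Continuous-control policies typically produce unbounded outputs that must be mapped into a bounded action box like Hopper's `Box(-1.0, 1.0, (3,), float32)`. A minimal numpy sketch of the common tanh-squashing trick (the function name is our own):

```python
import numpy as np

def squash_to_box(raw_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an unbounded policy output into a Box(low, high) action space via tanh,
    as commonly done for MuJoCo-style continuous control."""
    unit = np.tanh(raw_action)                    # squash into (-1, 1)
    return low + 0.5 * (unit + 1.0) * (high - low)

# Hopper-like action space: Box(-1.0, 1.0, (3,), float32)
low, high = np.full(3, -1.0), np.full(3, 1.0)
raw = np.array([10.0, 0.0, -10.0])                # unbounded network output
action = squash_to_box(raw, low, high)
print(action)  # every component now lies inside [-1, 1]
```

Squashing (rather than hard clipping) keeps the mapping differentiable, which matters for policy-gradient methods such as SAC.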

### MPE

(PettingZoo version)

*Simple Adversary / Simple Speaker Listener / Simple Spread / Simple World Comm*

| Scale | Observation | Action | Reward | Termination | Others |
| --- | --- | --- | --- | --- | --- |
| Small | Vector | Discrete/Continuous | Magnitude, Multi | Finite | LD, MA |
- **Overview**: PettingZoo is a library of multi-agent environments under a single elegant Python API, similar to the OpenAI Gym library: PettingZoo targets Multi-Agent Reinforcement Learning, while Gym targets single-agent settings. The Multi Particle Environments (MPE) are also integrated in PettingZoo. MPE is a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks.
- **Spaces** (take SimpleSpread for example)
  - Agent number: 3
  - Observation space: `Box(-inf, inf, (18,), float32)`
  - Action space: `Discrete(5)` (discrete) / `Box(0.0, 1.0, (5,), float32)` (continuous)
  - Reward range: `(-inf, inf)`
- **Useful Links**
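PettingZoo's AEC API serves agents one at a time through `agent_iter()`, `last()`, and `step()`. The interaction loop below is a sketch of that pattern, run against a tiny hand-rolled stub (not real MPE, so it executes without PettingZoo installed; only the loop mirrors the real API):

```python
import random

class StubAECEnv:
    """Tiny stand-in mimicking PettingZoo's AEC interface
    (agent_iter / last / step); not the real MPE implementation."""

    def __init__(self, n_agents: int = 3, steps_per_agent: int = 2):
        self.agents = [f"agent_{i}" for i in range(n_agents)]
        self._schedule = self.agents * steps_per_agent
        self._idx = 0

    def agent_iter(self):
        while self._idx < len(self._schedule):
            yield self._schedule[self._idx]

    def last(self):
        # (observation, reward, termination, truncation, info), as in PettingZoo
        done = self._idx >= len(self._schedule) - len(self.agents)
        return [0.0] * 18, 0.0, done, False, {}

    def step(self, action):
        self._idx += 1


env = StubAECEnv()
rewards = {a: 0.0 for a in env.agents}
for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    rewards[agent] += reward
    # PettingZoo convention: pass None once the acting agent is done
    action = None if terminated or truncated else random.randrange(5)
    env.step(action)
print(sorted(rewards))  # ['agent_0', 'agent_1', 'agent_2']
```

Note that, unlike Gym's one-step loop, the reward returned by `last()` belongs to the agent currently acting, so bookkeeping is per agent.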

### SMAC

*3s_vs_5z / 2c_vs_64zg / corridor*

| Scale | Observation | Action | Reward | Termination | Others |
| --- | --- | --- | --- | --- | --- |
| Middle | Vector | Discrete | Sparse | Finite | MA |
- **Overview**: SMAC, short for "StarCraft Multi-Agent Challenge", is an environment for collaborative Multi-Agent Reinforcement Learning (MARL) on Blizzard StarCraft II. SMAC uses Blizzard StarCraft II's machine learning API and DeepMind's PySC2 to provide a friendly interface for interaction between agents and StarCraft II. Compared to PySC2, SMAC focuses on decentralized micromanagement, where each unit of the game is controlled by a separate RL agent.
- **Spaces** (take 3s_vs_5z for example)
  - Agent number: 3
  - Observation space: `Box(-inf, inf, (48,), float32)` (obs) & `Box(-inf, inf, (68,), float32)` (state)
  - Action space: `Discrete(11)`
  - Reward range: `(-inf, inf)`
- **Useful Links**
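The separate per-agent obs (48-dim) and global state (68-dim) above are what enable centralized training with decentralized execution: each actor sees only its own observation, while a centralized critic may see the global state. A minimal numpy sketch of that data flow, with shapes taken from 3s_vs_5z and random linear maps standing in for real networks (everything else is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, state_dim, n_actions = 3, 48, 68, 11

# Placeholder "networks": random linear maps standing in for actor and critic.
actor_w = rng.normal(size=(obs_dim, n_actions))
critic_w = rng.normal(size=(state_dim,))

obs = rng.normal(size=(n_agents, obs_dim))   # decentralized: one obs per agent
state = rng.normal(size=(state_dim,))        # centralized: one global state

# Decentralized execution: each agent picks an action from its own obs only.
logits = obs @ actor_w                        # shape (3, 11)
actions = logits.argmax(axis=1)               # one Discrete(11) action per agent

# Centralized training: the critic values the shared global state.
value = float(state @ critic_w)

print(actions.shape)  # (3,)
```

This obs/state split is the interface assumed by CTDE algorithms such as QMIX, which are the standard baselines on SMAC.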

## Contributing

Our purpose is to make this repo even better. If you are interested in contributing, please refer to HERE for contribution instructions.

## License

Awesome RL Env Zoo is released under the Apache 2.0 license.
