
Gym-μRTS (pronounced "gym-micro-RTS")

This repo contains the source code for the gym wrapper of μRTS, a Real-time Strategy game simulator authored by Santiago Ontañón.

demo.gif

Get Started

Prerequisites:

  • Python 3.8+
  • Poetry
  • Java 8.0+
  • FFmpeg (for video recording utilities)
$ git clone --recursive https://github.com/vwxyzjn/gym-microrts.git && \
cd gym-microrts
poetry install
# build microrts
cd gym_microrts/microrts && bash build.sh > build.log && cd .. && cd ..
python new_hello_world.py

To train an agent, run the following:

python experiments/new_ppo_gridnet.py \
    --total-timesteps 100000000 \
    --wandb-project-name gym-microrts \
    --capture-video \
    --seed 1

To run a partially observable example, set the partial_obs argument:

envs = MicroRTSGridModeVecEnv(..., partial_obs=True)
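
As a fuller illustration, the sketch below constructs a partially observable environment and checks the shape of its observations. This is a sketch only: the exact constructor arguments may differ across gym-microrts versions, so the ones shown here (num_selfplay_envs, num_bot_envs, max_steps, ai2s, map_paths, reward_weight) mirror the example scripts and should be adjusted to your installed version.

import numpy as np

from gym_microrts import microrts_ai
from gym_microrts.envs.vec_env import MicroRTSGridModeVecEnv

# Sketch only: argument names mirror the example scripts in this repo and
# may differ across gym-microrts versions.
envs = MicroRTSGridModeVecEnv(
    num_selfplay_envs=0,                             # no self-play pairs
    num_bot_envs=1,                                  # one environment against a scripted bot
    max_steps=2000,
    ai2s=[microrts_ai.coacAI for _ in range(1)],     # one scripted opponent per bot env
    map_paths=["maps/16x16/basesWorkers16x16.xml"],  # 16x16 map bundled with microrts
    reward_weight=np.array([10.0, 1.0, 1.0, 0.2, 1.0, 4.0]),
    partial_obs=True,                                # enable partial observability
)

obs = envs.reset()
# With partial_obs=True each map cell has 29 feature planes
# (see "Environment Specification" below); without it, 27.
print(obs.shape)  # expected: (1, 16, 16, 29)
envs.close()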

Technical Paper

Before diving into the code, we highly recommend reading the preprint of our paper: Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games

Deprecation note

Note that the experiments in the technical paper above were done with gym_microrts==0.3.2. As we move beyond v0.4.x, we are planning to deprecate UAS despite its better performance in the paper. This is because UAS has a more complex implementation, which makes it much harder to incorporate self-play or imitation learning in the future.

Environment Specification

Here is a description of Gym-μRTS's observation and action space:

  • Observation Space. (Box(0, 1, (h, w, 27), int32)) Given a map of size h x w, the observation is a tensor of shape (h, w, n_f), where n_f is the number of binary feature planes. This observation space uses 27 feature planes, as shown in the table below. A feature plane can be thought of as a concatenation of multiple one-hot encoded features. As an example, if there is a worker with hit points equal to 1, carrying no resources, owned by Player 1, and currently not executing any action, then its one-hot encoded features will look like the following (see the encoding sketch after this list):

    [0,1,0,0,0], [1,0,0,0,0], [1,0,0], [0,0,0,0,1,0,0,0], [1,0,0,0,0,0]

    The 27 feature-plane values at that worker's map position will thus be:

    [0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0]

  • Partial Observation Space. (Box(0, 1, (h, w, 29), int32)) Given a map of size h x w, the observation is a tensor of shape (h, w, n_f), where n_f is the number of binary feature planes. The partially observable observation space uses 29 feature planes, as shown in the table below. A feature plane can be thought of as a concatenation of multiple one-hot encoded features. As an example, if there is a worker with hit points equal to 1, carrying no resources, owned by Player 1, currently not executing any action, and not visible to the opponent, then its one-hot encoded features will look like the following:

    [0,1,0,0,0], [1,0,0,0,0], [1,0,0], [0,0,0,0,1,0,0,0], [1,0,0,0,0,0], [1,0]

    The 29 feature-plane values at that worker's map position will thus be:

    [0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0]

  • Action Space. (MultiDiscrete(concat(h * w * [[6 4 4 4 4 7 a_r]]))) Given a map of size h x w and the maximum attack range a_r=7, the action is a (7hw)-dimensional vector of discrete values, as specified in the table below. The first 7 components of the action vector represent the action issued to the unit at x=0,y=0, the next 7 components represent the action issued to the unit at x=0,y=1, and so on. Within each group of 7 components, the first component is the action type, and the remaining components are the parameters that the different action types can take; depending on which action type is selected, the game engine uses the corresponding parameters to execute the action. As an example, if the RL agent issues a move-south action to the worker at x=0,y=1 in a 2x2 map, the action will be encoded as follows (a worked sketch of both encodings follows this list):

    concat([0,0,0,0,0,0,0], [1,2,0,0,0,0,0], [0,0,0,0,0,0,0], [0,0,0,0,0,0,0]) = [0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

[Table: observation feature planes and action components]
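
To make these encodings concrete, here is a small illustrative sketch in plain NumPy (independent of the gym-microrts API) that assembles the 27-value observation for the example worker above and the flattened action vector for the 2x2 move-south example. The variable names are purely illustrative.

import numpy as np

# Observation encoding for the example worker: HP = 1, carrying no resources,
# owned by Player 1, unit type worker, executing no action.
hit_points     = np.array([0, 1, 0, 0, 0])            # HP = 1
resources      = np.array([1, 0, 0, 0, 0])            # carrying 0 resources
owner          = np.array([1, 0, 0])                  # Player 1
unit_type      = np.array([0, 0, 0, 0, 1, 0, 0, 0])   # worker
current_action = np.array([1, 0, 0, 0, 0, 0])         # no current action

cell = np.concatenate([hit_points, resources, owner, unit_type, current_action])
assert cell.shape == (27,)  # matches the 27-value vector listed above

# Action encoding for the 2x2 move-south example: 7 components per cell,
# flattened over the h*w cells. The worker at x=0,y=1 occupies the second
# 7-component slot; per the example above, action type 1 is "move" and
# move parameter 2 is "south".
h, w = 2, 2
action = np.zeros((h * w, 7), dtype=np.int64)
action[1] = [1, 2, 0, 0, 0, 0, 0]    # move south
flat_action = action.reshape(-1)
assert flat_action.shape == (7 * h * w,)  # matches the 28-value vector above
print(flat_action.tolist())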

Known issues

[ ] Rendering does not work correctly on macOS. See jpype-project/jpype#906

Papers written using Gym-μRTS
