This repository contains the three environments introduced in 'Physically Embedded Planning Problems: New Challenges for Reinforcement Learning'.
If you use this package, please cite our accompanying tech report:
@misc{mirza2020physically,
  title={Physically Embedded Planning Problems: New Challenges for Reinforcement Learning},
  author={Mehdi Mirza and Andrew Jaegle and Jonathan J. Hunt and Arthur Guez and Saran Tunyasuvunakool and Alistair Muldal and Théophane Weber and Peter Karkus and Sébastien Racanière and Lars Buesing and Timothy Lillicrap and Nicolas Heess},
  year={2020},
  eprint={2009.05524},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
This repository is divided into the 'mujoban' and 'board_games' folders. Both are built on top of dm_control, which requires MuJoCo. Please follow these instructions to install MuJoCo. The other dependencies can be installed with:
pip3 install -r requirements.txt
The game logic is based on open_spiel. Please install it as instructed here. gnugo is required to play the game of Go against a non-random opponent. On Ubuntu, gnugo can be installed with:
apt install gnugo
The board game scripts expect the gnugo binary to be at /usr/games/gnugo. Users can change this path inside board_games/go_logic.py.
This library has only been tested on Ubuntu.
The code snippets below show examples of instantiating each of the environments.
from dm_control import composer
from dm_control.locomotion import walkers
from physics_planning_games.mujoban.mujoban import Mujoban
from physics_planning_games.mujoban.mujoban_level import MujobanLevel
from physics_planning_games.mujoban.boxoban import boxoban_level_generator
walker = walkers.JumpingBallWithHead(add_ears=True, camera_height=0.25)
maze = MujobanLevel(boxoban_level_generator)
task = Mujoban(walker=walker,
               maze=maze,
               control_timestep=0.1,
               top_camera_height=96,
               top_camera_width=96)
env = composer.Environment(time_limit=1000, task=task)
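The observation and action specs of the resulting environment can be inspected through the standard dm_env interface; the exact observation names depend on the task configuration:

# Print each observation's name, shape and dtype, plus the action spec.
for name, spec in env.observation_spec().items():
  print(name, spec.shape, spec.dtype)
print(env.action_spec())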
A board game environment is loaded by name, for example Go on a 7x7 board:

from physics_planning_games import board_games

environment_name = 'go_7x7'
env = board_games.load(environment_name=environment_name)
The returned environments are of type dm_env.Environment and can be stepped through with random actions, as shown here:
import numpy as np
timestep = env.reset()
action_spec = env.action_spec()
while True:
  # Sample a uniformly random action within the bounds of the action spec.
  action = np.stack([
      np.random.uniform(low=minimum, high=maximum)
      for minimum, maximum in zip(action_spec.minimum, action_spec.maximum)
  ])
  timestep = env.step(action)
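The loop above steps the environment indefinitely. Under the dm_env conventions, a single episode can instead be run until its terminal timestep; a minimal sketch, assuming the same bounded action spec:

# Run one episode with random actions and accumulate the return.
timestep = env.reset()
episode_return = 0.0
while not timestep.last():
  action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                             size=action_spec.shape)
  timestep = env.step(action)
  episode_return += timestep.reward
print('Episode return:', episode_return)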
For visualization of the environments, explore.py loads them using the viewer from dm_control.
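The viewer can also be launched directly on any of these environments; a minimal sketch, assuming the dm_control.viewer API and the go_7x7 environment from above:

# Launch the interactive dm_control viewer on a board game environment.
from dm_control import viewer
from physics_planning_games import board_games

viewer.launch(environment_loader=lambda: board_games.load(environment_name='go_7x7'))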
For more details, please refer to the tech report, dm_control, and dm_env.