
A Reinforcement learning project for the RecVis20 Course, Team WILLOW, ENS ULM.


SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Official codebase for SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning.

This repo was forked from the original SUNRISE repository by pokaxpoka.

My contribution consists of the design and implementation of a maze-like environment for testing SUNRISE.

Abstract:

In the framework of approximate Reinforcement Learning, deep neural networks brought a true improvement to the performance of RL agents in previously human-dominated arenas (e.g. AlphaGo). However, the optimization of these non-linear universal approximators remains a difficult task that induces instability in the learning process. To address this issue, among others, SUNRISE was conceived as a unified framework that can be mounted on any off-policy algorithm to boost its performance. In this project, we first try to reproduce some of the original paper's results on Atari environments, and we then design a challenging maze-like environment to further assess the performance of SUNRISE.

Environment description:

The choice of such an environment results from its particular difficulty for model-free RL algorithms. Indeed, the reward is very sparse and exploration is a key ingredient for succeeding in such an environment, hence the relevance of evaluating the SUNRISE framework in this setting.

[Figure 1: the maze environment]

The properties of the environment are:

  • Composed of an Agent (yellow), a Goal (green), and Traps (blue).
  • The terminal states are the Goal and the Traps.
  • Sparse rewards: moving penalty (-1), out-of-the-maze penalty (-5), falling-into-a-trap penalty (-10), goal reward (+10).
  • Lifetime: the episode terminates if a lower bound on the total reward is reached (-100).
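
For concreteness, here is a minimal gym-style sketch of such a maze environment. It is only an illustration under these assumptions: the class name MazeEnv, its parameters, and the random trap placement are hypothetical and do not necessarily match the implementation in this repository.

```python
import numpy as np
import gym
from gym import spaces


class MazeEnv(gym.Env):
    """Hypothetical grid maze with an agent, a goal, and traps (sketch only)."""

    def __init__(self, size=5, n_traps=3):
        self.size = size
        self.n_traps = n_traps
        self.action_space = spaces.Discrete(4)  # up, down, left, right
        self.observation_space = spaces.Box(0, size - 1, shape=(2,), dtype=np.int64)
        self.reset()

    def reset(self):
        self.agent = np.array([0, 0])
        self.goal = (self.size - 1, self.size - 1)
        # place traps on random cells distinct from the start and the goal
        free = [(i, j) for i in range(self.size) for j in range(self.size)
                if (i, j) not in {(0, 0), self.goal}]
        idx = np.random.choice(len(free), self.n_traps, replace=False)
        self.traps = {free[i] for i in idx}
        self.total_reward = 0
        return self.agent.copy()

    def step(self, action):
        moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        new_pos = self.agent + np.array(moves[action])
        if (new_pos < 0).any() or (new_pos >= self.size).any():
            reward, done = -5, False          # out-of-the-maze penalty
        elif tuple(new_pos) in self.traps:
            self.agent = new_pos
            reward, done = -10, True          # falling into a trap: terminal
        elif tuple(new_pos) == self.goal:
            self.agent = new_pos
            reward, done = +10, True          # reaching the goal: terminal
        else:
            self.agent = new_pos
            reward, done = -1, False          # moving penalty
        self.total_reward += reward
        if self.total_reward <= -100:         # lifetime lower bound on total reward
            done = True
        return self.agent.copy(), reward, done, {}
```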

The experiments are conducted using the default hyper-parameters stated in the paper for discrete control tasks: N = 5, beta = 1, T = 40, and lambda = 10. The reference algorithm for comparison is RainbowDQN.
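
As a rough, simplified sketch of how these hyper-parameters enter SUNRISE (following the original paper: N is the ensemble size, lambda scales the UCB-style exploration bonus, T is the temperature of the weighted Bellman backup, and beta is the Bernoulli masking probability for bootstrapping), the snippet below assumes a hypothetical list of PyTorch Q-networks and is not taken from this codebase:

```python
import torch

N, beta, T, lam = 5, 1.0, 40.0, 10.0  # defaults reported for discrete control


def ucb_action(q_ensemble, state):
    """UCB-style exploration: act greedily w.r.t. mean + lam * std of the
    ensemble's Q-value estimates (optimism in the face of uncertainty)."""
    with torch.no_grad():
        qs = torch.stack([q(state) for q in q_ensemble])  # shape (N, num_actions)
    return int((qs.mean(dim=0) + lam * qs.std(dim=0)).argmax())


def backup_weight(q_target_ensemble, next_state):
    """Weighted Bellman backup: targets with high ensemble disagreement
    (large std) are down-weighted via sigmoid(-std * T) + 0.5."""
    with torch.no_grad():
        qs = torch.stack([q(next_state) for q in q_target_ensemble])
        std = qs.std(dim=0).max()  # scalar uncertainty proxy over actions
    return torch.sigmoid(-std * T) + 0.5
```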

We tried SUNRISE on three different configurations: (SIZE, N_TRAPS) ∈ {(5,3), (7,5), (10,10)}.
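
Using the hypothetical MazeEnv sketch above, the three configurations could be instantiated as follows:

```python
# hypothetical instantiation of the three tested (SIZE, N_TRAPS) configurations
configs = [(5, 3), (7, 5), (10, 10)]
envs = [MazeEnv(size=size, n_traps=n_traps) for size, n_traps in configs]
```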

SUNRISE managed to find its way through the maze in all three cases (with different training times, though); the average reward is shown in Figure 2.

[Figure 2: average reward]

(This is a discrete control task, so the reference algorithm used for comparison is RainbowDQN. RainbowDQN combines several techniques that have been shown to improve DQN performance on common benchmarks, among them Double Q-Learning, dueling networks, distributional DQN, and noisy networks.)
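
As a brief illustration of one of these components, here is a minimal sketch of the Double Q-Learning target, in which the online network selects the greedy next action and the target network evaluates it; the function and variable names are illustrative only and not taken from this codebase.

```python
import torch


def double_q_target(online_q, target_q, next_obs, reward, done, gamma=0.99):
    """Double Q-Learning target: the online network picks the greedy next
    action, the target network evaluates it, reducing overestimation bias."""
    with torch.no_grad():
        next_action = online_q(next_obs).argmax(dim=1, keepdim=True)        # (B, 1)
        next_value = target_q(next_obs).gather(1, next_action).squeeze(1)   # (B,)
        return reward + gamma * (1.0 - done.float()) * next_value
```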

The three configurations required 10k, 10k, and 30k training iterations, respectively, to solve the maze. It is also worth noting that training SUNRISE takes up to 3-4 times as long as RainbowDQN.

On the (SIZE = 7, N_TRAPS = 5) configuration, SUNRISE managed to find the shortest path to the goal state; an illustration is shown in the figure below.

[Figure: shortest path to the goal (left: SUNRISE, right: RainbowDQN)]

Please consult report.pdf for the detailed project report.
