Train a player that starts from a random location in a maze to find the treasure at a fixed location using deep Q-learning
Train the player to choose actions by using a neural network to predict Q-values for each state, so that it finds the treasure within the grid
With the fixed locations of the treasure, walls, and pits provided as inputs, the neural network is trained to estimate the Q-values of all possible states
Learn the optimal policy to find the treasure on the grid with reasonable accuracy from a randomized starting location
There are multiple obstacles on the grid (a 20 X 20 environment) that the player cannot enter, such as walls. However, the player can enter pits, with results that depend on the type of pit.
Pits come in two kinds:
- Shallow pit - the player can escape immediately, with a small negative reward
- Deep pit - the player cannot escape and the hunt for the treasure is over
The following are details about the environment and rewards for various elements:
Element | Initialization | Number | Reward |
---|---|---|---|
Player | Random | 1 | - |
Wall | Fixed | 100 | -0.2 |
Shallow Pit | Fixed | 4 | -0.3 |
Deep Pit | Fixed | 4 | -100 |
Treasure | Fixed | 1 | +100 |
Element | Grid Color |
---|---|
Treasure | Green |
Wall | Navy Blue |
Shallow Pit | Yellow |
Deep Pit | Red |
What are the tools you need to install?
You must have administrator access to install the following:
Tool | Version / Details |
---|---|
TensorFlow | 1.12 |
Python | 3.7.3 or newer |
CUDA | 10.0 |
cuDNN | 7.6.0 |
Python Libraries | OpenCV-Python, numpy, matplotlib |
Text Editor | VS Code or any other |
Actions
- The 4 actions (up, down, left, right) are one-hot encoded as a 4 X 1 vector (see the sketch below)
- The position of the 1 in the vector indicates the direction in which the player moves
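A minimal sketch of the action encoding, assuming the order up, down, left, right (the actual ordering used by the project may differ):

```python
import numpy as np

# Assumed action order: up, down, left, right
ACTIONS = ["up", "down", "left", "right"]

def one_hot(action_index, num_actions=4):
    """Return a 4 X 1 one-hot vector with 1 at the chosen action's position."""
    vec = np.zeros(num_actions, dtype=np.float32)
    vec[action_index] = 1.0
    return vec

# Example: "down" -> [0. 1. 0. 0.]
print(one_hot(ACTIONS.index("down")))
```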
Grid to frames (images)
- The grid is converted to a 200 X 200 frame (image) that shows the player's position along with all other elements
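As an illustration, a frame could be rendered with NumPy by filling one 10 X 10 pixel block per grid cell. The `grid` dictionary, the player color, and the exact BGR values are assumptions, chosen to match the grid-color table above:

```python
import numpy as np

CELL = 10  # 20 X 20 grid -> 200 X 200 frame, 10-pixel cells

# Assumed BGR colors matching the grid-color table
COLORS = {
    "treasure": (0, 255, 0),       # green
    "wall": (128, 0, 0),           # navy blue
    "shallow_pit": (0, 255, 255),  # yellow
    "deep_pit": (0, 0, 255),       # red
    "player": (255, 255, 255),     # white (assumed)
}

def render_frame(grid, player_pos):
    """Draw the grid (dict of (row, col) -> element name) as a 200 X 200 image."""
    frame = np.zeros((200, 200, 3), dtype=np.uint8)
    for (r, c), element in grid.items():
        frame[r * CELL:(r + 1) * CELL, c * CELL:(c + 1) * CELL] = COLORS[element]
    pr, pc = player_pos
    frame[pr * CELL:(pr + 1) * CELL, pc * CELL:(pc + 1) * CELL] = COLORS["player"]
    return frame
```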
Pre-processing
- Each frame is resized to 84 X 84 and converted to grayscale
Normalization
- The image is normalized and fed to the neural network as input (the pre-processing and normalization steps are sketched below)
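A sketch of the pre-processing and normalization steps using OpenCV, assuming BGR input frames and scaling pixel values to [0, 1]:

```python
import cv2
import numpy as np

def preprocess(frame):
    """Convert a 200 X 200 BGR frame to a normalized 84 X 84 grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0
```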
Stacking
- A stack of 4 consecutive frames is fed to the neural network as input at each step to exploit temporal information
- The player selects its action based on the predicted Q-values
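One way to maintain the 4-frame stack is a fixed-length deque; this is a sketch, not necessarily the project's exact implementation:

```python
from collections import deque
import numpy as np

STACK_SIZE = 4

def new_stack(first_frame):
    """Start an episode by repeating the first pre-processed frame 4 times."""
    return deque([first_frame] * STACK_SIZE, maxlen=STACK_SIZE)

def stack_frames(stack, frame):
    """Append the newest frame; the deque drops the oldest one automatically."""
    stack.append(frame)
    # Shape (84, 84, 4): the input fed to the neural network at each step
    return np.stack(stack, axis=-1)
```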
Hyperparameters
- Hyperparameters are set and the deep Q-network (DQN) is initialized
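The hyperparameter values and network architecture below are illustrative assumptions rather than the project's exact settings; the sketch uses `tf.keras`, which ships with TensorFlow 1.12:

```python
import tensorflow as tf

# Illustrative hyperparameters -- the values actually used by the project may differ
STATE_SHAPE = (84, 84, 4)   # stacked, pre-processed frames
NUM_ACTIONS = 4
LEARNING_RATE = 0.0001
GAMMA = 0.95                # discount factor
BATCH_SIZE = 64
MAX_STEPS = 100             # actions per episode before it is cut off

def build_dqn():
    """A small convolutional Q-network mapping a frame stack to 4 Q-values."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                               input_shape=STATE_SHAPE),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(NUM_ACTIONS),  # one Q-value per action
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(LEARNING_RATE), loss="mse")
    return model
```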
Memory
- A replay memory is created and seeded with experiences from random actions so that the player does not favor certain actions during training (a minimal buffer sketch follows)
- Experiences are added to the memory during training
- Batches of 64 experiences are sampled from the memory to train the neural network
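A minimal replay-memory sketch using a bounded deque; the capacity of 100,000 is an assumption:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def add(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size=64):
        """Uniformly sample a training batch of experiences."""
        return random.sample(self.buffer, batch_size)
```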
Training the model
- The model is trained for a specified number of episodes
- An episode reaches a terminal state when the player either finds the treasure or falls into a deep pit
- An episode is complete when the player either reaches a terminal state or performs 100 actions
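A sketch of the training loop, reusing the helpers sketched above. `env_reset()` and `env_step()` are hypothetical environment functions (returning the initial frame, and the next frame, reward, and done flag); for brevity the sketch uses a single network and a fixed exploration rate rather than a target network and epsilon decay:

```python
import random
import numpy as np

EPISODES = 500   # assumed count; the project trains for a specified number of episodes
EPSILON = 0.1    # assumed fixed exploration rate

model = build_dqn()
memory = ReplayMemory()

for episode in range(EPISODES):
    stack = new_stack(preprocess(env_reset()))       # env_reset() is hypothetical
    state = np.stack(stack, axis=-1)

    for step in range(MAX_STEPS):
        # Epsilon-greedy action selection from the predicted Q-values
        if random.random() < EPSILON:
            action = random.randrange(NUM_ACTIONS)
        else:
            action = int(np.argmax(model.predict(state[np.newaxis])[0]))

        next_frame, reward, done = env_step(action)  # env_step() is hypothetical
        next_state = stack_frames(stack, preprocess(next_frame))
        memory.add((state, action, reward, next_state, done))
        state = next_state

        # Train on a batch of 64 once enough experiences are stored
        if len(memory.buffer) >= BATCH_SIZE:
            batch = memory.sample(BATCH_SIZE)
            states = np.array([e[0] for e in batch])
            next_states = np.array([e[3] for e in batch])
            targets = model.predict(states)
            next_qs = model.predict(next_states)
            for i, (_, a, r, _, d) in enumerate(batch):
                targets[i][a] = r if d else r + GAMMA * np.max(next_qs[i])
            model.train_on_batch(states, targets)

        if done:   # treasure found or deep pit entered
            break
```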
Testing the model
- The model is tested multiple times while recording reward and loss statistics to plot graphs
- The frames (images) from start to finish are displayed when the player reaches a terminal state in an episode
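A sketch of how testing and plotting could look with matplotlib. Greedy action selection, the same hypothetical `env_reset()` / `env_step()` helpers, and the episode count of 10 are all assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def test_model(model, episodes=10):
    """Run greedy episodes and record the total reward per episode."""
    rewards = []
    for _ in range(episodes):
        stack = new_stack(preprocess(env_reset()))        # env_reset() is hypothetical
        state = np.stack(stack, axis=-1)
        total, done, steps = 0.0, False, 0
        while not done and steps < MAX_STEPS:
            action = int(np.argmax(model.predict(state[np.newaxis])[0]))
            next_frame, reward, done = env_step(action)   # env_step() is hypothetical
            state = stack_frames(stack, preprocess(next_frame))
            total += reward
            steps += 1
        rewards.append(total)
    return rewards

rewards = test_model(model)
plt.plot(rewards)
plt.xlabel("Test episode")
plt.ylabel("Total reward")
plt.show()
```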
By following the prerequisites and process, you'll be able to deploy our project, train the neural network and find the treasure consistently.
Prakash Dontaraju LinkedIn Twitter Medium
Nikhil G LinkedIn