Skip to content

Latest commit

 

History

History
121 lines (103 loc) · 5.98 KB

README.md

File metadata and controls

121 lines (103 loc) · 5.98 KB

Reinforcement Learning

Project: Training AI to play Snake Game

Table of Contents Screenshot

Introduction

This is a training material for my undergraduate students who are relatively new to AI, but interested to learn reinforcement learning (RL). This material is prepared based on another online tutorial: https://towardsdatascience.com/how-to-teach-an-ai-to-play-games-deep-reinforcement-learning-28f9b920440a

The goal of this project is to develop an AI Bot to learn and play the popular game Snake from scratch. The implementation includes playing by human player, using a rule-based policy, Q-learning, SARSA, and finally Deep Q-Network (DQN) algorithms. For Q-learning, SARSA and DQN, no strategy is explicitly taught, the AI Bot has to try exploring all options to learn what to do to get a good reward.

The snake will get a reward after eating the food. It needs to make a series of right movements to approach and eventually land on the food. Since most movements do not get a reward, only the final movement that moves the snake onto the food will, the snake must learn how all previous movements contribute to the objective of reaching the food. Q-learning is specifically designed to deal with chaining a series of right actions that can eventually arrive to the objective and get a big reward. The idea was introduced in 1989 by Chris Watkins in his PhD thesis "Learning from Delayed Rewards": http://www.cs.rhul.ac.uk/~chrisw/thesis.html

Install

Clone the project by:

git clone https://github.com/cfoh/snake-game.git

This project can run on Python 3.7+ with the pygame library installed, as well as Keras with Tensorflow backend. The requirements.txt is produced based on Python 3.7.9 environment in Windows 7.

pip install pygame
pip install tensorflow

Note that in Windows 7, tensorflow requires Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019. If you cannot run the program and the message is to request you to install Microsoft Visual C++ Redistributable for Visual Studio, you can resolve the problem by installing it. You can download it from: https://support.microsoft.com/en-us/topic/the-latest-supported-visual-c-downloads-2647da03-1eea-4433-9aff-95f26a218cc0

Run

To run the game, executes in the folder:

python main.py

The program is set to human player by default. Type the following to see available options:

python main.py --help

To change to a Bot, modify main.py by uncommenting the corresponding algorithm to run:

    ## AI selector, pick one:
    algo = AI_Player()      # do nothing, let human player control
    #algo = AI_RuleBased()   # rule-based algorithm
    #algo = AI_RLQ()         # Q-learning - training mode
    #algo = AI_RLQ(False)    # Q-learning - testing mode, no exploration
    #algo = AI_SARSA()       # SARSA - training mode
    #algo = AI_SARSA(False)  # SARSA - testing mode, no exploration
    #algo = AI_DQN()         # DQN - training mode
    #algo = AI_DQN(False)    # DQN - testing mode, no exploration

If the program encounters problems importing Adam and to_categorical, read ai_dqn.py for a suggestion to solve the problems.

Trained Data (for Q-Learning)

The trained data (i.e. Q-table) will be stored in the following file. If one already exists, it will be overwritten.

q-table.json

When launching, the program will read the following Q-table. To use the previously trained data, simply rename it to the following filename.

q-table-learned.json

The learned data is stored in JSON format, and it is a readabile text file containing a list of state-action pairs. We can read the file to understand how the AI makes its decision. For example:

q-table-learned.json
    ...
    "[<  v],[-1,+0,+0]": [
        -1.9256617062962396,
        1.6245052987863142,
        4.672521001106912
    ],
    ...

The above shows a record of state-action pair, where at this state, the food is at the south-west direction (as in ["< v"]), and the snake can see an obstacle (-1) on the left but clear (+0) in front and on the right (as in [-1,+0,+0]). The values to move left, front, right are -1.92, 1.62, 4.67 respectively. Based on Q-learning algorithm, the movement with the highest value will be picked, that is, moving right. Indeed, the snake has made an appropriate move.

Trained Data (for DQN)

The trained data (i.e. weights) will be stored in the following file. If one already exists, it will be overwritten.

weights.hdf5

When launching, the program will read the following weight file. To use the previously trained data, simply rename it to the following filename.

weights-learned.hdf5

Can DQN play a perfect game?

This is an interesting question. The following youtube video seems to have the answer. The video explains the design of the system state and demonstrates the AI playing a perfect game with the design:

https://www.youtube.com/watch?v=vhiO4WsHA6c

What more inspiration? If you have a 90-minute break:

AlphaGo - The Movie | Full award-winning documentary
See how AI developed by DeepMind beats Go champion.
https://www.youtube.com/watch?v=WXuK6gekU1Y