Tic-Tac-Toe-Bot

This project aims to create a Tic Tac Toe Bot using Reinforcement Learning.

Installation and Running the Code

Set up the Python virtual environment if this is your first time running the code.

python -m venv .venv

Activate the Python virtual environment. The command below is for Windows; on macOS or Linux, use source .venv/bin/activate instead.

.venv\Scripts\activate

Install the required Python libraries.

pip install -r requirements.txt

Run the code. You can either play against the bot or watch a game simulation between the bot and the Minimax Bot.

python -m src.main

After you are done, deactivate the Python virtual environment.

deactivate

Training the Model

Follow the steps in Installation and Running the Code, but stop before running the code.

Then, run the training code.

python -m src.train

Upon completion, deactivate the Python virtual environment.

deactivate

Methodology

The Tic Tac Toe Bot is built using Q-learning, with its Q-values stored in a dictionary and updated using the Bellman equation:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
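As a rough sketch of how a dictionary-backed Q-value update can look in Python (the function names and hyperparameter values below are illustrative assumptions, not the actual code in src):

from collections import defaultdict

# Q-values keyed by (state, action); unseen pairs default to 0.0.
Q = defaultdict(float)

ALPHA = 0.1   # learning rate (alpha), assumed value
GAMMA = 0.9   # discount factor (gamma), assumed value

def q_update(state, action, reward, next_state, next_actions):
    # max over a' of Q(s', a'); 0 when next_state is terminal (no legal actions).
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])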

The Tic Tac Toe Bot is trained in three stages: playing against itself, playing against a bot implemented with the Minimax algorithm and alpha-beta pruning, and playing against a random bot. A sketch of such a minimax opponent is shown below.
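For reference, a Tic Tac Toe opponent using minimax with alpha-beta pruning can be sketched as follows; this is a minimal, self-contained illustration of the technique, not a copy of this repository's Minimax Bot implementation. The board is assumed to be a list of nine cells containing 'X', 'O', or ' '.

def winner(board):
    # Return 'X' or 'O' if that player has three in a row, else None.
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def alphabeta(board, player, alpha, beta):
    # Score from X's point of view: X win = 1, O win = -1, draw = 0.
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if ' ' not in board:
        return 0
    if player == 'X':                      # maximizing player
        best = -2
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = max(best, alphabeta(board, 'O', alpha, beta))
                board[i] = ' '
                alpha = max(alpha, best)
                if alpha >= beta:          # prune: O will never allow this branch
                    break
        return best
    best = 2                               # minimizing player
    for i in range(9):
        if board[i] == ' ':
            board[i] = 'O'
            best = min(best, alphabeta(board, 'X', alpha, beta))
            board[i] = ' '
            beta = min(beta, best)
            if alpha >= beta:              # prune: X will never allow this branch
                break
    return best

def score_after(board, i, player):
    # Evaluate the position reached if `player` moves at square i.
    board[i] = player
    s = alphabeta(board, 'O' if player == 'X' else 'X', -2, 2)
    board[i] = ' '
    return s

def best_move(board, player):
    # Choose the legal square with the best alpha-beta score for `player`.
    moves = [i for i in range(9) if board[i] == ' ']
    pick = max if player == 'X' else min
    return pick(moves, key=lambda i: score_after(board, i, player))

For example, best_move([' '] * 9, 'X') picks an opening square for X that guarantees at least a draw under optimal play.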

Discussion

The Tic Tac Toe Bot performs reasonably well. When run against the default test cases 100 times, its total failure rate is about 19.34%, meaning it correctly plays winning and blocking moves roughly 80% of the time. One likely reason for the remaining failures is that the bot was not trained to prefer faster wins, which is what one of the test cases checks.

Regarding training, the Tic Tac Toe Bot trains relatively well against itself, showing steady improvement in its draw rate over time. Against a random bot it also performs relatively well, with a steady improvement in its win rate. However, when playing against the Minimax bot, training does not improve as much, perhaps because of the much smaller number of training games (1,000 versus 100,000).

The Self Training, Cross Training, and Random Training learning curves are shown below:

[Figure: Self Training Learning Curve]
[Figure: Cross Training Learning Curve]
[Figure: Random Training Learning Curve]

Acknowledgement

I would like to express my thanks to the following articles:
