Tic-Tac-Toe-Bot

This project aims to create a Tic Tac Toe Bot using Reinforcement Learning.

Installation and Running the Code

Set up the Python virtual environment if this is your first time running the code.

python -m venv .venv

Activate the Python virtual environment. The command below is for Windows; on macOS or Linux, use source .venv/bin/activate instead.

.venv\Scripts\activate

Install the required Python libraries.

pip install -r requirements.txt

Run the code. You can either play against the bot or watch a game simulation between the bot and the Minimax Bot.

python -m src.main

After you are done, deactivate the Python virtual environment.

deactivate

Training the Model

Follow the steps in Installation and Running the Code, but stop before running the code.

Then, run the training code.

python -m src.train

Upon completion, deactivate the Python virtual environment.

deactivate

Methodology

The Tic Tac Toe Bot is built using Q-learning, with its Q-values stored in a dictionary and updated using the Bellman equation:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
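As a rough sketch of how a dictionary-backed Q-value update can look in Python (the function names and hyperparameter values below are illustrative assumptions, not the actual code in src):

from collections import defaultdict

# Q-values keyed by (state, action); unseen pairs default to 0.0.
Q = defaultdict(float)

ALPHA = 0.1   # learning rate (alpha), assumed value
GAMMA = 0.9   # discount factor (gamma), assumed value

def q_update(state, action, reward, next_state, next_actions):
    # max over a' of Q(s', a'); 0 when next_state is terminal (no legal actions).
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])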

The Tic Tac Toe Bot is trained in three stages: playing against itself, playing against a bot implemented with the Minimax algorithm and alpha-beta pruning, and playing against a random bot. A sketch of such a minimax opponent is shown below.
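For reference, a Tic Tac Toe opponent using minimax with alpha-beta pruning can be sketched as follows; this is a minimal, self-contained illustration of the technique, not a copy of this repository's Minimax Bot implementation. The board is assumed to be a list of nine cells containing 'X', 'O', or ' '.

def winner(board):
    # Return 'X' or 'O' if that player has three in a row, else None.
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def alphabeta(board, player, alpha, beta):
    # Score from X's point of view: X win = 1, O win = -1, draw = 0.
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if ' ' not in board:
        return 0
    if player == 'X':                      # maximizing player
        best = -2
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = max(best, alphabeta(board, 'O', alpha, beta))
                board[i] = ' '
                alpha = max(alpha, best)
                if alpha >= beta:          # prune: O will never allow this branch
                    break
        return best
    best = 2                               # minimizing player
    for i in range(9):
        if board[i] == ' ':
            board[i] = 'O'
            best = min(best, alphabeta(board, 'X', alpha, beta))
            board[i] = ' '
            beta = min(beta, best)
            if alpha >= beta:              # prune: X will never allow this branch
                break
    return best

def score_after(board, i, player):
    # Evaluate the position reached if `player` moves at square i.
    board[i] = player
    s = alphabeta(board, 'O' if player == 'X' else 'X', -2, 2)
    board[i] = ' '
    return s

def best_move(board, player):
    # Choose the legal square with the best alpha-beta score for `player`.
    moves = [i for i in range(9) if board[i] == ' ']
    pick = max if player == 'X' else min
    return pick(moves, key=lambda i: score_after(board, i, player))

For example, best_move([' '] * 9, 'X') picks an opening square for X that guarantees at least a draw under optimal play.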

Discussion

The Tic Tac Toe Bot performs reasonably well. When run against the default test cases 100 times, its total failure rate is about 19.34%, meaning it correctly plays winning and blocking moves roughly 80% of the time. One likely reason for the remaining failures is that the bot was not trained to prefer faster wins, which is what one of the test cases checks.

Regarding training, the Tic Tac Toe Bot trains relatively well against itself, showing steady improvement in its draw rate over time. Against a random bot it also performs relatively well, with a steady improvement in its win rate. However, when playing against the Minimax bot, training does not improve as much, perhaps because of the much smaller number of training games (1,000 versus 100,000).

The Self Training, Cross Training, and Random Training learning curves are shown below:

[Figure: Self Training Learning Curve]
[Figure: Cross Training Learning Curve]
[Figure: Random Training Learning Curve]

Acknowledgement

I would like to express my thanks to the following articles:
