This project aims to create a Tic Tac Toe Bot using Reinforcement Learning.
Set up the Python virtual environment if this is your first time running the code.
python -m venv .venv
Activate the Python virtual environment. The command below is for Windows; on macOS/Linux, run source .venv/bin/activate instead.
.venv\Scripts\activate
Install the required Python libraries.
pip install -r requirements.txt
Run the code. You can either play against the bot or watch a simulated game between the bot and the Minimax bot.
python -m src.main
After you are done, deactivate the Python virtual environment.
deactivate
Follow the steps in Installation and Running the Code, but stop before running the code.
Then, run the training code.
python -m src.train
Upon completion, deactivate the Python virtual environment.
deactivate
The Tic Tac Toe Bot is built using Q-learning, with its state-action values stored in a dictionary and updated via the Bellman equation:
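Q(s, a) ← Q(s, a) + α * (r + γ * max_a' Q(s', a') − Q(s, a))

where α is the learning rate, γ is the discount factor, r is the reward, and s' is the state reached after taking action a in state s.

As a minimal sketch of this idea (the function names, state encoding, and hyperparameter values below are illustrative assumptions, not necessarily what src.train uses):

    import random
    from collections import defaultdict

    ALPHA = 0.1  # learning rate (assumed value)
    GAMMA = 0.9  # discount factor (assumed value)

    # Q-table as a dictionary mapping (state, action) pairs to values;
    # here a state could simply be the board serialized as a string.
    Q = defaultdict(float)

    def q_update(state, action, reward, next_state, next_actions):
        # Bellman update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    def choose_action(state, actions, epsilon=0.1):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])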
The Tic Tac Toe Bot is trained in three stages: playing against itself, playing against a bot implemented using the Minimax algorithm with alpha-beta pruning, and playing against a random bot.
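For reference, minimax with alpha-beta pruning for Tic Tac Toe can be sketched as follows; the board representation and function names are illustrative assumptions, and the project's Minimax bot may be structured differently:

    # Board is a list of 9 cells: 'X', 'O', or None.
    LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def minimax(board, player, alpha, beta):
        # Returns the best achievable score from X's perspective:
        # +1 for an X win, -1 for an O win, 0 for a draw.
        w = winner(board)
        if w is not None:
            return 1 if w == 'X' else -1
        if all(cell is not None for cell in board):
            return 0  # draw
        best = -2 if player == 'X' else 2
        for i in range(9):
            if board[i] is None:
                board[i] = player
                score = minimax(board, 'O' if player == 'X' else 'X', alpha, beta)
                board[i] = None
                if player == 'X':
                    best = max(best, score)
                    alpha = max(alpha, best)
                else:
                    best = min(best, score)
                    beta = min(beta, best)
                if alpha >= beta:
                    break  # prune: the opponent will never allow this branch
        return best

    def best_move(board, player):
        # Evaluate each legal move and pick the best score for this player.
        def score(i):
            board[i] = player
            s = minimax(board, 'O' if player == 'X' else 'X', -2, 2)
            board[i] = None
            return s
        moves = [i for i in range(9) if board[i] is None]
        return max(moves, key=score) if player == 'X' else min(moves, key=score)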
The Tic Tac Toe Bot performs decently. When run against the default test cases 100 times, its total failure rate is about 19.34%, meaning it correctly plays winning moves and blocks roughly 80% of the time. One reason for this failure rate may be that the bot was not trained to prefer faster wins, which is what one of the test cases checks.
Regarding training, the bot learns relatively well against itself, showing steady improvement in its draw rate over time. Against a random bot it also performs well, with a steadily improving win rate. When playing against the Minimax bot, however, the agent does not seem to improve as much, perhaps because of the smaller number of training games (1,000 versus 100,000).
The Self Training, Cross Training, and Random Training learning curves are shown below, in that order:
I would like to express my thanks to the following articles:


