Here's a link to a shared Google Colab copy of the notebook for quicker viewing.
In this project, I created the qtrain() function and was provided with the Python files for the maze representation and state storage.
The qtrain() function implements a deep Q-learning algorithm that teaches the agent to navigate the maze. At each step the agent follows an epsilon-greedy policy: 10% of the time it explores by taking a random action, and the other 90% of the time it exploits what it has learned. When exploiting, it picks the action with the highest quality (Q) value for its current state, as estimated from the states and rewards it has already experienced. These Q values are driven by a reward system that penalizes the agent for hitting walls or revisiting the same cell too many times. The algorithm also tracks the win rate and stops training once the agent reaches a 100% win rate.
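The decision rule, learning target, and stopping condition described above can be sketched in plain Python. This is not the project's qtrain() code. The function names, the discount factor GAMMA, and the "last `window` games all won" stopping check are assumptions made for illustration. In the notebook itself, the Q values passed in would come from the network's prediction for the current state.

```python
import random
from typing import Optional, Sequence

EPSILON = 0.1  # explore 10% of the time, exploit the other 90% (as described above)
GAMMA = 0.95   # hypothetical discount factor; not taken from the project


def choose_action(q_values: Sequence[float], valid_actions: Sequence[int],
                  epsilon: float = EPSILON,
                  rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy choice among the actions allowed in the current cell."""
    # Fall back to the module-level generator, which exposes the same methods.
    source = rng if rng is not None else random
    if source.random() < epsilon:
        # Explore: any valid action, chosen uniformly at random.
        return source.choice(list(valid_actions))
    # Exploit: the valid action with the highest estimated Q value.
    return max(valid_actions, key=lambda a: q_values[a])


def q_target(reward: float, next_q_values: Sequence[float], game_over: bool,
             gamma: float = GAMMA) -> float:
    """Standard Q-learning (Bellman) target for the action just taken."""
    if game_over:
        # No future reward is possible after a win or a loss.
        return reward
    return reward + gamma * max(next_q_values)


def win_rate(history: Sequence[int]) -> float:
    """Fraction of episodes won, where 1 = win and 0 = loss."""
    return sum(history) / len(history) if history else 0.0


def should_stop(history: Sequence[int], window: int) -> bool:
    """Hypothetical rule: stop once the last `window` episodes were all wins."""
    recent = history[-window:]
    return len(recent) == window and win_rate(recent) == 1.0


if __name__ == "__main__":
    q = [0.1, 0.7, 0.3, 0.2]  # made-up Q values for four moves
    print(choose_action(q, [0, 1, 2, 3], epsilon=0.0))  # pure exploitation -> 1
    print(should_stop([0, 1, 1, 1, 1], window=4))       # True
```

In a typical deep Q-learning loop, the value returned by `q_target` replaces the network's output for the chosen action before the network is fitted on a batch of stored episodes, so repeated training pulls each Q estimate toward the observed reward plus the discounted best next value.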