This project concerns the development of an intelligent agent for Super Mario Bros., the famous game produced by Nintendo. In more detail, the goal was to design, implement, and train an agent with the Q-learning reinforcement learning algorithm, and then to compare the results with the SARSA algorithm. To improve performance, our case study also covers other learning variants: Double Q-learning, Deep Q-Network (DQN), and Double Deep Q-Network (DDQN). For more information, read the report we wrote.
The parameters and plots of the relevant QL models are located under `./code/Reinforcement_Learning/models`, while the parameters and plots of the SARSA models are located under `./code/Reinforcement_Learning/sarsa/models`.
Module | Version |
---|---|
gym | 0.25.2 |
gym-super-mario-bros | 7.4.0 |
nes-py | 8.2.1 |
pyglet | 1.5.21 |
torch | 2.1.1 |
pygame | 2.5.2 |
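Assuming a standard Python setup with pip, the pinned versions above can be installed in one step:

```
pip install gym==0.25.2 gym-super-mario-bros==7.4.0 nes-py==8.2.1 pyglet==1.5.21 torch==2.1.1 pygame==2.5.2
```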
We used the gym-super-mario-bros environment; the setup code can be found in `./code/Reinforcement_Learning/utils/enviroment.py`. In `./code/Reinforcement_Learning/utils/setup_env.py` we assign custom values to the rewards, so as to push the agent to collect as many power-ups as possible. The QL agent logic can be found in `./code/Reinforcement_Learning/utils/agents`, while the SARSA models and agents can be found in `./code/Reinforcement_Learning/sarsa`.
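As a quick orientation, here is a minimal sketch of how the environment is typically instantiated with these libraries. It is illustrative only, not the exact contents of `enviroment.py`, and the random-action loop is just a placeholder policy:

```python
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

# Create the NES environment and restrict the 256 raw button combinations
# to a small discrete action set (right, right + jump, ...).
env = gym_super_mario_bros.make("SuperMarioBros-v0")
env = JoypadSpace(env, SIMPLE_MOVEMENT)

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()            # random policy, for illustration
    state, reward, done, info = env.step(action)  # gym 0.25 returns a 4-tuple here
    env.render()
env.close()
```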
The custom rewards are:
- time: -0.1, per second that passes
- death: -100, Mario dies
- extra_life: +100, Mario gets an extra life
- mushroom: +20, Mario eats a mushroom and becomes big
- flower: +25, Mario eats a flower
- mushroom_hit: -10, Mario gets hit while big
- flower_hit: -15, Mario gets hit while Fire Mario
- coin: +15, Mario collects a coin
- score: +15, Mario hits an enemy
- victory: +1000, Mario wins the level
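Below is a minimal sketch of how such reward shaping can be wired in as a `gym.Wrapper`. The reward values mirror the list above; the event-detection logic, which diffs the `info` dict that gym-super-mario-bros returns at each step (keys such as `coins`, `score`, `life`, `status`, `flag_get`), is our illustrative assumption and not the project's exact `setup_env.py` code:

```python
import gym

# Reward values mirror the list above; the time penalty is applied
# per step here for simplicity.
REWARDS = {
    "time": -0.1, "death": -100., "extra_life": 100., "mushroom": 20.,
    "flower": 25., "mushroom_hit": -10., "flower_hit": -15.,
    "coin": 15., "score": 15., "victory": 1000.,
}

class CustomRewardWrapper(gym.Wrapper):
    """Replaces the environment reward with the shaped one (hypothetical name)."""

    def reset(self, **kwargs):
        self._prev = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # discard the default reward
        prev = self._prev or info
        reward = REWARDS["time"]
        if info["coins"] > prev["coins"]:
            reward += REWARDS["coin"]
        if info["score"] > prev["score"]:
            reward += REWARDS["score"]
        if info["life"] > prev["life"]:
            reward += REWARDS["extra_life"]
        elif info["life"] < prev["life"]:
            reward += REWARDS["death"]
        # `status` is one of 'small', 'tall', 'fireball'
        if prev["status"] == "small" and info["status"] == "tall":
            reward += REWARDS["mushroom"]
        elif info["status"] == "fireball" and prev["status"] != "fireball":
            reward += REWARDS["flower"]
        elif prev["status"] == "tall" and info["status"] == "small":
            reward += REWARDS["mushroom_hit"]
        elif prev["status"] == "fireball" and info["status"] != "fireball":
            reward += REWARDS["flower_hit"]
        if info.get("flag_get"):
            reward += REWARDS["victory"]
        self._prev = info
        return obs, reward, done, info
```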
We used the QL, Double QL, DQN, and DDQN agents together with their respective SARSA agents, all with an epsilon-greedy policy. Each model was trained for 1,000 steps and took about 3.5 hours to finish, except for DDQN and DDN Sarsa, which were trained for 10,000 steps and took about 13.4 hours.
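For reference, the tabular update rules that distinguish the two agent families look as follows. This is a minimal sketch with hypothetical hyperparameter values, not the project's training code:

```python
import numpy as np

# Hypothetical hyperparameters, not the values used in the project.
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def epsilon_greedy(Q, state, n_actions):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)   # explore
    return int(np.argmax(Q[state]))           # exploit

def q_learning_update(Q, s, a, r, s_next):
    # Off-policy: bootstrap on the best action available in s_next.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    # On-policy: bootstrap on the action actually chosen in s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```

The two updates differ only in the bootstrap target: Q-learning takes the max over next-state actions regardless of what the policy will do, while SARSA uses the epsilon-greedy action that is actually executed next.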
Here are the results of all the models; specifically, we compare the QL and SARSA algorithms.
Agent | Training steps | Episode score | Completed level? |
---|---|---|---|
DDN Sarsa | 10K | 1723 | False |
DDN Sarsa | 10K | 4100 | True |
DDQN | 10K | 4320 | True |
So, to obtain more results, we could implement the PPO algorithm and compare it with both the QL and SARSA approaches, in order to figure out which algorithm works best for the Super Mario Bros game.
Name | Description |
---|---|
Alberto Montefusco | Developer - Alberto-00 <br> Email - a.montefusco28@studenti.unisa.it <br> LinkedIn - Alberto Montefusco <br> My WebSite - alberto-00.github.io |
Alessandro Aquino | Developer - AlessandroUnisa <br> Email - a.aquino33@studenti.unisa.it <br> LinkedIn - Alessandro Aquino |
Mattia d'Argenio | Developer - mattiadarg <br> Email - m.dargenio5@studenti.unisa.it <br> LinkedIn - Mattia d'Argenio |