This project is a custom implementation of the classic Hill Climb Racing game using Python, Pygame, and the Box2D physics engine. The primary goal is to train and compare different reinforcement learning agents to master the challenge of navigating an infinitely generated, rugged terrain.
- Custom Environment: A fully custom Hill Climb environment built from scratch using gymnasium and the Box2D physics engine.
- Multiple RL Algorithms: Implementation and comparison of three distinct reinforcement learning algorithms:
  - PPO (Proximal Policy Optimization)
  - DQN (Deep Q-Network)
  - Expected SARSA
- Function Approximation: Support for different models, including Neural Networks (nn) and Polynomial (poly) function approximators.
- Train & Visualize: A command-line interface to easily train new agents and visualize the performance of saved models.
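To make the polynomial function approximator more concrete, here is a minimal sketch of how a state vector could be expanded into polynomial features and combined linearly into an action value. The function names `polynomial_features` and `q_value` are illustrative assumptions, not the project's actual API, and the real implementation may build its features differently.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(state, degree):
    """Return all monomials of the state variables up to `degree`, bias first."""
    features = [1.0]  # degree-0 term (bias)
    for d in range(1, degree + 1):
        # Each multiset of d indices corresponds to one monomial of total degree d.
        for idx in combinations_with_replacement(range(len(state)), d):
            features.append(prod(state[i] for i in idx))
    return features


def q_value(weights, state, degree):
    """Linear-in-features action value: dot product of weights and features."""
    phi = polynomial_features(state, degree)
    return sum(w * f for w, f in zip(weights, phi))


# Two state variables (x, y), degree 2 -> features [1, x, y, x^2, x*y, y^2]
print(polynomial_features([2.0, 3.0], degree=2))
```

With one weight vector per action, the agent would pick the action whose `q_value` is largest. The number of features grows quickly with the degree, which is one reason to keep the degree small.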
This project explores both value-based and policy-based reinforcement learning methods:
- PPO (Proximal Policy Optimization): An advanced actor-critic method known for its stability and sample efficiency. It uses a clipped objective function to constrain policy updates.
- DQN (Deep Q-Network): A classic value-based algorithm that uses an experience replay buffer and a target network to stabilize the learning of the Q-value function.
- Expected SARSA: An on-policy temporal-difference algorithm that improves upon SARSA by bootstrapping from the expected Q-value over all possible next actions, weighted by the policy's action probabilities, instead of the Q-value of a single sampled next action. This reduces the variance of the updates.
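The core quantity each of the three methods computes can be sketched for a single transition as follows. This is a simplified, framework-free illustration written for this README, not the code used in the project. The function names and the default `clip_eps=0.2` are assumptions.

```python
def dqn_target(reward, next_q_target, gamma, done):
    """DQN bootstrap target: max over the target network's next-state Q-values."""
    return reward + (0.0 if done else gamma * max(next_q_target))


def expected_sarsa_target(reward, next_q, next_probs, gamma, done):
    """Expected SARSA target: policy-weighted average of next-state Q-values."""
    expected_q = sum(p * q for p, q in zip(next_probs, next_q))
    return reward + (0.0 if done else gamma * expected_q)


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample (a quantity to be maximized).

    `ratio` is pi_new(a|s) / pi_old(a|s) for the action that was taken.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Same transition, two value-based targets: DQN takes the max over next
# actions, Expected SARSA averages them under the policy's probabilities.
print(dqn_target(1.0, [0.5, 2.0], gamma=0.9, done=False))
print(expected_sarsa_target(1.0, [0.5, 2.0], [0.5, 0.5], gamma=0.9, done=False))
# A ratio of 1.5 with a positive advantage is clipped to 1.2 before weighting.
print(ppo_clipped_objective(1.5, 2.0))
```

The `min` in the PPO objective removes any incentive to move the probability ratio outside the interval `[1 - clip_eps, 1 + clip_eps]`, which is what constrains the size of each policy update.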
The project is controlled via the src/main.py script. You can either train a new agent or visualize a pre-trained one.
| Argument | Shorthand | Description | Default Value |
|---|---|---|---|
| action | | Action to perform (train or visualize). | Required |
| algorithm | | Algorithm to use for train (ppo, dqn, sarsa). | Required for train |
| --path | | Path to the model .zip file for visualize. | Required for visualize |
| --model | | Model type for the agent (nn, linear, poly). | nn |
| --degree | | Degree for the polynomial model. | 3 |
| --buffer_size | | Buffer size for the PPO algorithm. | 2048 |
| --episodes | -e | Total episodes for training. | 100 |
| --seed | -s | Random seed for the run. | 0 |
| --verbose | -v | Verbosity level (0 or 1). | 0 |
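For readers who want to see how such an interface fits together, the table above could be expressed with `argparse` roughly as follows. This is a reconstruction from the table only, not the parser in src/main.py. The helper names `build_parser` and `parse_args` and the exact validation messages are assumptions.

```python
import argparse

ALGORITHMS = ("ppo", "dqn", "sarsa")


def build_parser():
    parser = argparse.ArgumentParser(description="Train or visualize Hill Climb agents.")
    parser.add_argument("action", choices=["train", "visualize"])
    # Optional positional: only needed when the action is `train`.
    parser.add_argument("algorithm", nargs="?", default=None)
    parser.add_argument("--path", help="model .zip file (visualize only)")
    parser.add_argument("--model", choices=["nn", "linear", "poly"], default="nn")
    parser.add_argument("--degree", type=int, default=3)
    parser.add_argument("--buffer_size", type=int, default=2048)
    parser.add_argument("-e", "--episodes", type=int, default=100)
    parser.add_argument("-s", "--seed", type=int, default=0)
    parser.add_argument("-v", "--verbose", type=int, choices=[0, 1], default=0)
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce the per-action requirements from the table."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.action == "train" and args.algorithm not in ALGORITHMS:
        parser.error("train requires an algorithm: ppo, dqn or sarsa")
    if args.action == "visualize" and args.path is None:
        parser.error("visualize requires --path")
    return args


args = parse_args(["train", "ppo", "--buffer_size", "4096", "--seed", "42"])
print(args.algorithm, args.model, args.buffer_size, args.seed)
```

Passing an explicit list to `parse_args` makes the parser easy to exercise without touching `sys.argv`.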
Use the train action followed by the algorithm name.
Train a PPO agent (recommended):
python src/main.py train ppo --buffer_size 4096 --seed 42
Train a DQN agent:
python src/main.py train dqn --model nn --seed 123
Train an Expected SARSA agent with a polynomial model:
python src/main.py train sarsa --model poly --degree 2 --seed 10
Use the visualize action to see your trained agents in action. Pass the saved model with --path; trained models are stored in the /models directory.
Visualize the PPO agent:
python src/main.py visualize --path models/ppo_nn_2048_42.zip
Visualize the DQN agent:
python src/main.py visualize --path models/dqn_nn_123.zip
Visualize the Expected SARSA agent with a polynomial model:
python src/main.py visualize --path models/sarsa_poly_2_10.zip
Use the plot_results.py script to save plots of the rewards and mean loss per episode and, for PPO, the entropy.
Plot rewards and losses of all the algorithms compared:
python .\src\plot_results.py
Select which algorithm to plot with different function approximation methods:
python .\src\plot_results.py --alg dqn
- Models: Trained agent models are saved as .zip files in the /models directory.
- Logs: Training data, such as rewards and episode lengths, are saved as .csv files in the /logs directory, allowing for performance analysis and plotting.
The repo comes with a handy script to replicate all the experiments that were run on the Orfeo cluster.
The script submits all experiments as a series of SLURM job arrays with different models and seeds. It collects all relevant data for plotting in the logs/ directory and saves the trained models for the different seeds in the models/ directory. All the hyperparameters and training modes are specified in the rl_params.csv file and parsed in the run.sh file. This setup is easy to extend to other hyperparameters for further tuning: after adding them as CLI arguments to src/main.py, one just needs to add the corresponding column to rl_grid.csv and a parse line plus a variable in run.sh.
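The idea of driving experiments from a CSV grid can be sketched in Python as follows. The actual pipeline parses the file in run.sh, so this is only an illustration of the mapping from grid rows to training commands. The column names and grid contents below are hypothetical and may not match the real file.

```python
import csv
import io

# Hypothetical grid contents; the real column names and values may differ.
GRID_CSV = """algorithm,model,degree,buffer_size
ppo,nn,,4096
dqn,nn,,
sarsa,poly,2,
"""


def build_commands(grid_text, seeds):
    """Turn each (grid row, seed) pair into a `src/main.py train` argument list."""
    commands = []
    for row in csv.DictReader(io.StringIO(grid_text)):
        for seed in seeds:
            cmd = ["python", "src/main.py", "train", row["algorithm"]]
            # Every remaining non-empty column becomes a `--flag value` pair.
            for key, value in row.items():
                if key != "algorithm" and value:
                    cmd += [f"--{key}", value]
            cmd += ["--seed", str(seed)]
            commands.append(cmd)
    return commands


for cmd in build_commands(GRID_CSV, seeds=[0, 1]):
    print(" ".join(cmd))
```

Because every non-empty column is forwarded as a flag of the same name, adding a hyperparameter in this sketch only requires a new column, which mirrors the extension workflow described above.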
To run the script, just execute the following command:
chmod +x ./submit.sh #in case the file is not executable yet
./submit.sh
If your job submission limit (AssocMaxSubmitJobLimit) on Orfeo or another cluster is too low, the script keeps looping until it has submitted all the jobs in the queue, retrying every 3 minutes.
To keep this process running regardless of the state of your own machine, it is recommended to run it inside tmux.
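The retry behaviour can be pictured with the following Python sketch. The real logic lives in submit.sh, so this is only an illustration of the loop. `try_submit` is a hypothetical callable (in practice it might wrap an `sbatch` call) that returns True on success and False when the scheduler rejects the job.

```python
import time


def submit_all(jobs, try_submit, wait_seconds=180, sleep=time.sleep):
    """Submit every job in order, waiting and retrying whenever one is refused.

    Returns the number of failed attempts that had to be retried.
    """
    pending = list(jobs)
    retries = 0
    while pending:
        if try_submit(pending[0]):
            pending.pop(0)  # accepted: move on to the next job
        else:
            retries += 1
            sleep(wait_seconds)  # back off before retrying the same job
    return retries


# Demo with a fake scheduler that rejects the first two attempts.
# The sleep function is replaced by a no-op so the demo does not actually wait.
attempts = []


def fake_submit(job):
    attempts.append(job)
    return len(attempts) > 2


print(submit_all(["job-a", "job-b"], fake_submit, sleep=lambda seconds: None))
```

Injecting `sleep` as a parameter keeps the 3-minute default for real use while letting the loop be tested instantly.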
tmux
chmod +x ./submit.sh #in case the file is not executable yet
./submit.sh
Then press Ctrl-B followed by D to detach the session.
