This repository contains a Deep Reinforcement Learning project implemented in PyTorch. It starts with a basic Deep Q-Network (DQN) applied to a MiniGrid environment and then introduces several improvements: prioritized experience replay, soft target-network updates, the Huber loss, and Double DQN.
Before running the code, install the required dependencies.
Install all required Python packages using:
```bash
pip install -r requirements.txt
```
For video evaluation, you need the `ffmpeg` executable. On macOS, open a terminal and run:

```bash
brew install ffmpeg
```

This command installs `ffmpeg`, which is used by ImageIO to record evaluation videos.

On Windows, download a static build of ffmpeg and add its `bin` directory to your system PATH. On Linux, install ffmpeg via your package manager.
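For reference, here is a minimal sketch of writing frames to an MP4 with ImageIO (assuming the `imageio` package with ffmpeg support is installed; the repository's evaluation script may record videos differently):

```python
import imageio
import numpy as np

# Hypothetical example: write a list of RGB frames (H x W x 3, uint8) to an MP4 file.
frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(30)]

with imageio.get_writer("evaluation.mp4", fps=10) as writer:
    for frame in frames:
        writer.append_data(frame)
```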
- The basic DQN network is implemented in `models/dqn_base.py`.
- The DQN agent (`DQNAgentBase`) is implemented in `agents/dqn_agent_base.py`; it uses an ε-greedy exploration strategy and stores transitions in a replay buffer.
- The training script in `experiments/dqn/train.py` sets up the MiniGrid environment (using Gymnasium and MiniGrid wrappers), selects a preprocessing pipeline, and trains the DQN.
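For orientation, here is a minimal sketch of the ε-greedy action selection described above; the function and argument names are illustrative and do not necessarily match the attributes of `DQNAgentBase`:

```python
import random

import torch


def select_action(policy_net, state, epsilon, num_actions):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        q_values = policy_net(state.unsqueeze(0))  # add a batch dimension
    return int(q_values.argmax(dim=1).item())
```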
- **Experience Replay with Prioritization**
  - **Motivation:** Prioritized experience replay improves sample efficiency by replaying more informative transitions more often. For details, see the paper [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952).
  - **Implementation:** A new class `PrioritizedReplayBuffer` was added in `utils/prioritized_replay_buffer.py` that inherits from the uniform `ReplayBuffer`.
  - **Changes in Agent:** In `agents/dqn_agent_base.py`, the `update()` method is modified to (see the sketch after this list):
    - Sample indices and importance-sampling weights when using a prioritized replay buffer.
    - Compute a weighted loss using these weights.
    - Update the priorities based on the temporal-difference (TD) error.
  - **Configuration:** In `configs/config.yaml`, you can set:

    ```yaml
    replay_buffer: "prioritized"  # or "uniform"
    priority_alpha: 0.6
    ```
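As an illustration of the weighted update described above, here is a minimal sketch in PyTorch. The buffer API (`sample()` returning indices and importance-sampling weights, `update_priorities()`, and the `beta` argument) is assumed for the example and may not match the exact signatures in `utils/prioritized_replay_buffer.py`:

```python
import torch


def prioritized_update_step(policy_net, target_net, optimizer, buffer,
                            batch_size, gamma=0.99, beta=0.4):
    """One DQN update using importance-sampling weights and TD-error priorities."""
    # Assumed buffer API: batched tensors plus sampled indices and IS weights.
    states, actions, rewards, next_states, dones, indices, weights = buffer.sample(batch_size, beta)

    # Q(s, a) for the actions actually taken.
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Standard one-step TD target computed with the target network.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    td_errors = targets - q_values

    # Importance-sampling weights correct the bias introduced by non-uniform sampling.
    loss = (weights * td_errors.pow(2)).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # New priorities are proportional to the absolute TD error (plus a small constant).
    buffer.update_priorities(indices, td_errors.abs().detach().cpu().numpy() + 1e-6)
    return loss.item()
```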
- **Soft Update for Target Network**
  - **Motivation:** Hard updates replace the entire target network at set intervals, which can cause instability. Soft updates, using Polyak averaging, provide smoother updates, improving stability and convergence.
  - **Implementation:** We introduced `DQNAgentSoftUpdate` in `agents/dqn_agent_soft.py`, which inherits from `DQNAgentBase` and overrides `update_target_network()` to implement soft updates (see the sketch after this list).
  - **Changes in Agent:**
    - The `update_target_network()` method was modified to support soft updates with Polyak averaging.
    - The `train.py` script was updated to select either `DQNAgentBase` (hard updates) or `DQNAgentSoftUpdate` (soft updates) based on the configuration.
  - **Configuration:** In `configs/config.yaml`, you can set:

    ```yaml
    agent_variant: "soft_update"
    soft_update_tau: 0.005
    ```
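As a sketch, Polyak averaging for the target network can be written as below; the function name mirrors `update_target_network()`, but the body is illustrative rather than the repository's exact code:

```python
import torch


@torch.no_grad()
def soft_update_target_network(policy_net, target_net, tau=0.005):
    """Polyak averaging: target <- tau * policy + (1 - tau) * target."""
    for target_param, policy_param in zip(target_net.parameters(), policy_net.parameters()):
        target_param.mul_(1.0 - tau).add_(tau * policy_param)
```

With `tau = 0.005`, the target network drifts slowly toward the policy network at every update instead of being replaced wholesale every N steps.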
- **Huber Loss for Stability**
  - **Motivation:** Mean squared error (MSE) can be overly sensitive to large errors, which can destabilize training. The Huber loss is more robust: it behaves like MSE for small errors but transitions to an absolute loss for large errors, reducing the impact of outliers.
  - **Implementation:** We introduced `DQNAgentHuberLoss` in `agents/dqn_agent_huber.py`, which inherits from `DQNAgentBase` and overrides the `update()` function to use the Huber loss instead of the MSE loss (see the sketch after this list).
  - **Changes in Agent:**
    - The `update()` method now applies the Huber loss instead of the standard MSE loss when using this agent.
    - The `train.py` script was updated to allow selecting `DQNAgentHuberLoss` via configuration.
  - **Configuration:** In `configs/config.yaml`, you can set:

    ```yaml
    agent_variant: "huber_loss"
    huber_loss_weight: 0.01
    ```
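In PyTorch, the change amounts to swapping the loss function; a minimal sketch is shown below, where the tensors stand in for the predicted and target Q-values computed inside `update()`:

```python
import torch
import torch.nn.functional as F

# Stand-in tensors for the predicted and target Q-values.
q_values = torch.tensor([1.0, 2.5, -0.3])
targets = torch.tensor([1.2, 0.5, -0.1])

mse_loss = F.mse_loss(q_values, targets)          # quadratic everywhere
huber_loss = F.smooth_l1_loss(q_values, targets)  # quadratic near zero, linear for large errors
```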
- **Double DQN for More Stable Learning**
  - **Motivation:** The standard DQN tends to overestimate Q-values, which can slow down or destabilize learning. Double DQN (DDQN) mitigates this issue by giving the two networks separate roles:
    - One for selecting the action (policy network).
    - One for evaluating the action (target network).

    This prevents over-optimistic value estimates and improves training stability.
  - **Implementation:** We introduced `DQNAgentDouble` in `agents/dqn_agent_double.py`, which modifies the `update()` method to (see the sketch after this list):
    - Use the policy network to select the best action.
    - Use the target network to evaluate the Q-value of that action.
  - **Changes in Agent:**
    - The `update()` method now follows the Double DQN logic to compute target Q-values.
    - The `train.py` script was updated to allow selecting `DQNAgentDouble` via configuration.
  - **Configuration:** In `configs/config.yaml`, you can set:

    ```yaml
    agent_variant: "double_dqn"
    ```
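A minimal sketch of the Double DQN target computation is shown below; batched tensors `rewards`, `next_states`, and `dones` are assumed, as in the earlier sketches, and this is illustrative rather than the repository's exact code:

```python
import torch


@torch.no_grad()
def double_dqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Select the next action with the policy net, evaluate it with the target net."""
    next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)   # selection
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # evaluation
    return rewards + gamma * next_q * (1.0 - dones)
```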
- **Training:** To train the DQN agent, run:

  ```bash
  python experiments/dqn/train.py
  ```
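Several sections above mention that `train.py` selects the agent variant from the configuration; a hedged sketch of what that dispatch could look like is shown below. The module paths and class names come from the sections above, while the `"base"` key and the exact loading logic are assumptions:

```python
import yaml

from agents.dqn_agent_base import DQNAgentBase
from agents.dqn_agent_soft import DQNAgentSoftUpdate
from agents.dqn_agent_huber import DQNAgentHuberLoss
from agents.dqn_agent_double import DQNAgentDouble

# Map the agent_variant value from configs/config.yaml to an agent class.
AGENT_VARIANTS = {
    "base": DQNAgentBase,  # illustrative default key
    "soft_update": DQNAgentSoftUpdate,
    "huber_loss": DQNAgentHuberLoss,
    "double_dqn": DQNAgentDouble,
}

with open("configs/config.yaml") as f:
    config = yaml.safe_load(f)

agent_cls = AGENT_VARIANTS.get(config.get("agent_variant", "base"), DQNAgentBase)
print(f"Selected agent class: {agent_cls.__name__}")
```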
- **Evaluation:** To evaluate the trained agent and record a video of its performance, run:

  ```bash
  python experiments/dqn/evaluate.py
  ```
An agent implemented with the Actor-Critic method can also be run. However, due to time and computational constraints, we were unable to find hyperparameters that reliably solve the environment.