Deep Reinforcement Learning Minigrid

This repository contains a Deep Reinforcement Learning project implemented in PyTorch. It starts with a basic Deep Q-Network (DQN) applied to the MiniGrid 6 Rooms environment and then introduces several improvements: prioritized experience replay, soft target-network updates, the Huber loss, and Double DQN. The best model solves 90% of episodes.

Evaluation with memory

Setup

Before running the code, install the required dependencies.

Python Dependencies

Install all required Python packages using:

pip install -r requirements.txt

Install ffmpeg

For video evaluation, you need the ffmpeg executable. On macOS, open a terminal and run:

brew install ffmpeg

This command installs ffmpeg, which is used by ImageIO to record evaluation videos. If you're on Windows, please download a static build of ffmpeg and add its bin directory to your system PATH. On Linux, install ffmpeg via your package manager.

Implementation Details

Basic DQN

  • The basic DQN network is implemented in models/dqn_base.py.
  • The DQN agent (DQNAgentBase) is implemented in agents/dqn_agent_base.py; it uses an ε-greedy exploration strategy and stores transitions in a replay buffer.
  • The training script in experiments/dqn/train.py sets up the MiniGrid environment (using Gymnasium and MiniGrid wrappers), selects a preprocessing pipeline, and trains the DQN.
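
A minimal sketch of the pieces described above, assuming flattened observation vectors and an illustrative fully connected architecture; the actual network in models/dqn_base.py and the agent API in agents/dqn_agent_base.py may differ.

    import random

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class DQN(nn.Module):
        """Small fully connected Q-network (layer sizes are illustrative)."""

        def __init__(self, obs_dim: int, n_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, x):
            return self.net(x)


    def select_action(policy_net, obs, epsilon, n_actions):
        """Epsilon-greedy exploration: random action with probability epsilon,
        otherwise the greedy action from the Q-network."""
        if random.random() < epsilon:
            return random.randrange(n_actions)
        with torch.no_grad():
            return policy_net(obs.unsqueeze(0)).argmax(dim=1).item()


    def dqn_update(policy_net, target_net, optimizer, batch, gamma=0.99):
        """One gradient step on a batch sampled from the replay buffer."""
        obs, actions, rewards, next_obs, dones = batch  # actions: int64, dones: float
        q_values = policy_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            next_q = target_net(next_obs).max(dim=1).values
            td_targets = rewards + gamma * (1.0 - dones) * next_q
        loss = F.mse_loss(q_values, td_targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()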

Improvements

  1. Experience Replay with Prioritization

    • Motivation: Prioritized experience replay improves sample efficiency by replaying more informative transitions more often. For details, see the paper: Prioritized Experience Replay.

    • Implementation: A new class, PrioritizedReplayBuffer, was added in utils/prioritized_replay_buffer.py; it inherits from the uniform ReplayBuffer (a minimal sketch of the prioritized update appears after this list).

    • Changes in Agent: In agents/dqn_agent_base.py, the update() method is modified to:

      • Sample indices and importance-sampling weights when using a prioritized replay buffer.
      • Compute a weighted loss using these weights.
      • Update the priorities based on the temporal-difference (TD) error.
    • Configuration: In configs/config.yaml, you can set:

      replay_buffer: "prioritized"  # or "uniform"
      priority_alpha: 0.6
  2. Soft Update for Target Network

    • Motivation: Hard updates replace the entire target network at set intervals, but this can cause instability. Soft updates, using Polyak averaging, provide smoother updates, improving stability and convergence.

    • Implementation: We introduced DQNAgentSoftUpdate in agents/dqn_agent_soft.py, which inherits from DQNAgentBase and overrides update_target_network() to implement soft updates (see the Polyak-averaging sketch after this list).

    • Changes in Agent:

      • The update_target_network() method was modified to support soft updates with Polyak averaging.
      • The train.py script was updated to select either DQNAgentBase (hard updates) or DQNAgentSoftUpdate (soft updates) based on configuration.
    • Configuration: In configs/config.yaml, you can set:

      agent_variant: "soft_update"
      soft_update_tau: 0.005
  3. Huber Loss for Stability

    • Motivation: Mean Squared Error (MSE) can be overly sensitive to large errors, which can destabilize training. The Huber loss is more robust as it behaves like MSE for small errors but transitions to an absolute loss for large errors, reducing the impact of outliers.

    • Implementation: We introduced DQNAgentHuberLoss in agents/dqn_agent_huber.py, which inherits from DQNAgentBase and overrides the update() function to use the Huber loss instead of the MSE loss (see the sketch after this list).

    • Changes in Agent:

      • The update() method now applies the Huber loss instead of the standard MSE loss when using this agent.
      • The train.py script was updated to allow selecting DQNAgentHuberLoss via configuration.
    • Configuration: In configs/config.yaml, you can set:

      agent_variant: "huber_loss"
      huber_loss_weight: 0.01
    
    
  4. Double DQN for More Stable Learning

    • Motivation: The standard DQN tends to overestimate Q-values, which can slow down or destabilize learning. Double DQN (DDQN) mitigates this issue by using two separate networks:

      • One for selecting the action (policy network).
      • One for evaluating the action (target network).
      This decoupling prevents over-optimistic value estimates and improves training stability.
    • Implementation: We introduced DQNAgentDouble in agents/dqn_agent_double.py, which modifies the update() method to:

      • Use the policy network to select the best action.
      • Use the target network to evaluate the Q-value of that action (a sketch of this target computation appears after this list).
    • Changes in Agent:

      • The update() method now follows the Double DQN logic to compute target Q-values.
      • The train.py script was updated to allow selecting DQNAgentDouble via configuration.
    • Configuration: In configs/config.yaml, you can set:

      agent_variant: "double_dqn"
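
Referring back to improvement 1, the sketch below shows what a prioritized-replay update step can look like, reusing the names from the basic DQN sketch above. The buffer methods shown (sample() returning indices and importance-sampling weights, and update_priorities()) are assumptions about the interface of PrioritizedReplayBuffer, not its actual signatures.

    import torch


    def per_update(policy_net, target_net, optimizer, buffer, batch_size,
                   gamma=0.99, beta=0.4):
        """DQN update with prioritized replay: weight the loss by the
        importance-sampling weights and refresh priorities from the TD error."""
        # Assumed buffer API: sample() also returns the sampled indices and the
        # normalized importance-sampling weights for the batch.
        obs, actions, rewards, next_obs, dones, indices, weights = buffer.sample(batch_size, beta)
        q_values = policy_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            next_q = target_net(next_obs).max(dim=1).values
            td_targets = rewards + gamma * (1.0 - dones) * next_q
        td_errors = q_values - td_targets
        loss = (weights * td_errors.pow(2)).mean()  # importance-sampling weighted MSE
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # New priorities proportional to |TD error|; the small constant keeps
        # every transition sampleable.
        buffer.update_priorities(indices, td_errors.abs().detach().cpu().numpy() + 1e-6)
        return loss.item()

Presumably, priority_alpha in configs/config.yaml corresponds to the α exponent from the PER paper, controlling how strongly sampling is skewed toward high-priority transitions.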
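
For improvement 2, a minimal sketch of Polyak averaging as update_target_network() might implement it; the function name and signature here are illustrative, and tau corresponds to soft_update_tau in configs/config.yaml.

    import torch


    def soft_update(target_net, policy_net, tau=0.005):
        """Polyak averaging: target <- tau * policy + (1 - tau) * target.
        With a small tau the target network trails the policy network smoothly
        instead of being replaced wholesale every N steps."""
        with torch.no_grad():
            for t_param, p_param in zip(target_net.parameters(), policy_net.parameters()):
                t_param.mul_(1.0 - tau).add_(tau * p_param)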
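
For improvement 3, the change amounts to swapping the loss function inside update(). The helper below is an illustration only; q_values and td_targets stand for the tensors computed in the earlier sketches.

    import torch
    import torch.nn.functional as F


    def huber_td_loss(q_values: torch.Tensor, td_targets: torch.Tensor) -> torch.Tensor:
        """Drop-in replacement for the MSE loss: quadratic for small TD errors,
        linear for large ones, so a few outliers do not dominate the gradient."""
        return F.smooth_l1_loss(q_values, td_targets)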
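
For improvement 4, a sketch of the Double DQN target computation: the policy network selects the next action and the target network evaluates it. Plugging these targets into any of the update sketches above approximates the behaviour described for DQNAgentDouble; the exact code in agents/dqn_agent_double.py may differ.

    import torch


    def double_dqn_targets(policy_net, target_net, rewards, next_obs, dones, gamma=0.99):
        """Double DQN target: the policy network picks the next action and the
        target network evaluates it, decoupling selection from evaluation."""
        with torch.no_grad():
            next_actions = policy_net(next_obs).argmax(dim=1, keepdim=True)   # selection
            next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)  # evaluation
            return rewards + gamma * (1.0 - dones) * next_q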

Running the Project

  • Training: To train the DQN agent, run:

    python experiments/dqn/train.py
  • Evaluation: To evaluate the trained agent and record a video of its performance, run:

    python experiments/dqn/evaluate.py

Note

An agent implemented with the Actor-Critic method can also be run. However, due to time and computational constraints, we were unable to find hyperparameters that reliably solve the environment.
