This repository implements a Deep Q-Network (DQN), a reinforcement learning algorithm, to train an agent to play Atari's Breakout game. The implementation includes advanced features like experience replay, target networks, and game monitoring with video exports.
- Preprocessing: Converts raw game frames to grayscale, resizes them to 84x84, and applies cropping for efficient input (see the preprocessing sketch after this list).
- Frame Buffer: Maintains the last four frames to help the agent observe motion.
- Neural Network: Uses a Convolutional Neural Network (CNN) to approximate Q-values for each action (a Keras sketch follows this list).
- Experience Replay: Stores past experiences for training stability and efficiency.
- Target Networks: Stabilizes Q-learning by periodically updating reference weights.
- Game Monitoring: Records gameplay videos to evaluate the agent's performance.
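A minimal sketch of the preprocessing and frame-stacking steps, assuming OpenCV and NumPy; the crop region and the crop-before-resize ordering here are illustrative, not necessarily the notebook's exact values:

```python
from collections import deque

import cv2
import numpy as np

def preprocess(frame):
    """Grayscale, crop the playing field, and resize to 84x84."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    cropped = gray[34:194, :]  # illustrative crop: drop the score bar and bottom border
    resized = cv2.resize(cropped, (84, 84), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0  # scale pixels to [0, 1]

class FrameBuffer:
    """Keeps the last four preprocessed frames so the agent can perceive motion."""
    def __init__(self, n_frames=4):
        self.frames = deque(maxlen=n_frames)

    def reset(self, first_frame):
        processed = preprocess(first_frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(processed)
        return self.state()

    def append(self, frame):
        self.frames.append(preprocess(frame))
        return self.state()

    def state(self):
        # Shape (84, 84, 4): channels-last stack fed to the CNN.
        return np.stack(self.frames, axis=-1)
```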
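And a sketch of the Q-value CNN in Keras, following the three-convolution architecture popularized by the original DeepMind DQN paper; the notebook's actual layer sizes may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_q_network(n_actions, input_shape=(84, 84, 4)):
    """CNN that maps a stack of frames to one Q-value per action."""
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=8, strides=4, activation="relu"),
        layers.Conv2D(64, kernel_size=4, strides=2, activation="relu"),
        layers.Conv2D(64, kernel_size=3, strides=1, activation="relu"),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(n_actions),  # linear output: one Q-value per action
    ])
```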
- Python 3.6+
- TensorFlow/Keras
- OpenAI Gym
- Additional libraries: `opencv-python`, `unrar`, `numpy`, `matplotlib`
- Clone the repository:

  ```bash
  git clone https://github.com/LeoRigasaki/Deep-Q-Network-implementation.git
  cd Deep-Q-Network-implementation
  ```
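- Install the dependencies (via the `requirements.txt` listed under the project structure):

  ```bash
  pip install -r requirements.txt
  ```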
- Launch the training notebook:

  ```bash
  jupyter notebook DQN_Atari_Breakout.ipynb
  ```
- Train the agent:
  - Configure hyperparameters (e.g., learning rate, batch size, epsilon decay); a sample configuration is sketched after this list.
  - Run the training loop.
- Evaluate and monitor the agent:
- Save the trained weights.
- Record gameplay videos for evaluation.
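For the hyperparameter step, here is a sample configuration with a linear epsilon-decay schedule; the values are common DQN defaults, not necessarily the ones used in the notebook:

```python
# Illustrative hyperparameters; tune these in the notebook.
config = {
    "learning_rate": 1e-4,             # Adam step size
    "batch_size": 32,                  # transitions sampled per update
    "gamma": 0.99,                     # discount factor
    "epsilon_start": 1.0,              # initial exploration rate
    "epsilon_final": 0.05,             # floor after decay
    "epsilon_decay_steps": 1_000_000,  # linear decay horizon
    "target_update_every": 10_000,     # steps between target-network syncs
    "replay_capacity": 100_000,        # max transitions kept in memory
}

def epsilon_at(step, cfg=config):
    """Linearly anneal epsilon from epsilon_start to epsilon_final."""
    frac = min(step / cfg["epsilon_decay_steps"], 1.0)
    return cfg["epsilon_start"] + frac * (cfg["epsilon_final"] - cfg["epsilon_start"])
```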
- `DQN_Atari_Breakout.ipynb`: Main notebook implementing the DQN training and evaluation.
- `videos/`: Directory for storing gameplay videos.
- `requirements.txt`: List of Python dependencies.
Watch the trained agent play Breakout: https://github.com/user-attachments/assets/65b28a4b-cd50-4100-88a6-fc535488632b
- DQN: Combines Q-learning with deep neural networks to handle high-dimensional inputs like game frames.
- Experience Replay: Improves learning efficiency by breaking correlations between consecutive observations.
- Target Networks: Addresses instability in Q-value updates by bootstrapping against a separate, periodically synced copy of the network (both mechanisms are sketched below).
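A compact sketch of these two mechanisms, assuming a deque-backed buffer and Keras models; names like `ReplayBuffer` and `sync_target` are illustrative, not the notebook's exact API:

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Uniformly sampled experience replay: breaks correlation between consecutive transitions."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

# Target network: a frozen copy of the online network, refreshed periodically
# so the bootstrap targets do not chase a moving estimate.
def sync_target(online_model, target_model):
    target_model.set_weights(online_model.get_weights())
```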
- The agent learns to maximize rewards by exploring the game environment and updating Q-values.
- Hyperparameters like epsilon control the balance between exploration and exploitation (see the epsilon-greedy sketch after this list).
- Performance is monitored via reward plots and TD loss.
- Videos showcase the agent's progression over time.
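A sketch of the epsilon-greedy policy and the TD targets the loss is computed against, reusing the hypothetical helpers sketched earlier (`epsilon_at`, the replay buffer, and a Keras target network):

```python
import numpy as np

def select_action(q_network, state, epsilon, n_actions):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    q_values = q_network.predict(state[None, ...], verbose=0)  # add batch dimension
    return int(np.argmax(q_values[0]))

def td_targets(target_network, rewards, next_states, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal states."""
    next_q = target_network.predict(next_states, verbose=0).max(axis=1)
    return rewards + gamma * next_q * (1.0 - dones.astype(np.float32))
```

The TD loss plotted during training is the (squared or Huber) error between these targets and the online network's Q-value for the action actually taken.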
- The agent progressively learns to play Breakout, achieving higher rewards as training proceeds.
- Training may take several hours depending on hardware and hyperparameter configurations.
- OpenAI Gym for providing the Atari game environment.
- DQN algorithm inspired by the original DeepMind paper and OpenAI Baselines.
This project is licensed under the MIT License. See the `LICENSE` file for details.