Deep Q-network (DQN) implementation for Pong-v0. The implementation follows the papers "Playing Atari with Deep Reinforcement Learning" (Mnih et al., 2013) and "Human-level control through deep reinforcement learning" (Mnih et al., 2015).

Nature DQN architecture, from Mnih et al. (2015); a PyTorch sketch follows the list:
- Input: 84 × 84 × 4 image (the last 4 frames of history, stacked)
- Conv Layer 1: 32 8 × 8 filters with stride 4
- Conv Layer 2: 64 4 × 4 filters with stride 2
- Conv Layer 3: 64 3 × 3 filters with stride 1
- Fully Connected 1: fully-connected layer of 512 rectifier units
- Output: fully-connected linear layer with a single output for each valid action
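A minimal PyTorch sketch of the architecture above (the class name `NatureDQN` and the `n_actions` argument are illustrative, not taken from the repo):

```python
import torch.nn as nn

class NatureDQN(nn.Module):
    """Sketch of the Nature-paper network described above."""

    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 84x84x4 -> 20x20x32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # -> 9x9x64
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # -> 7x7x64
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),                  # 512 rectifier units
            nn.ReLU(),
            nn.Linear(512, n_actions),                   # one Q-value per valid action
        )

    def forward(self, x):
        return self.net(x)
```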
NIPS DQN architecture, from Mnih et al. (2013); a sketch follows the list:
- Input: 84 × 84 × 4 image (the last 4 frames of history, stacked)
- Conv Layer 1: 16 8 × 8 filters with stride 4
- Conv Layer 2: 32 4 × 4 filters with stride 2
- Fully Connected 1: fully-connected layer of 256 rectifier units
- Output: fully-connected linear layer with a single output for each valid action
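The smaller NIPS variant, sketched the same way (names again illustrative):

```python
import torch.nn as nn

class NipsDQN(nn.Module):
    """Sketch of the NIPS-paper network described above."""

    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 84x84x4 -> 20x20x16
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # -> 9x9x32
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # 256 rectifier units
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one Q-value per valid action
        )

    def forward(self, x):
        return self.net(x)
```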
Training hyperparameters:
- Optimizer: RMSProp
- Batch size: 32
- Exploration: ε-greedy with ε = 0.1
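A minimal sketch of how these settings might be wired up, using the `NatureDQN` sketch above; the learning rate is the Nature paper's value, since the repo's is not stated here:

```python
import random
import torch

N_ACTIONS = 6    # Pong-v0 exposes 6 discrete actions
EPSILON = 0.1    # ε-greedy exploration rate from above
BATCH_SIZE = 32  # replay mini-batch size from above

policy_net = NatureDQN(N_ACTIONS)
# lr = 0.00025 is the Nature paper's value; the repo's setting may differ.
optimizer = torch.optim.RMSprop(policy_net.parameters(), lr=0.00025)

def select_action(state):
    """ε-greedy: random action with probability ε, else greedy w.r.t. Q.

    `state` is a (1, 4, 84, 84) float tensor of stacked frames.
    """
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return policy_net(state).argmax(dim=1).item()
```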
Example:

```bash
conda create -n dqn_pong
conda activate dqn_pong
pip install -r requirements.txt
sudo apt-get install ffmpeg   # needed to record videos
python train_atari.py
```

To resume training from a saved checkpoint:

```bash
python train_atari.py --load-checkpoint-file results/checkpoint_dqn_nature.pth
```
A video is recorded every 50 episodes; see the /video/ folder.
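Periodic recording like this is typically done with a video wrapper; a sketch using the classic `gym.wrappers.Monitor` API (an assumption about how the repo wires it up, and replaced by `RecordVideo` in newer gym versions):

```python
import gym
from gym import wrappers

env = gym.make("Pong-v0")
# Record every 50th episode to ./video/ (encoding requires ffmpeg).
env = wrappers.Monitor(
    env,
    "./video/",
    video_callable=lambda episode_id: episode_id % 50 == 0,
)
```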