- Introduces DQNs for control from raw-pixel inputs, which use Bellman updates to learn an action-value function and select actions off-policy from that learned action-value function. Demonstrates how experience replay gives the algorithm a more even, decorrelated distribution of experience (e.g. avoiding "feedback loops"); a minimal sketch of the update follows the links below.
- Learns to play a large number of Atari games, but the algorithm is extremely sample-inefficient.
- Papers:
- Blog posts:
- http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
- https://rubenfiszel.github.io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN.html
- https://medium.com/@awjuliani
- https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-reinforcement-learning/
- https://vmayoral.github.io/robots,/ai,/deep/learning,/rl,/reinforcement/learning/2016/08/07/deep-convolutional-q-learning/
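- A minimal sketch of the core DQN update: a uniform replay buffer plus a one-step Bellman target against a frozen target network. The network shape, hyperparameters, and PyTorch setup are illustrative assumptions, not the paper's exact configuration:

```python
# Minimal DQN sketch: uniform experience replay + one-step Bellman target.
# Sizes and hyperparameters are illustrative, not the paper's exact setup.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # Q(s, a) for every discrete action


replay = deque(maxlen=100_000)           # uniform replay buffer
q_net, target_net = QNet(4, 2), QNet(4, 2)
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99


def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    # Sampling uniformly from old experience breaks temporal correlations
    # and avoids the feedback loops of training only on recent transitions.
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = map(torch.as_tensor, zip(*batch))
    s, s2, r, done = s.float(), s2.float(), r.float(), done.float()

    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bellman target: r + gamma * max_a' Q_target(s', a')
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    loss = F.smooth_l1_loss(q_sa, target)

    opt.zero_grad()
    loss.backward()
    opt.step()
```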
- Introduces prioritized experience replay, an improved version of the experience replay strategy used in the DQN paper. In prioritized experience replay, transitions in the replay buffer are sampled with probability weighted by their TD error, which measures how surprising the transition was; a sketch follows below.
- Shows improved performance on the Arcade Learning Environment (ALE).
- Papers:
- Other:
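- A minimal sketch of proportional prioritization, where sampling probability is proportional to |TD error|^alpha. The alpha/epsilon constants are illustrative, and the paper's importance-sampling correction and sum-tree structure are omitted for brevity:

```python
# Sketch of a proportional prioritized replay buffer: transitions are sampled
# with probability proportional to |TD error|^alpha and priorities are updated
# after each learning step. Constants are illustrative assumptions.
import numpy as np


class PrioritizedReplay:
    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities, self.pos = [], [], 0

    def add(self, transition):
        # New transitions get the current max priority so they are seen at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # More surprising transitions (larger TD error) get replayed more often.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + self.eps
```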
- Continuous-action version of DQN that parameterizes the Q-function so that argmax over actions can be computed efficiently in closed form; see the sketch below.
- Papers:
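- A sketch of one way to make argmax(Q) cheap for continuous actions (the normalized-advantage-function idea this entry appears to describe): restrict the advantage to a quadratic in the action, so the maximizer is just the network's mu(s). The diagonal P(s) and layer sizes below are simplifying assumptions:

```python
# Sketch: if Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)) with P(s)
# positive definite, then argmax_a Q(s, a) = mu(s) with no inner optimization.
# Diagonal P(s) and layer sizes are simplifying assumptions.
import torch
import torch.nn as nn


class NAFHead(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)        # greedy (maximizing) action
        self.value = nn.Linear(hidden, 1)           # state value V(s)
        self.log_diag = nn.Linear(hidden, act_dim)  # log of diag(P(s))

    def forward(self, obs, action):
        h = self.body(obs)
        mu, v = self.mu(h), self.value(h).squeeze(-1)
        p_diag = self.log_diag(h).exp()             # positive (diagonal) P(s)
        diff = action - mu
        advantage = -0.5 * (p_diag * diff * diff).sum(-1)
        return v + advantage                        # Q(s, a)

    def argmax_action(self, obs):
        # The maximizing action is simply mu(s).
        return self.mu(self.body(obs))
```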