GITMIND
(1) Vanilla PG (Sutton) (see the policy gradient sketch after this list)
[Policy gradient methods for reinforcement learning with function approximation]
Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour (2000)
(2) DPG
[Deterministic policy gradient algorithms]
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller (2014)
(3) DDPG
[Continuous control with deep reinforcement learning]
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra (2016)
(4) NPG
[A natural policy gradient]
Sham Kakade (2002)
(5) TRPO
[Trust region policy optimization]
John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel (2015)
(6) GAE
[High-Dimensional Continuous Control Using Generalized Advantage Estimation]
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel (2016)
(7) PPO
[Proximal policy optimization algorithms]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (2017)
(8) TD3
[Addressing Function Approximation Error in Actor-Critic Methods]
Scott Fujimoto, Herke van Hoof, David Meger (2018)
(9) SAC
[Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor]
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine (2018)
(1) PER
[Prioritized Experience Replay]
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver (2015)
(2) HER
[Hindsight Experience Replay]
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba (2017)
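To ground entry (1), here is a minimal REINFORCE-style sketch of the vanilla policy gradient update in NumPy. The toy 3-armed bandit environment, the softmax policy, and all hyperparameters are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Minimal vanilla policy gradient (REINFORCE-style) sketch on a toy bandit.
# Everything below (bandit arms, step size, baseline) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical mean rewards of 3 arms
theta = np.zeros(3)                      # softmax policy parameters
alpha = 0.1                              # step size
baseline = 0.0                           # running average reward used as a baseline

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)           # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)   # sample a noisy reward

    # grad of log pi(a) for a softmax policy: one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # policy gradient ascent: theta <- theta + alpha * (r - baseline) * grad log pi(a)
    theta += alpha * (r - baseline) * grad_log_pi
    baseline += 0.01 * (r - baseline)    # slowly track the mean reward

print("final action probabilities:", softmax(theta))  # should favor the best arm
```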
REVIEW | PAPER