Practical_RL/week09_policy_II at spring20 · aps2019project/Practical_RL · GitHub

This repository has been archived by the owner on Jul 21, 2020. It is now read-only.

README.md
mujoco_wrappers.py
ppo.ipynb
runners.py
seminar_TRPO_pytorch.ipynb
seminar_TRPO_tensorflow.ipynb
seminar_TRPO_theano.ipynb

README.md

Materials

This section covers some steroids for policy gradient methods, along with a cool general trick called

Lecture on NPG and TRPO by J. Schulman - video
Alternative lecture on TRPO and open problems by... J. Schulman - video
Our videos (in Russian): lecture, seminar (pytorch), seminar (theano)
Original articles - TRPO, NPG

Practice

Seminar:
Homework:

More: Reinforcement learning in large/continuous action spaces

You already know algorithms that work with continuous action spaces, but it can't hurt to learn something more specialized.

Lecture by J. Schulman - video
Q-learning with normalized advantage functions - article, code1, code2
Deterministic policy gradient - article, post+code
Stochastic value gradient - article
Embedding large discrete action spaces for RL - article
Lecture by A. Seleznev, 5vision (in Russian) - video