Overview: Policy-Based Methods

In policy-based methods, we learn to approximate the optimal policy $\pi^*$ directly, without having to learn a value function. We parameterize the policy with a neural network that outputs a probability distribution over actions (a stochastic policy). Policy-gradient methods are a subclass of policy-based methods in which we search for the optimal policy by optimizing the policy parameters $\theta$ directly: we perform gradient ascent on an objective function that measures the policy's performance, typically the expected return.
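To make the idea of a stochastic policy concrete, here is a minimal sketch that uses only the Python standard library. It assumes a linear-softmax policy (one weight vector per action) in place of the neural network described above, and the names `softmax`, `action_probs`, `sample_action`, `theta`, and `state` are illustrative rather than taken from this repository.

```python
import math
import random


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, state):
    """Stochastic policy pi_theta(a | s): one weight vector per action (assumed linear)."""
    logits = [sum(w * x for w, x in zip(weights, state)) for weights in theta]
    return softmax(logits)


def sample_action(theta, state, rng=random):
    """Draw an action index according to pi_theta(. | s)."""
    probs = action_probs(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical 4-dimensional state (CartPole-like) and 2 actions.
theta = [[0.1, -0.2, 0.3, 0.0], [-0.1, 0.2, -0.3, 0.0]]
state = [0.02, -0.01, 0.03, 0.04]

probs = action_probs(theta, state)    # probabilities over the two actions
action = sample_action(theta, state)  # sampled action index, 0 or 1
```

Because the policy outputs probabilities rather than a single fixed action, exploration comes from sampling, and the probabilities vary smoothly with `theta`, which is what makes gradient ascent on the parameters possible.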

Reinforce Algorithm

The Reinforce algorithm, also called Monte-Carlo policy gradient, is a policy-gradient algorithm that uses the return estimated from an entire episode to update the policy parameters $\theta$. This repository contains an implementation of the Reinforce algorithm for the CartPole-v1 environment.
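As a rough illustration of the update, the sketch below computes the discounted return $G_t$ for each step of one finished episode and applies $\theta \leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)$. It is not the code from this repository: it assumes a linear-softmax policy so that the gradient can be written by hand in plain Python, and the function names, learning rate `lr`, and discount factor `gamma` are illustrative choices.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for each step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last step backwards
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(theta, states, actions, rewards, lr=0.01, gamma=0.99):
    """One gradient-ascent step on theta from a single complete episode."""
    returns = discounted_returns(rewards, gamma)
    new_theta = [list(weights) for weights in theta]
    for s, a, g in zip(states, actions, returns):
        logits = [sum(w * x for w, x in zip(weights, s)) for weights in theta]
        probs = softmax(logits)
        for b in range(len(theta)):
            # For a linear-softmax policy:
            # d log pi(a|s) / d theta_b = (1[a == b] - pi(b|s)) * s
            coeff = (1.0 if b == a else 0.0) - probs[b]
            for i, x in enumerate(s):
                new_theta[b][i] += lr * g * coeff * x
    return new_theta


# Tiny hypothetical episode: one step, action 0 taken, reward 1.
theta = [[0.0, 0.0], [0.0, 0.0]]
updated = reinforce_update(theta, states=[[1.0, 0.0]], actions=[0], rewards=[1.0])
example_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)  # [1.75, 1.5, 1.0]
```

Since the return here is positive, the step raises the score of the action that was taken and lowers the score of the other one. A neural-network implementation would typically obtain the same gradient through automatic differentiation rather than a hand-written formula.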

Results

Mean_Reward: 500 +/- 0.00

replay.mp4

rishisim/Reinforce-CartPole-v1
