sudo-michael/practical-pg

Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs

Code for: Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs.

Installation

conda create -n ppg python=3.8
conda activate ppg
conda install scipy tqdm matplotlib pandas
pip install jax flax absl-py

Running Experiments

To reproduce MDP experiments, run:

python mdp_experiments.py

To reproduce bandit experiments in the exact setting, run:

python pg_experiments.py

To reproduce bandit experiments in the stochastic setting, run:

python spg_experiments.py
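To illustrate the difference between the exact and stochastic bandit settings, here is a minimal sketch of one softmax policy gradient step in each. It is not taken from the repository's scripts: the function names, the Bernoulli reward model, and the constant step size are all assumptions made for illustration, and it uses only the Python standard library rather than JAX so that it runs on its own.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def exact_pg_step(theta, mean_rewards, step_size):
    """One gradient ascent step using the true mean rewards (exact setting).

    For a softmax policy pi, the gradient of the expected reward
    with respect to theta_a is pi_a * (r_a - sum_b pi_b * r_b).
    """
    pi = softmax(theta)
    value = sum(p * r for p, r in zip(pi, mean_rewards))
    return [t + step_size * p * (r - value)
            for t, p, r in zip(theta, pi, mean_rewards)]


def stochastic_pg_step(theta, mean_rewards, step_size, rng):
    """One step using a single sampled arm and reward (stochastic setting).

    Assumption: rewards are Bernoulli with the given means. The update uses
    the REINFORCE estimate reward * (1[a == arm] - pi_a), which is unbiased
    for the exact gradient above.
    """
    pi = softmax(theta)
    arm = rng.choices(range(len(theta)), weights=pi)[0]
    reward = 1.0 if rng.random() < mean_rewards[arm] else 0.0
    return [t + step_size * reward * ((1.0 if a == arm else 0.0) - p)
            for a, (t, p) in enumerate(zip(theta, pi))]


if __name__ == "__main__":
    # Hypothetical 3-armed bandit; these means are illustrative only.
    means = [0.2, 0.5, 0.9]
    theta = [0.0, 0.0, 0.0]
    for _ in range(2000):
        theta = exact_pg_step(theta, means, step_size=1.0)
    print("policy after exact updates:", softmax(theta))
```

The exact step needs the true mean rewards, while the stochastic step only sees one sampled arm and its reward, so its updates are noisy and its behavior depends more heavily on the step size.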

For each bandit experiment, generate the corresponding plot with the notebook plot_experiments.ipynb.
