Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs

Installation

conda create -n ppg python=3.8
conda activate ppg
conda install scipy tqdm matplotlib pandas
pip install jax flax absl-py

To reproduce MDP experiments, run:

python mdp_experiments.py

To reproduce bandit experiments in the exact setting, run:

python pg_experiments.py

To reproduce bandit experiments in the stochastic setting, run:

python spg_experiments.py
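
To run all three experiment scripts in one go, a small driver along the following lines may be convenient. This is only a sketch and not part of the repository: the names EXPERIMENT_SCRIPTS, build_commands, and run_all are hypothetical, and it assumes the scripts take no required command-line arguments and are launched from the repository root inside the ppg environment.

```python
import subprocess
import sys

# Script names as given in this README; assumed to sit in the current directory.
EXPERIMENT_SCRIPTS = [
    "mdp_experiments.py",   # MDP experiments
    "pg_experiments.py",    # bandit experiments, exact setting
    "spg_experiments.py",   # bandit experiments, stochastic setting
]


def build_commands(python=sys.executable):
    """Return one argv list per experiment script (hypothetical helper)."""
    return [[python, script] for script in EXPERIMENT_SCRIPTS]


def run_all(dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)
```

Calling run_all() only prints the commands; run_all(dry_run=False) would execute them one after another.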

For each bandit experiment, the corresponding plot can be generated with the notebook plot_experiment.ipynb.
