Proximal Policy Optimization with Symmetric Entropy
Reinforcement learning is very sensitive to hyperparameters. In my final-year project I was stuck on a problem where a PPO agent stopped making progress and settled into a sub-optimal policy. Increasing the entropy coefficient in the existing vanilla PPO framework helped, but choosing such coefficients is hard and daunting in general. Since PPO is built on the idea that the new policy should not be too different from the old one, I apply the same idea to the entropy, except that the constraint on the entropy is symmetric: the new policy's entropy is kept within a band on either side of the old policy's entropy.
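To make the idea concrete, here is a minimal PyTorch-style sketch of a PPO loss with a symmetrically clipped entropy bonus. The function name `symm_ppo_loss`, the `entropy_clip` parameter, and the exact clipping rule are assumptions for illustration only, not necessarily how this repository implements it:

```python
import torch

def symm_ppo_loss(new_dist, old_dist, actions, advantages, returns, values,
                  clip_param=0.1, value_loss_coef=0.5, entropy_coef=0.05,
                  entropy_clip=0.2):
    # Standard PPO clipped surrogate objective.
    ratio = (new_dist.log_prob(actions) - old_dist.log_prob(actions)).exp()
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()

    # Value function loss.
    value_loss = (returns - values).pow(2).mean()

    # Symmetric entropy bonus (assumed form): analogous to PPO's ratio
    # clipping, the new policy's entropy only contributes gradient while it
    # stays within +/- entropy_clip of the old policy's entropy.
    new_entropy = new_dist.entropy().mean()
    old_entropy = old_dist.entropy().mean().detach()
    clipped_entropy = torch.max(
        torch.min(new_entropy, old_entropy + entropy_clip),
        old_entropy - entropy_clip)

    return policy_loss + value_loss_coef * value_loss - entropy_coef * clipped_entropy
```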
The entropy coefficient itself is also decayed over the course of training.
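For reference, a linear schedule in the spirit of --use-linear-lr-decay could look like the following. The function name and the schedule shape are assumptions; the repository may decay the coefficient differently:

```python
def linear_entropy_decay(initial_coef, update, total_updates):
    # Anneal the entropy coefficient linearly from initial_coef down to 0
    # over training, mirroring the linear learning-rate decay.
    frac = 1.0 - update / float(total_updates)
    return initial_coef * frac

# e.g. with --entropy-coef 0.05:
# entropy_coef = linear_entropy_decay(0.05, update, total_updates)
```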
Please clone this repository to your local machine:
git clone https://github.com/AnirudhMaiya/Symm-PPO
After cloning, change into the src folder of the repository:
cd Symm-PPO/src
Install the dependencies:
!pip install -r requirements.txt
Next, download the Atari ROMs and import them for atari_py:
import urllib.request
urllib.request.urlretrieve('http://www.atarimania.com/roms/Roms.rar', 'Roms.rar')
!pip install unrar
!unrar x Roms.rar
!mkdir rars
!mv "HC ROMS" rars
!mv "ROMS" rars
!python -m atari_py.import_roms rars
# Note: --algo ppo actually runs Symm-PPO here, not vanilla PPO
!python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 1 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.05
The above steps can also be executed through the Jupyter notebook Run_Symm-PPO.ipynb.
The added symmetric entropy term serves as a regularizer, so the median rewards are initially lower than with vanilla PPO.