Proximal Policy Optimization (Discrete)

Overview

🚧 🛠️👷‍♀️ 🛑 Under construction...

This repository contains an implementation of Proximal Policy Optimization (PPO) for discrete action spaces, which has been evaluated on a variety of Gymnasium and Atari environments.
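For readers new to PPO, the piece that distinguishes it from a vanilla policy gradient is the clipped surrogate objective. The following is a minimal plain-Python sketch of that objective for a categorical (discrete) policy. The function names, the default clip range of 0.2, and the list-based inputs are illustrative assumptions, not this repository's code. A real implementation would compute the same quantities on tensors in an autodiff framework so that gradients can flow.

```python
import math


def categorical_log_prob(logits, action):
    """log softmax(logits)[action], computed stably by subtracting the max logit."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[action] - log_norm


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch (illustrative sketch).

    new_log_probs: log pi_new(a_t | s_t) under the policy being optimized
    old_log_probs: log pi_old(a_t | s_t) recorded when the data was collected
    advantages:    advantage estimates A_t for the same (s_t, a_t) pairs
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # r_t = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate the mean so the objective can be minimized by gradient descent.
    return -total / len(advantages)
```

With identical old and new log-probabilities the ratio is 1, so `clipped_surrogate_loss([lp, lp], [lp, lp], [1.0, 3.0])` returns `-2.0`, the negative mean advantage. Taking the minimum means the policy gains nothing from pushing the ratio outside `[1 - clip_eps, 1 + clip_eps]`, which is what keeps each update close to the data-collecting policy.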

The main script in its current form is configured for Atari environments, with a custom environment wrapper that follows the approach outlined in the original DQN paper (for this reason, it is recommended to use the 'NoFrameskip' versions of the environments).
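DQN-style Atari preprocessing typically means: repeat each action for 4 frames, take the pixel-wise maximum of the last two raw frames to remove sprite flicker, convert to grayscale, downsample to 84x84, and stack the 4 most recent processed frames as the network input. Assuming the custom wrapper handles frame skipping itself in this way, the underlying environment must not skip frames as well, which is why the 'NoFrameskip' variants are recommended. The sketch below is a hypothetical NumPy version of those steps and not the repository's wrapper. It assumes the Gymnasium five-value `env.step` return, and it uses nearest-neighbour resizing in place of the interpolated resize usually done with an image library.

```python
from collections import deque

import numpy as np

FRAME_SKIP = 4       # action repeat, as in the DQN paper
STACK_SIZE = 4       # number of recent frames fed to the network
OUT_SHAPE = (84, 84)


def to_grayscale(rgb):
    """Luminance from an (H, W, 3) uint8 RGB frame -> (H, W) float."""
    return rgb @ np.array([0.299, 0.587, 0.114])


def resize_nearest(img, shape):
    """Nearest-neighbour resize via index arrays (avoids an OpenCV dependency)."""
    h, w = img.shape
    rows = np.arange(shape[0]) * h // shape[0]
    cols = np.arange(shape[1]) * w // shape[1]
    return img[rows[:, None], cols]


def preprocess(frame_a, frame_b):
    """Max over two consecutive frames, then grayscale, resize, and scale to [0, 1]."""
    merged = np.maximum(frame_a, frame_b)
    return resize_nearest(to_grayscale(merged), OUT_SHAPE).astype(np.float32) / 255.0


def repeat_action(env, action):
    """Repeat `action` FRAME_SKIP times on a NoFrameskip env, summing rewards."""
    total_reward = 0.0
    last_two = deque(maxlen=2)
    terminated = truncated = False
    info = {}
    for _ in range(FRAME_SKIP):
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        last_two.append(obs)
        if terminated or truncated:
            break
    # If the episode ended after one frame, both entries are the same frame.
    frame = preprocess(last_two[0], last_two[-1])
    return frame, total_reward, terminated, truncated, info


class FrameStacker:
    """Keeps the STACK_SIZE most recent preprocessed frames."""

    def __init__(self):
        self.frames = deque(maxlen=STACK_SIZE)

    def reset(self, frame):
        # Fill the stack with copies of the first frame of an episode.
        for _ in range(STACK_SIZE):
            self.frames.append(frame)
        return np.stack(list(self.frames))

    def push(self, frame):
        self.frames.append(frame)
        return np.stack(list(self.frames))  # shape (STACK_SIZE, 84, 84)
```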

Setup

Required Dependencies

Install the required dependencies using the following command:

pip install -r requirements.txt

Running the Algorithm

You can run the algorithm on any supported Gymnasium environment. For example:

python main.py --env 'MsPacmanNoFrameskip-v4'

Results

🤔 For your consideration:

Each Atari environment was trained for 20,000 games. I regret this decision, as it led to inconsistent numbers of learning steps across environments (some games take many more steps per game than others).
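To make the trade-off concrete, here is a toy sketch contrasting a games-based budget with a steps-based one. `ToyGame` and both helper functions are hypothetical stand-ins, not part of this repository. In a real training loop, the PPO update would run inside the step-counting loop.

```python
import random


class ToyGame:
    """Hypothetical environment whose games last between min_len and max_len steps."""

    def __init__(self, min_len, max_len, seed=0):
        self.rng = random.Random(seed)
        self.min_len, self.max_len = min_len, max_len
        self.remaining = 0

    def reset(self):
        self.remaining = self.rng.randint(self.min_len, self.max_len)

    def step(self):
        self.remaining -= 1
        return self.remaining == 0  # True when the game is over


def steps_for_fixed_games(game, n_games):
    """Budget by games: the total number of steps depends on game length."""
    steps = 0
    for _ in range(n_games):
        game.reset()
        done = False
        while not done:
            done = game.step()
            steps += 1
    return steps


def games_for_fixed_steps(game, step_budget):
    """Budget by steps: every environment gets exactly step_budget steps."""
    games_finished = 0
    game.reset()
    for _ in range(step_budget):
        if game.step():
            games_finished += 1
            game.reset()
    return games_finished
```

With 100 games, a game lasting 10 to 20 steps yields at most 2,000 steps, while one lasting 1,000 to 2,000 steps yields at least 100,000. A step budget fixes the amount of experience instead and lets the number of games vary.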

I also did not use reward scaling, which I use for most other algorithms. This was a nearly arbitrary decision that came out of initial debugging: at a certain point things suddenly began to work, so I just kinda rolled with it...
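The exact reward scaling meant here isn't specified. A common variant in PPO implementations divides each reward by a running standard deviation of the discounted return, without subtracting a mean. The sketch below shows that variant. The class names and the `gamma=0.99` default are assumptions, and this is not code from this repository, which uses no reward scaling in these runs.

```python
import math


class RunningVariance:
    """Running mean and (population) variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Fall back to 1.0 until there are enough samples for an estimate.
        return self.m2 / self.n if self.n > 1 else 1.0


class RewardScaler:
    """Divide rewards by the running std of the discounted return (no mean subtraction)."""

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0
        self.stats = RunningVariance()

    def __call__(self, reward, done):
        self.ret = self.gamma * self.ret + reward
        self.stats.update(self.ret)
        scaled = reward / math.sqrt(self.stats.var + self.eps)
        if done:
            self.ret = 0.0  # start a fresh discounted return for the next episode
        return scaled
```

Not subtracting the mean keeps the sign of each reward intact, so the scaler changes only the magnitude of the learning signal that the critic has to fit.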

I only started tracking the average critic value over a fixed set of states after many environments had already been trained, so this metric is missing for the earlier runs. Where it is available, I feel it provides an additional interesting piece of context.
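This metric is similar in spirit to the average action-value metric from the DQN paper. A set of states is collected once, and the same states are re-scored periodically, so changes in the average reflect the critic changing rather than the data distribution shifting. The sketch below assumes a hypothetical `sample_state` function (for example, states gathered with a random policy before training) and a `critic` callable returning a scalar V(s). Neither is taken from this repository.

```python
import random


def collect_fixed_states(sample_state, n=500, seed=0):
    """Gather a held-out set of states once, before training starts."""
    rng = random.Random(seed)
    return [sample_state(rng) for _ in range(n)]


def average_critic_value(critic, fixed_states):
    """Mean V(s) over the same fixed states; re-evaluate this periodically during training."""
    return sum(critic(s) for s in fixed_states) / len(fixed_states)
```

In use, you would call `average_critic_value(critic, fixed_states)` every few updates and log the result next to the episode scores.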

CartPole-v1	MountainCar-v0	Acrobot-v1

LunarLander-v2	AirRaid	Alien

Amidar	Assault	Asterix

Asteroids	Atlantis	BankHeist

BattleZone	BeamRider	Breakout

Krull	Berzerk	CrazyClimber

DemonAttack	Kangaroo	KungFuMaster

Zaxxon	Skiing	MontezumaRevenge

Bowling	Boxing	Carnival

Centipede	ChopperCommand	Defender

DoubleDunk	NameThisGame	Solaris

SpaceInvaders	Phoenix	StarGunner

Pitfall	Tennis	Pong

Pooyan	TimePilot	Tutankham

Enduro	UpNDown	PrivateEye

Qbert	Riverraid	RoadRunner

FishingDerby	Venture	Freeway

Seaquest	Robotank	Frostbite

VideoPinball	Gopher	Gravitar

WizardOfWor	Hero	YarsRevenge

ElevatorAction	IceHockey	Jamesbond

JourneyEscape

Acknowledgements

Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.


naivoder/DiscretePPO

Folders and files

Latest commit

History

Repository files navigation

Proximal Policy Optimization (Discrete)

Overview

Setup

Required Dependencies

Running the Algorithm

Results

🤔 For your consideration:

Acknowledgements

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages