- 🐍 Clean and beginner-friendly implementations of Reinforcement Learning algorithms in JAX
- ⚡ Vectorized environments for lightning-fast training (see the sketch after this list)
- 👨‍👩‍👧‍👦 Parallel agent training for statistically significant results
- 📊 Plotly graphs for visualizing state values throughout training, plus averaged performance reports
- ✅ Easy installation using Poetry virtual environments
- ✍️ Code walkthroughs:
  - *Vectorize and Parallelize RL Environments with JAX: Q-learning at the Speed of Light* ⚡, published in Towards Data Science
  - *A Gentle Introduction to Deep Reinforcement Learning in JAX* 🕹️, published in Towards Data Science and selected for the "Getting Started" column
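As a rough illustration of the vectorization idea (not the repository's exact API), `jax.vmap` can turn a single-environment step function into a batched one. The `step` function below is a hypothetical placeholder:

```python
import jax
import jax.numpy as jnp

# Hypothetical single-environment step: a pure function of (state, action).
def step(state, action):
    next_state = state + action       # placeholder transition
    reward = -jnp.abs(next_state)     # placeholder reward
    return next_state, reward

# vmap maps the step over a leading batch axis, so N environments
# advance in a single vectorized call.
batched_step = jax.vmap(step)

states = jnp.zeros(8)     # 8 parallel environments
actions = jnp.ones(8)
next_states, rewards = batched_step(states, actions)
```

Because the step is a pure function, the same pattern composes with `jax.jit` for compilation and with a second `vmap` over agent parameters for parallel agent training.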
Currently implemented algorithms:

| Type | Name | Source |
| --- | --- | --- |
| Bandits | Simple Bandits (ε-greedy policy) | Sutton & Barto, 1998 |
| Tabular | Q-learning | Watkins & Dayan, 1992 |
| Tabular | Expected SARSA | Van Seijen et al., 2009 |
| Tabular | Double Q-learning | Van Hasselt, 2010 |
| Deep RL | Deep Q-Network (DQN) | Mnih et al., 2015 |
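For reference, the tabular methods above build on the Q-learning update Q(s, a) ← Q(s, a) + α (r + γ maxₐ′ Q(s′, a′) − Q(s, a)). A minimal JAX sketch of that update (hypothetical function name, not the repository's exact code):

```python
import jax.numpy as jnp

def q_learning_update(q_table, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning step: move Q(s, a) toward the greedy TD target."""
    td_target = reward + gamma * jnp.max(q_table[next_state])
    td_error = td_target - q_table[state, action]
    # JAX arrays are immutable, so .at[...].add returns an updated copy.
    return q_table.at[state, action].add(alpha * td_error)
```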
Currently implemented environments:

| Type | Name | Source |
| --- | --- | --- |
| Bandits | Casino (K-armed Bandits) | Sutton & Barto, 1998 |
| Tabular | GridWorld | - |
| Tabular | Cliff Walking | - |
| Continuous Control | CartPole | Barto, Sutton, & Anderson, 1983 |
| MinAtar | Breakout | Young et al., 2019 |
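Environments written in JAX are typically pure functions: state is passed in and returned explicitly rather than mutated in place. A minimal sketch of that pattern, with assumed placeholder dynamics (not the repository's actual interface):

```python
import jax
import jax.numpy as jnp

def reset(key):
    # Sample a random initial state (placeholder shape).
    return jax.random.uniform(key, (4,))

def step(state, action):
    next_state = state.at[0].add(action)    # placeholder dynamics
    reward = -jnp.sum(next_state ** 2)      # placeholder reward
    done = jnp.abs(next_state[0]) > 1.0     # placeholder termination
    return next_state, reward, done
```

Keeping `step` pure is what makes it compatible with `jit` and `vmap`.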
Algorithms planned or in progress:

| Type | Name |
| --- | --- |
| Bandits | UCB (Upper Confidence Bound) |
| Tabular (model-based) | Dyna-Q, Dyna-Q+ |
Environments planned or in progress:

| Type | Name |
| --- | --- |
| MinAtar | Asterix, Freeway, Seaquest, SpaceInvaders |
Reproduction of the 10-armed Testbed experiment presented in *Reinforcement Learning: An Introduction* (chapter 2.3, pages 28-29).
This experiment showcases the performance difference between several values of ε, and therefore the long-term tradeoff between exploration and exploitation.
*Figure: results obtained in Reinforcement Learning: An Introduction, alongside replicated results using the K-armed Bandits JAX environment.*
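The exploration/exploitation tradeoff hinges on the ε-greedy rule: act randomly with probability ε, greedily otherwise. A minimal JAX sketch (hypothetical helper, not necessarily the repository's implementation):

```python
import jax
import jax.numpy as jnp

def epsilon_greedy(key, q_values, epsilon):
    """Random action with probability epsilon, argmax action otherwise."""
    explore_key, action_key = jax.random.split(key)
    n_actions = q_values.shape[-1]
    random_action = jax.random.randint(action_key, (), 0, n_actions)
    greedy_action = jnp.argmax(q_values)
    return jnp.where(
        jax.random.uniform(explore_key) < epsilon,
        random_action,
        greedy_action,
    )
```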
Reproduction of the CliffWalking environment presented in *Reinforcement Learning: An Introduction* (chapter 6, page 132).
This experiment highlights the difference in behavior between TD algorithms: Q-learning is greedy (its TD target is the maximum Q-value over the next state), while Expected SARSA is safer (its TD target is the expected Q-value over the next state).
*Figure: described behaviour for the CliffWalking environment; comparison of Expected SARSA (top) and Q-learning (bottom) on CliffWalking.*
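To make the behavioural difference concrete, here is a sketch of the two TD targets (illustrative only; assumes an ε-greedy behaviour policy for Expected SARSA):

```python
import jax.numpy as jnp

def q_learning_target(reward, gamma, next_q_values):
    """Greedy target: bootstrap from the best next action."""
    return reward + gamma * jnp.max(next_q_values)

def expected_sarsa_target(reward, gamma, next_q_values, epsilon):
    """Expected target: average next Q-values under the ε-greedy policy."""
    n_actions = next_q_values.shape[-1]
    probs = jnp.full(n_actions, epsilon / n_actions)
    probs = probs.at[jnp.argmax(next_q_values)].add(1.0 - epsilon)
    return reward + gamma * jnp.dot(probs, next_q_values)
```

Because the expected target accounts for the probability of exploratory (and potentially catastrophic) actions, Expected SARSA learns the safer path away from the cliff.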
To install and set up the project, follow these steps:

1. Clone the repository to your local machine:

   ```bash
   git clone https://github.com/RPegoud/jax_rl.git
   ```

2. Navigate to the project directory:

   ```bash
   cd jax_rl
   ```

3. Install Poetry (if not already installed):

   ```bash
   python -m pip install poetry
   ```

4. Install project dependencies using Poetry:

   ```bash
   poetry install
   ```

5. Activate the virtual environment created by Poetry:

   ```bash
   poetry shell
   ```
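As an optional sanity check once the shell is active, you can verify that JAX imports and sees at least one device:

```python
import jax

# Should print the available devices, e.g. [CpuDevice(id=0)].
print(jax.devices())
```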
- Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*. The MIT Press.
- Nikolaj Goodger, *Writing an RL Environment in JAX*, Medium, Nov 14, 2021
- Aleksa Gordić - The AI Epiphany, *JAX Tutorial Playlist*, YouTube, 2022
- Official JAX Documentation