Simple Muesli RL algorithm implementation (PyTorch)

Itomigna2/Muesli-cartpole

Muesli-cartpole

This repository is deprecated. Development continues at https://github.com/Itomigna2/Muesli-lunarlander

Links

Colab demo link : https://colab.research.google.com/drive/19qTIgLvevkc5TA9zNjaS5lILWofGvZPJ?usp=sharing

Muesli paper link : https://arxiv.org/abs/2104.06159

CartPole-v1 env document : https://www.gymlibrary.dev/environments/classic_control/cart_pole/

Implemented

  • MuZero network
  • 5 step unroll
  • L_pg+cmpo
  • L_v
  • L_r
  • L_m (5 step)
  • Stacking 8 observations
  • Mini-batch update
  • Hidden state scaled within [-1,1]
  • Gradient clipping by value [-1,1]
  • Dynamics network gradient scale 1/2
  • Target network (prior parameters) moving-average update
  • Categorical representation (value, reward model)
  • Normalized advantage
  • Tensorboard monitoring
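
Several of the items above are small tensor-level tricks. The following is a minimal PyTorch sketch of four of them, not the notebook's actual code: the helper names (`scale_hidden`, `scale_gradient`, `clip_gradients`, `update_target`) and the `tau` parameter are hypothetical, and the min-max form of the hidden-state scaling is an assumption borrowed from the MuZero convention.

```python
import torch
import torch.nn as nn


def scale_hidden(h: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Min-max scale each row of a (batch, features) hidden state into [-1, 1].

    Assumption: per-sample min-max normalization; the notebook may differ.
    """
    h_min = h.min(dim=-1, keepdim=True).values
    h_max = h.max(dim=-1, keepdim=True).values
    return 2.0 * (h - h_min) / (h_max - h_min + eps) - 1.0


def scale_gradient(x: torch.Tensor, scale: float) -> torch.Tensor:
    """Leave the forward value unchanged; multiply the backward gradient by `scale`."""
    return x * scale + x.detach() * (1.0 - scale)


def clip_gradients(model: nn.Module, limit: float = 1.0) -> None:
    """Clip every gradient element of `model` into [-limit, limit] (clipping by value)."""
    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=limit)


@torch.no_grad()
def update_target(target: nn.Module, online: nn.Module, tau: float) -> None:
    """Moving-average update: target <- (1 - tau) * target + tau * online."""
    for p_target, p_online in zip(target.parameters(), online.parameters()):
        p_target.mul_(1.0 - tau).add_(p_online, alpha=tau)
```

In a training step these would typically be applied in this order: `scale_hidden` on the output of the representation and dynamics networks, `scale_gradient(h, 0.5)` on the hidden state passed through the dynamics network at each unroll step, `clip_gradients` between `loss.backward()` and `optimizer.step()`, and `update_target` after the optimizer step.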

Differences from paper

  • Self-play follows the policy inferred by the main network (the paper uses the target network)
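
The difference comes down to which set of parameters produces the action distribution during data collection. A rough illustration, assuming a hypothetical linear policy head in place of the full MuZero-style network and an input of 8 stacked 4-dimensional CartPole observations (`online_policy`, `target_policy`, and `select_action` are illustrative names only):

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

OBS_DIM = 4 * 8   # 8 stacked CartPole-v1 observations of 4 values each (illustrative)
N_ACTIONS = 2     # CartPole-v1 has two discrete actions

# Hypothetical stand-ins for the policy part of the main and target networks.
online_policy = nn.Linear(OBS_DIM, N_ACTIONS)
target_policy = copy.deepcopy(online_policy)


def select_action(policy_net: nn.Module, obs: torch.Tensor) -> int:
    """Sample an action from the categorical policy produced by `policy_net`."""
    with torch.no_grad():
        logits = policy_net(obs)
    return int(torch.distributions.Categorical(logits=logits).sample().item())


obs = torch.zeros(OBS_DIM)

# This repository: act with the main (online) network.
action = select_action(online_policy, obs)

# The paper's setup would instead act with the target network:
# action = select_action(target_policy, obs)
```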

Memo

The code (.ipynb) runs in Google Colab. Requirements.txt was generated from the Colab CPU compute backend.
