This repository is deprecated. I am now working on https://github.com/Itomigna2/Muesli-lunarlander
Colab demo link: https://colab.research.google.com/drive/19qTIgLvevkc5TA9zNjaS5lILWofGvZPJ?usp=sharing
Muesli paper link: https://arxiv.org/abs/2104.06159
CartPole-v1 env document: https://www.gymlibrary.dev/environments/classic_control/cart_pole/
- MuZero network
- 5 step unroll
- L_pg+cmpo
- L_v
- L_r
- L_m (5 step)
- Stacking 8 observations
- Mini-batch update
- Hidden state scaled within [-1,1]
- Gradient clipping by value [-1,1]
- Dynamics network gradient scale 1/2
- Target network (prior parameters) updated by moving average
- Categorical representation (value, reward model)
- Normalized advantage
- Tensorboard monitoring
- Self-play follows the policy inferred by the main network (originally it follows the target network)
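
A few of the items above can be illustrated with a minimal pure-Python sketch. It is not taken from the notebook: the function names, the min-max form of the hidden-state scaling, the padding of the first observation, the `alpha` value, and the plain (untransformed) two-hot support are all assumptions chosen for illustration.

```python
from collections import deque


def stack_observations(window, obs):
    """Push obs into a rolling window (a deque with maxlen=8) and flatten it.

    Assumption: an empty window is padded by repeating the first observation.
    """
    if not window:
        window.extend([obs] * window.maxlen)
    else:
        window.append(obs)  # the oldest frame is dropped automatically
    return [x for frame in window for x in frame]


def scale_hidden_state(h):
    """Min-max scale a hidden-state vector into [-1, 1] (assumed scaling form)."""
    lo, hi = min(h), max(h)
    if hi == lo:
        return [0.0 for _ in h]  # constant vector: avoid division by zero
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in h]


def clip_by_value(grads, limit=1.0):
    """Clip each gradient entry independently into [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def ema_update(target, online, alpha=0.1):
    """Moving-average update: target <- (1 - alpha) * target + alpha * online."""
    return [(1.0 - alpha) * t + alpha * o for t, o in zip(target, online)]


def two_hot(x, support):
    """Encode scalar x as weights on the two neighbouring bins of an ascending support."""
    x = max(support[0], min(support[-1], x))  # clamp into the support range
    probs = [0.0] * len(support)
    for i in range(len(support) - 1):
        lo, hi = support[i], support[i + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            probs[i] = 1.0 - w
            probs[i + 1] = w
            break
    return probs
```

With a CartPole observation of 4 floats, `stack_observations(deque(maxlen=8), obs)` yields a 32-element input. The scalar is recovered from a two-hot vector as the probability-weighted sum of the support, which is how a categorical value or reward head can be decoded.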
This code (.ipynb) can be run in Google Colab. Requirements.txt was generated from the Colab CPU compute backend.