AMPED is an iterative improvement on the MuZero algorithm. There are two important changes relative to MuZero:
(1) AMPED combines the MuZero objective and the PPO
objective; (2) AMPED uses an n-th order Markov evolution
dynamics (NOMAD) function instead of the first order
Markov dynamics function used in MuZero.
Specifically, the AMPED objective combines the MuZero loss with the PPO clipped surrogate objective (see the paper for the exact formulation). The objective is minimized using standard gradient-based optimizers such as Adam.
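As a rough illustration only: the weighting coefficient, the notation, and the exact form of each term below are assumptions about how such a combination might look, not the paper's definition.

```latex
% Sketch of a combined MuZero + PPO objective (weighting \lambda and exact terms are assumed).
% rho_t(theta) = pi_theta(a_t | s_t) / pi_{theta_old}(a_t | s_t) is the PPO probability ratio.
\mathcal{L}_{\mathrm{AMPED}}(\theta) =
  \underbrace{\sum_{k=0}^{K}\Big[
      \ell^{r}\big(u_{t+k},\, r_t^{k}\big)
    + \ell^{v}\big(z_{t+k},\, v_t^{k}\big)
    + \ell^{p}\big(\pi_{t+k},\, p_t^{k}\big)\Big]}_{\text{MuZero loss}}
  \;+\;
  \lambda\,\underbrace{\mathbb{E}_t\Big[-\min\big(\rho_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\!\big(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big]}_{\text{PPO clipped surrogate}}
```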
AMPED extends the first-order Markov dynamics function used in MuZero by allowing the dynamics function to condition on the n most recent latent states and actions, rather than only the current ones. A sketch of this idea is shown below.
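The snippet below is a minimal sketch of an n-th order dynamics function, assuming a simple MLP over the concatenated history; the class and argument names (`NthOrderDynamics`, `latent_dim`, `order`, etc.) are hypothetical and not taken from the AMPED codebase.

```python
# Sketch: an n-th order Markov dynamics function that conditions on the last
# n (latent state, action) pairs instead of only the most recent pair, as
# MuZero's first-order dynamics function does.
import torch
import torch.nn as nn


class NthOrderDynamics(nn.Module):
    def __init__(self, latent_dim: int, action_dim: int, order: int, hidden_dim: int = 256):
        super().__init__()
        self.order = order
        # Input is the concatenation of the n most recent (latent state, action) pairs.
        self.net = nn.Sequential(
            nn.Linear(order * (latent_dim + action_dim), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim + 1),  # next latent state + predicted reward
        )

    def forward(self, latent_history: torch.Tensor, action_history: torch.Tensor):
        # latent_history: (batch, order, latent_dim); action_history: (batch, order, action_dim)
        x = torch.cat([latent_history, action_history], dim=-1).flatten(start_dim=1)
        out = self.net(x)
        next_latent, reward = out[:, :-1], out[:, -1]
        return next_latent, reward
```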
Finally, AMPED uses an empirical advantage estimate during the MCTS backup phase. This advantage is calculated from the predicted Q-value and the predicted value (from the prediction function).
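For intuition, here is a hedged sketch of how an advantage of the form A(s, a) = Q(s, a) - V(s) could be recorded during backup; the node fields (`value_sum`, `predicted_value`, etc.) and the exact backup rule are illustrative assumptions, not the AMPED implementation.

```python
# Sketch: propagate a leaf value up the search path and record an empirical
# advantage at each node as (Q estimate from search) - (predicted value).
def backup(search_path, leaf_value, discount=0.997):
    value = leaf_value
    for node in reversed(search_path):
        node.value_sum += value
        node.visit_count += 1
        # Q-value estimated from the returns accumulated through this node.
        q_value = node.value_sum / node.visit_count
        # Advantage relative to the value output by the prediction function.
        node.advantage = q_value - node.predicted_value
        value = node.reward + discount * value
```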
We find that AMPED performs at least as well as PPO and MuZero on the reinforcement learning problems we evaluate.
See the paper in this repository for more details. The repository also includes re-implementations of the basic PPO and MuZero algorithms.