AMPED is an iterative improvement on the MuZero algorithm. There are two important changes relative to MuZero:
(1) AMPED combines the MuZero objective and the PPO
objective; (2) AMPED uses an n-th order Markov evolution
dynamics (NOMAD) function instead of the first order
Markov dynamics function used in MuZero.
Specifically, the AMPED objective combines the MuZero loss with the PPO clipped surrogate objective (see the paper for the exact formulation). The objective is minimized using standard gradient-based optimizers such as Adam.
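As a rough illustration only: the weighting coefficient, the notation, and the exact form of each term below are assumptions about how such a combination might look, not the paper's definition.

```latex
% Sketch of a combined MuZero + PPO objective (weighting \lambda and exact terms are assumed).
% rho_t(theta) = pi_theta(a_t | s_t) / pi_{theta_old}(a_t | s_t) is the PPO probability ratio.
\mathcal{L}_{\mathrm{AMPED}}(\theta) =
  \underbrace{\sum_{k=0}^{K}\Big[
      \ell^{r}\big(u_{t+k},\, r_t^{k}\big)
    + \ell^{v}\big(z_{t+k},\, v_t^{k}\big)
    + \ell^{p}\big(\pi_{t+k},\, p_t^{k}\big)\Big]}_{\text{MuZero loss}}
  \;+\;
  \lambda\,\underbrace{\mathbb{E}_t\Big[-\min\big(\rho_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\!\big(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big]}_{\text{PPO clipped surrogate}}
```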
AMPED extends the first-order Markov dynamics function used in MuZero by allowing the dynamics function to condition on the n most recent latent states and actions, rather than only the current ones. A sketch of this idea is shown below.
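The snippet below is a minimal sketch of an n-th order dynamics function, assuming a simple MLP over the concatenated history; the class and argument names (`NthOrderDynamics`, `latent_dim`, `order`, etc.) are hypothetical and not taken from the AMPED codebase.

```python
# Sketch: an n-th order Markov dynamics function that conditions on the last
# n (latent state, action) pairs instead of only the most recent pair, as
# MuZero's first-order dynamics function does.
import torch
import torch.nn as nn


class NthOrderDynamics(nn.Module):
    def __init__(self, latent_dim: int, action_dim: int, order: int, hidden_dim: int = 256):
        super().__init__()
        self.order = order
        # Input is the concatenation of the n most recent (latent state, action) pairs.
        self.net = nn.Sequential(
            nn.Linear(order * (latent_dim + action_dim), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim + 1),  # next latent state + predicted reward
        )

    def forward(self, latent_history: torch.Tensor, action_history: torch.Tensor):
        # latent_history: (batch, order, latent_dim); action_history: (batch, order, action_dim)
        x = torch.cat([latent_history, action_history], dim=-1).flatten(start_dim=1)
        out = self.net(x)
        next_latent, reward = out[:, :-1], out[:, -1]
        return next_latent, reward
```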
Finally, AMPED uses an empirical advantage estimate during the MCTS backup phase. This advantage is calculated from the predicted Q-value and the predicted value (from the prediction function).
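For intuition, here is a hedged sketch of how an advantage of the form A(s, a) = Q(s, a) - V(s) could be recorded during backup; the node fields (`value_sum`, `predicted_value`, etc.) and the exact backup rule are illustrative assumptions, not the AMPED implementation.

```python
# Sketch: propagate a leaf value up the search path and record an empirical
# advantage at each node as (Q estimate from search) - (predicted value).
def backup(search_path, leaf_value, discount=0.997):
    value = leaf_value
    for node in reversed(search_path):
        node.value_sum += value
        node.visit_count += 1
        # Q-value estimated from the returns accumulated through this node.
        q_value = node.value_sum / node.visit_count
        # Advantage relative to the value output by the prediction function.
        node.advantage = q_value - node.predicted_value
        value = node.reward + discount * value
```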
We find that AMPED performs at least as well as PPO and MuZero on the reinforcement learning problems we evaluate.
See the paper in this repository for more details. The repository also includes re-implementations of the basic PPO and MuZero algorithms.