- PPO
- SAC
- GAIL (Generative Adversarial Imitation Learning)
- VAIL (Variational Adversarial Imitation Learning)
- SQIL (Imitation Learning via Reinforcement Learning with Sparse Rewards)
- AIRL (Adversarial Inverse Reinforcement Learning)
  - Two value functions can be merged into one.
  - Extremely unstable
- EAIRL (Empowerment-regularized Adversarial Inverse Reinforcement Learning)
  - Two value functions can be merged into one.
  - Extremely unstable
- VAIRL (Variational Adversarial Inverse Reinforcement Learning)
  - The KL divergence for the joint Gaussian distribution is not implemented yet.
- Add more environments (Ant and disabled Ant)
- Build a setup file
- Make an expert
- Make trajectories with the expert
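
For the Gaussian KL-divergence item noted under VAIRL, here is a minimal sketch of one possible form. It assumes diagonal-covariance Gaussian latents, a standard normal prior, and independent latents, so the joint KL is the sum of the per-latent terms. The function names `kl_diag_gaussian_to_standard_normal` and `joint_kl` are hypothetical and are not part of this repository.

```python
import math


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Assumption: diagonal covariance, parameterized by log-variance.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def joint_kl(latents):
    """Joint KL for independent latents.

    `latents` is an iterable of (mu, log_var) pairs. Independence is an
    assumption here; under it the joint KL is the sum of the individual KLs.
    """
    return sum(
        kl_diag_gaussian_to_standard_normal(mu, log_var)
        for mu, log_var in latents
    )
```

If the latents are correlated, the sum no longer applies and a full-covariance KL would be needed instead.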