
A Pragmatic Look at Deep Imitation Learning

MIT License

Imitation learning algorithms (with PPO [1]):

  • AIRL [2]
  • BC [3]
  • DRIL [4] (without BC)
  • FAIRL [5]
  • GAIL [6]
  • GMMIL [7] (including an optional self-similarity term [8])
  • nn-PUGAIL [9]
  • RED [10]

Options include (see the sketch after this list):

  • State-only imitation learning: state-only: true/false
  • R1 gradient regularisation [11]: r1-reg-coeff: 0.5
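
As a minimal sketch of how these options could be set (assuming the keys sit at the top level of conf/config.yaml; their exact names and placement should be checked in the config files), one might write:

# Hypothetical excerpt from conf/config.yaml; key placement is an assumption
state-only: true     # imitate from expert states only, ignoring expert actions
r1-reg-coeff: 0.5    # weight of the R1 gradient regularisation penalty [11]

Since the project uses Hydra, such values can also be overridden from the command line rather than edited in the config files.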

Requirements

Requirements can be installed with:

pip install -r requirements.txt

Notable required packages are PyTorch, OpenAI Gym, D4RL-PyBullet and Hydra. Ax and the Hydra Ax sweeper plugin are required for hyperparameter optimisation; if unneeded, they can be removed from requirements.txt.

Run

Each imitation learning algorithm can be trained with:

python main.py algorithm=ALG/ENV

where ALG is one of [AIRL|BC|DRIL|FAIRL|GAIL|GMMIL|PUGAIL|RED] and ENV is one of [ant|halfcheetah|hopper|walker2d]. For example:

python main.py algorithm=AIRL/hopper

Hyperparameters can be found in conf/config.yaml and conf/algorithm/ALG/ENV.yaml, with the latter containing algorithm- and environment-specific hyperparameters that were tuned with Ax.

Results will be saved in outputs/ENV_ALGO/m-d_H-M-S, where the final subfolder is named after the current datetime (month-day_hour-minute-second).

Hyperparameter optimisation

Hyperparameter optimisation can be run by adding -m hydra/sweeper=ax hyperparam_opt=ALG to the training command, for example:

python main.py -m algorithm=AIRL/hopper hydra/sweeper=ax hyperparam_opt=AIRL 

hyperparam_opt specifies the hyperparameter search space.
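
As a rough, hypothetical sketch of what such a search space can look like for Hydra's Ax sweeper (the parameter names below are made up, and the actual conf/hyperparam_opt/ALG.yaml files may be organised differently):

# Hypothetical Ax search-space sketch; parameter names are not from this repository
hydra:
  sweeper:
    ax_config:
      max_trials: 64            # budget of Ax optimisation trials
      params:
        agent.learning_rate:    # hypothetical hyperparameter to tune
          type: range
          bounds: [0.00001, 0.001]
        agent.ppo_epochs:       # hypothetical hyperparameter to tune
          type: choice
          values: [5, 10, 20]

Each entry under params maps a config key to an Ax parameter specification (a continuous range or a discrete choice).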

Seed sweep

A seed sweep can be performed as follows:

python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5 

or via the provided bash script:

./scripts/run_seed_experiments.sh ALG ENV

The results will be available in the ./output/seed_sweeper_ENV_ALG folder (note that running the sweep again will overwrite the previous results).

Results

PyBullet results

Acknowledgements

Citation

If you find this work useful and would like to cite it, the following would be appropriate:

@article{arulkumaran2021pragmatic,
  author = {Arulkumaran, Kai and Ogawa Lillrank, Dan},
  title = {A Pragmatic Look at Deep Imitation Learning},
  journal = {arXiv preprint arXiv:2108.01867},
  year = {2021}
}

References

[1] Proximal Policy Optimization Algorithms
[2] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[3] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[4] Disagreement-Regularized Imitation Learning
[5] A Divergence Minimization Perspective on Imitation Learning Methods
[6] Generative Adversarial Imitation Learning
[7] Imitation Learning via Kernel Mean Embedding
[8] A Pragmatic Look at Deep Imitation Learning
[9] Positive-Unlabeled Reward Learning
[10] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[11] Which Training Methods for GANs do actually Converge?