IQ-Learn enables very fast, scalable and stable imitation learning.
Our IQ-Learn algorithm is present in `iq.py`. This file can be used standalone to add IQ to your IL & RL projects. IQ-Learn can be implemented on top of most existing RL methods (off-policy & on-policy) by changing the critic update loss to our proposed `iq_loss`.
(IQ has been successfully tested to work with Q-Learning, SAC, PPO, DDPG and Decision Transformer agents).
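To illustrate what this change looks like in practice, here is a minimal sketch of an off-policy critic step with the loss swapped out. The argument layout and the loss callable are illustrative assumptions, not the actual `iq.py` interface:

```python
def critic_step(critic_loss_fn, agent, policy_batch, expert_batch, critic_optimizer):
    """One critic update for an off-policy agent (e.g. soft-Q or SAC).

    In standard RL, `critic_loss_fn` would be a TD/Bellman loss on `policy_batch`.
    With IQ-Learn, it is replaced by the IQ loss from iq.py, which also consumes
    expert transitions; the rest of the training loop is left untouched.
    (Argument names and layout here are illustrative, not the real interface.)
    """
    loss = critic_loss_fn(agent, policy_batch, expert_batch)
    critic_optimizer.zero_grad()
    loss.backward()
    critic_optimizer.step()
    return loss.item()
```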
- Added IQ-Learn results on Humanoid-v2
- Added support for DM Control environments
- Released the `expert_generation` script to generate your own experts from trained RL agents for new environments
- pytorch (>= 1.4)
- gym
- wandb
- tensorboardX
- hydra-core=1.0 (versions >= 1.1 are currently incompatible)
- Make a conda environment and install dependencies: `pip install -r requirements.txt`
- Set up a wandb project to log and visualize metrics
- (Optional) Download expert datasets for Atari environments from GDrive
We show some examples that push the boundaries of imitation learning using IQ-Learn:
python train_iq.py agent=softq method=iq env=cartpole expert.demos=1 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert
IQ-Learn is the only method that reaches the expert env reward of 500 (requiring only 3k training steps and less than 30 seconds!)
python train_iq.py agent=softq env=pong agent.init_temp=1e-3 method.loss=value_expert method.chi=True seed=0 expert.demos=30
Again, IQ-Learn is the only method that reaches the expert env reward of 21 (we found better hyperparameters than those used in the original paper)
python train_iq.py env=humanoid agent=sac expert.demos=1 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1
IQ-Learn learns to control a full humanoid at expert performance from a single demonstration, reaching the expert env reward of 5300
We show example code for training Q-Learning and SAC agents with IQ-Learn in `train_iq.py`. We make minimal modifications to the original RL training code in `train_rl.py`, simply changing the critic loss function.
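For intuition about what the critic objective looks like, below is a simplified, self-contained sketch of one IQ loss variant (the `v0` form with χ² regularization) for a discrete-action soft-Q agent. Function names, tensor shapes, and the fixed χ² coefficient are our own illustration; see `iq.py` for the actual implementation and its other loss variants:

```python
import torch

def soft_value(q_net, obs, temp):
    # Soft value for discrete actions: V(s) = temp * logsumexp(Q(s, .) / temp)
    return temp * torch.logsumexp(q_net(obs) / temp, dim=1)

def iq_softq_loss_sketch(q_net, expert_obs, expert_act, expert_next_obs,
                         expert_done, init_obs, gamma=0.99, temp=1e-2):
    """Simplified IQ objective for a discrete-action soft-Q critic, written as a
    loss to minimize (illustrative only, not the code in iq.py).

    term_expert: E_expert[ phi(Q(s,a) - gamma * V(s')) ], with the chi^2
                 regularizer phi(x) = x - x^2 / 4 (coefficient fixed for brevity)
    term_v0:     (1 - gamma) * E_{s0}[ V(s0) ]   ('v0' form over initial states)
    """
    q_sa = q_net(expert_obs).gather(1, expert_act.long().view(-1, 1)).squeeze(1)
    next_v = soft_value(q_net, expert_next_obs, temp)
    # Implicit reward recovered by the critic: r(s, a) = Q(s, a) - gamma * V(s')
    reward = q_sa - gamma * (1.0 - expert_done) * next_v

    term_expert = (reward - 0.25 * reward ** 2).mean()
    term_v0 = (1.0 - gamma) * soft_value(q_net, init_obs, temp).mean()

    # IQ-Learn maximizes (term_expert - term_v0); negate for gradient descent.
    return -(term_expert - term_v0)
```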
- To reproduce our Offline IL experiments, see `scripts/run_offline.sh`
- To reproduce our Mujoco experiments, see `scripts/run_mujoco.sh`
- To reproduce our Atari experiments, see `scripts/run_atari.sh`
- To visualize our recovered state-only rewards on a toy Point Maze environment:
python -m vis.maze_vis env=pointmaze_right eval.policy=pointmaze agent.init_temp=1 agent=sac agent.q_net._target_=agent.sac_models.DoubleQCritic
Reward visualizations are saved in the `vis/outputs` directory
Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first.
The code is made available for academic, non-commercial usage. Please see the LICENSE for the terms of using IQ-Learn commercially, running it on your robots, or creating new AI agents with it.
For any inquiry, contact: Div Garg (divgarg@stanford.edu)