IQ-Learn enables very fast, scalable and stable imitation learning.
Our IQ-Learn algorithm is present in `iq.py`. This file can be used standalone to add IQ to your IL & RL projects. IQ-Learn can be implemented on top of most existing RL methods (off-policy & on-policy) by changing the critic update loss to our proposed `iq_loss`.
(IQ has been successfully tested to work with Q-Learning, SAC, PPO, DDPG and Decision Transformer agents).
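To illustrate what this change looks like in practice, here is a minimal sketch of an off-policy critic step with the loss swapped out. The argument layout and the loss callable are illustrative assumptions, not the actual `iq.py` interface:

```python
def critic_step(critic_loss_fn, agent, policy_batch, expert_batch, critic_optimizer):
    """One critic update for an off-policy agent (e.g. soft-Q or SAC).

    In standard RL, `critic_loss_fn` would be a TD/Bellman loss on `policy_batch`.
    With IQ-Learn, it is replaced by the IQ loss from iq.py, which also consumes
    expert transitions; the rest of the training loop is left untouched.
    (Argument names and layout here are illustrative, not the real interface.)
    """
    loss = critic_loss_fn(agent, policy_batch, expert_batch)
    critic_optimizer.zero_grad()
    loss.backward()
    critic_optimizer.step()
    return loss.item()
```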
- Added IQ-Learn results on Humanoid-v2
- Added support for DM Control environments
- Released the `expert_generation` script to generate your own experts from trained RL agents for new environments
- pytorch (>= 1.4)
- gym
- wandb
- tensorboardX
- hydra-core=1.0 (versions >= 1.1 are currently incompatible)
- Make a conda environment and install dependencies: `pip install -r requirements.txt`
- Set up a wandb project to log and visualize metrics
- (Optional) Download expert datasets for Atari environments from GDrive
We show some examples that push the boundaries of imitation learning using IQ-Learn:
python train_iq.py agent=softq method=iq env=cartpole expert.demos=1 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert
IQ-Learn is the only method that reaches the expert env reward of 500 (requiring only 3k training steps and less than 30 seconds!)
python train_iq.py agent=softq env=pong agent.init_temp=1e-3 method.loss=value_expert method.chi=True seed=0 expert.demos=30
Again, IQ-Learn is the only method that reaches the expert env reward of 21 (we found better hyperparameters than those used in the original paper)
python train_iq.py env=humanoid agent=sac expert.demos=1 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1
IQ-Learn learns to control a full humanoid at expert performance from a single demonstration, reaching the expert env reward of 5300
We show example code for training Q-Learning and SAC agents with IQ-Learn in `train_iq.py`. We make minimal modifications to the original RL training code in `train_rl.py`, simply changing the critic loss function.
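For intuition about what the critic objective looks like, below is a simplified, self-contained sketch of one IQ loss variant (the `v0` form with χ² regularization) for a discrete-action soft-Q agent. Function names, tensor shapes, and the fixed χ² coefficient are our own illustration; see `iq.py` for the actual implementation and its other loss variants:

```python
import torch

def soft_value(q_net, obs, temp):
    # Soft value for discrete actions: V(s) = temp * logsumexp(Q(s, .) / temp)
    return temp * torch.logsumexp(q_net(obs) / temp, dim=1)

def iq_softq_loss_sketch(q_net, expert_obs, expert_act, expert_next_obs,
                         expert_done, init_obs, gamma=0.99, temp=1e-2):
    """Simplified IQ objective for a discrete-action soft-Q critic, written as a
    loss to minimize (illustrative only, not the code in iq.py).

    term_expert: E_expert[ phi(Q(s,a) - gamma * V(s')) ], with the chi^2
                 regularizer phi(x) = x - x^2 / 4 (coefficient fixed for brevity)
    term_v0:     (1 - gamma) * E_{s0}[ V(s0) ]   ('v0' form over initial states)
    """
    q_sa = q_net(expert_obs).gather(1, expert_act.long().view(-1, 1)).squeeze(1)
    next_v = soft_value(q_net, expert_next_obs, temp)
    # Implicit reward recovered by the critic: r(s, a) = Q(s, a) - gamma * V(s')
    reward = q_sa - gamma * (1.0 - expert_done) * next_v

    term_expert = (reward - 0.25 * reward ** 2).mean()
    term_v0 = (1.0 - gamma) * soft_value(q_net, init_obs, temp).mean()

    # IQ-Learn maximizes (term_expert - term_v0); negate for gradient descent.
    return -(term_expert - term_v0)
```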
- To reproduce our Offline IL experiments, see `scripts/run_offline.sh`
- To reproduce our Mujoco experiments, see `scripts/run_mujoco.sh`
- To reproduce our Atari experiments, see `scripts/run_atari.sh`
- To visualize our recovered state-only rewards on a toy Point Maze environment:
python -m vis.maze_vis env=pointmaze_right eval.policy=pointmaze agent.init_temp=1 agent=sac agent.q_net._target_=agent.sac_models.DoubleQCritic
Reward visualizations are saved in the `vis/outputs` directory
Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first.
The code is made available for academic, non-commercial usage. Please see the LICENSE for the terms of using IQ-Learn commercially, running it on your robots, or creating new AI agents with it.
For any inquiry, contact: Div Garg (divgarg@stanford.edu)