This is the code for our paper:
- Jaekyeom Kim*, Seohong Park* and Gunhee Kim (*equal contribution). Unsupervised Skill Discovery with Bottleneck Option Learning. In ICML, 2021. [paper] [talk] [slides]
It includes the implementation of IBOL: the linearizer, the skill discovery method trained on top of it, and the downstream tasks used for their evaluation.
If you find our work or this code useful in your research, please cite:

    @inproceedings{kim2021_ibol,
      title={Unsupervised Skill Discovery with Bottleneck Option Learning},
      author={Kim, Jaekyeom and Park, Seohong and Kim, Gunhee},
      booktitle={International Conference on Machine Learning (ICML)},
      year={2021}
    }
We show some example skills discovered by IBOL in four MuJoCo environments without rewards.
This code was tested in an environment with the following conditions:
- Ubuntu 16.04 machine
- CUDA-compatible GPUs
- Python 3.7.10
- Install the MuJoCo version 2.0 binaries, following the instructions. Note that multiple licensing options are offered, including a 30-day free trial.
- At the top-level directory of this repository, run the following command to set up the environment:

      pip install --no-deps -r requirements.txt
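Before launching any training, the MuJoCo setup can be sanity-checked with a short script such as the one below. This is only a minimal sketch, not part of this repository; it assumes `mujoco_py` and `gym` were installed by `requirements.txt` and that the MuJoCo 2.0 binaries and license key are in place.

    # Minimal sanity check for the MuJoCo 2.0 setup (not part of this repository).
    import mujoco_py  # fails here if the MuJoCo binaries or license key are missing
    import gym

    env = gym.make("Ant-v3")  # any MuJoCo-backed Gym environment works for this check
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print("MuJoCo is working; observation shape:", obs.shape)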
To train the linearizer in each environment, run one of the following commands:

Command | Environment |
---|---|
`python tests/main.py --train_type linearizer --env ant` | Ant |
`python tests/main.py --train_type linearizer --env half_cheetah` | HalfCheetah |
`python tests/main.py --train_type linearizer --env hopper` | Hopper |
`python tests/main.py --train_type linearizer --env humanoid` | Humanoid |
`python tests/main.py --train_type linearizer --env dkitty_randomized` | D'Kitty Randomized |
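If you want to queue linearizer runs for several environments, a thin wrapper such as the one below can be used. It is a hypothetical convenience script (not part of this repository) that only invokes the commands from the table above; the same pattern applies to the skill discovery and downstream commands that follow.

    # Hypothetical convenience wrapper: runs linearizer training sequentially
    # for several environments using the commands from the table above.
    import subprocess

    ENVS = ["ant", "half_cheetah", "hopper", "humanoid", "dkitty_randomized"]

    for env_name in ENVS:
        subprocess.run(
            ["python", "tests/main.py",
             "--train_type", "linearizer",
             "--env", env_name],
            check=True,  # stop immediately if one of the runs fails
        )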
To perform skill discovery on top of a trained linearizer, run one of the following commands, where `--cp_path` points to the linearizer checkpoint (`sampling_policy.pt`) obtained in the previous step:

Command | Environment |
---|---|
`python tests/main.py --train_type skill_discovery --env ant --cp_path "exp/L_ANT/sampling_policy.pt"` | Ant |
`python tests/main.py --train_type skill_discovery --env half_cheetah --cp_path "exp/L_HC/sampling_policy.pt"` | HalfCheetah |
`python tests/main.py --train_type skill_discovery --env hopper --cp_path "exp/L_HP/sampling_policy.pt"` | Hopper |
`python tests/main.py --train_type skill_discovery --env humanoid --cp_path "exp/L_HUM/sampling_policy.pt"` | Humanoid |
`python tests/main.py --train_type skill_discovery --env dkitty_randomized --cp_path "exp/L_DK/sampling_policy.pt"` | D'Kitty Randomized |
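The `--cp_path` argument points to a serialized linearizer policy (`sampling_policy.pt`). To confirm that a checkpoint file is readable before starting skill discovery, it can be opened with `torch.load`; the snippet below is only a sketch, since the exact type of the stored object depends on this repository's policy classes, and the path shown is an example.

    # Sketch: check that a linearizer checkpoint can be deserialized.
    # The concrete object type depends on the repository's policy classes.
    import torch

    ckpt = torch.load("exp/L_ANT/sampling_policy.pt", map_location="cpu")  # example path
    print("Loaded object of type:", type(ckpt))
    if isinstance(ckpt, dict):
        print("Stored keys:", list(ckpt.keys()))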
To evaluate on the downstream tasks, run one of the following commands, where `--cp_path` points to the linearizer checkpoint and `--dcp_path` to the skill discovery checkpoint (`option_policy.pt`):

Command | Environment |
---|---|
`python tests/main.py --train_type downstream --env ant_goal --cp_path "exp/L_ANT/sampling_policy.pt" --dcp_path "exp/S_ANT/option_policy.pt"` | AntGoal |
`python tests/main.py --train_type downstream --env ant_multi_goals --cp_path "exp/L_ANT/sampling_policy.pt" --dcp_path "exp/S_ANT/option_policy.pt"` | AntMultiGoals |
`python tests/main.py --train_type downstream --env half_cheetah_goal --cp_path "exp/L_CH/sampling_policy.pt" --dcp_path "exp/S_CH/option_policy.pt"` | CheetahGoal |
`python tests/main.py --train_type downstream --env half_cheetah_imi --cp_path "exp/L_CH/sampling_policy.pt" --dcp_path "exp/S_CH/option_policy.pt"` | CheetahImitation |
- Each training command stores its results in an experiment directory under `exp/`.
- In each experiment directory, `plots/` (files) or `tb_plot/` (TensorBoard) contains qualitative visualizations.
- For downstream tasks, the column named `TrainSp/IOD/SmoothedReward500` in `progress.csv` can be examined.
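As a concrete example of the last point, the smoothed downstream reward can be read from `progress.csv` with pandas. The sketch below assumes pandas is available and uses a placeholder experiment directory that should be replaced with the one created by your own run.

    # Sketch: inspect the downstream-task reward column in progress.csv.
    # Replace the experiment directory with the one created by your run.
    import pandas as pd

    df = pd.read_csv("exp/YOUR_DOWNSTREAM_RUN/progress.csv")
    reward = df["TrainSp/IOD/SmoothedReward500"]
    print(reward.describe())
    print("Final smoothed reward:", reward.iloc[-1])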
This code is based on garage.