
# ExpGen: Explore to Generalize in Zero-Shot RL

PyTorch implementation of ExpGen [Paper] (NeurIPS'23).

*(Figure: ExpGen overview)*

## Citation

Please cite using the following BibTeX entry:

```bibtex
@article{zisselman2024explore,
  title={Explore to Generalize in Zero-Shot RL},
  author={Zisselman, Ev and Lavie, Itai and Soudry, Daniel and Tamar, Aviv},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
```

## Installation

Recommended setup:

- Ubuntu 18.04+
- Python 3.7+

Clone repo and install dependencies:

```bash
git clone https://github.com/EvZissel/expgen.git
cd expgen

conda env create -f environment.yml
conda activate expgen_env

pip install procgen
```

**Note:** If you encounter a libffi/cffi error such as

```
ImportError: libffi.so.7: cannot open shared object file: No such file or directory
```

try installing cffi directly: `pip install cffi==1.13.0`.
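To verify the installation, a quick smoke test (illustrative, not a script in this repo) can create and step a ProcGen maze environment; `procgen:procgen-maze-v0` is the id that procgen registers with gym:

```python
# Smoke test (illustrative, not part of the repo): confirm procgen installed
# correctly by stepping a maze environment with a few random actions.
import gym  # procgen registers its environments with gym

env = gym.make("procgen:procgen-maze-v0", num_levels=0, start_level=0)
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
print("procgen OK, observation shape:", obs.shape)
```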

## Example: Train ExpGen on ProcGen Maze

The following steps train and evaluate ExpGen on the ProcGen maze environment:

### Train a reward-maximizing agent

Train with a range of seed values to produce an ensemble of agents:

```bash
python train_ppo.py --env-name maze --seed 0 --use_backgrounds
```

**Note:** Use seed values 0-9 to reproduce the results presented in the paper.
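For convenience, a minimal launcher (hypothetical, not a script in this repo) can run the ten seeds sequentially; the runs are independent, so in practice they can be parallelized across GPUs or machines:

```python
# Illustrative launcher (not part of the repo): train the ensemble by
# invoking train_ppo.py once per seed, mirroring the command above.
import subprocess

for seed in range(10):  # seeds 0-9, as used in the paper
    subprocess.run(
        ["python", "train_ppo.py",
         "--env-name", "maze",
         "--seed", str(seed),
         "--use_backgrounds"],
        check=True,  # abort if any run fails
    )
```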

### Train a maxEnt agent

```bash
python train_maxEnt.py --env-name maze --seed 0 --use_backgrounds
```

### Evaluate ExpGen

```bash
python expgen_ensemble.py --env-name maze --use_backgrounds
```

**Note:** The hyperparameters used in the paper are set as the default values in the code.
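For intuition, the test-time rule combines the two kinds of agents: when the reward-maximizing ensemble agrees on an action it is executed, and when it disagrees the agent falls back to the maxEnt exploration policy. Below is a toy sketch of such an agreement-based rule; the function name, signature, and exact agreement criterion are illustrative, and the actual logic and hyperparameters are in `expgen_ensemble.py`:

```python
# Toy sketch (illustrative only): agreement-based action selection in the
# spirit of ExpGen; see expgen_ensemble.py for the real implementation.
from collections import Counter

def select_action(ensemble_actions, maxent_action, agree_frac=1.0):
    """Exploit when enough ensemble members agree, otherwise explore.

    ensemble_actions: actions proposed by the reward-maximizing agents.
    maxent_action: action proposed by the maxEnt exploration agent.
    agree_frac: fraction of agents that must agree before exploiting.
    """
    action, votes = Counter(ensemble_actions).most_common(1)[0]
    if votes >= agree_frac * len(ensemble_actions):
        return action        # consensus: take the ensemble's action
    return maxent_action     # disagreement: explore with the maxEnt policy
```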

## ProcGen Results

Using a PPO ensemble, ExpGen demonstrates a notable performance gain on games that were not solvable using invariance-based approaches.

*(Figure: ProcGen test results, ExpGen-PPO)*

Alternatively, ExpGen can utilize an invariance-based approach, such as an ensemble of IDAAC agents (instead of PPO), to attain both invariance and test-time exploration, achieving state-of-the-art results.

*(Figure: ProcGen test results, ExpGen-IDAAC)*

## Hidden Maze Experiment

To reproduce the results of the hidden maze experiment (see Appendix A), use the following command:

```bash
python train_ppo.py --env-name maze --seed 0 --num-level 128 --recurrent-policy --mask_all --use_generated_assets --restrict_themes --use_monochrome_assets
```

## Acknowledgements

This code is based on the open-source PyTorch implementation of PPO.