Authors: Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo
This repository contains the official implementation of the HyperAgent algorithm, introduced in our ICML 2024 paper Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent.
To integrate Generative Pre-trained Transformer (GPT) models with HyperAgent, see szrlee/GPT-HyperAgent, which targets adaptive foundation models for online decision-making.
- Data Efficient ✅: HyperAgent reaches human-level performance (1 IQM) within 1.5M interactions, only 15% of the data used by Double-DQN (DDQN, DeepMind, 2016).
- Computation Efficient ✅: HyperAgent uses just 5% of the model parameters of the 2023 state-of-the-art algorithm (BBF, DeepMind).
- Ensemble+ Comparison: Ensemble+ achieves only 0.22 IQM under 1.5M interactions and requires twice as many parameters as HyperAgent.
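As a rough back-of-the-envelope check, the ratios quoted in the list can be turned into implied budgets. The sketch below uses only the percentages stated above and assumes the 15% figure refers to HyperAgent's 1.5M interactions. The derived DDQN interaction count and the relative parameter sizes are inferences from those figures, not numbers reported here, and all variable names are illustrative.

```python
# Figures quoted in the list above.
HYPERAGENT_INTERACTIONS = 1_500_000  # 1.5M interactions
DATA_PERCENT_OF_DDQN = 15            # HyperAgent uses 15% of DDQN's data
PARAM_PERCENT_OF_BBF = 5             # HyperAgent uses 5% of BBF's parameters
ENSEMBLE_PLUS_PARAM_FACTOR = 2       # Ensemble+ needs double HyperAgent's parameters

# Integer arithmetic avoids floating-point rounding in the derived values.
# Assumption: "15% of the data" is measured against the 1.5M interactions.
implied_ddqn_interactions = HYPERAGENT_INTERACTIONS * 100 // DATA_PERCENT_OF_DDQN
bbf_params_vs_hyperagent = 100 // PARAM_PERCENT_OF_BBF
ensemble_plus_percent_of_bbf = ENSEMBLE_PLUS_PARAM_FACTOR * PARAM_PERCENT_OF_BBF

print(f"Implied DDQN budget: {implied_ddqn_interactions:,} interactions")
print(f"BBF parameters relative to HyperAgent: {bbf_params_vs_hyperagent}x")
print(f"Ensemble+ parameters as a share of BBF: {ensemble_plus_percent_of_bbf}%")
```

These are relative comparisons only; absolute parameter counts are not given here and are not inferred.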
cd HyperAgent
pip install -e .
If you encounter an error related to ROMs when using Atari, follow these steps:
- Download Roms.rar from the Atari 2600 VCS ROM Collection.
- Extract the .rar file to a directory of your choice.
- Run the following command:
python -m atari_py.import_roms <path to extracted folder>
This command will import the ROMs and print their names as they are processed. The ROMs will be copied to your atari_py installation directory.
For detailed instructions on using ROMs, please refer to the official documentation.
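If you prefer to script the import step, the command above can be assembled from Python. This is a minimal sketch, not part of the repository: the helper name `build_import_command` is hypothetical, and the only thing it relies on is the `python -m atari_py.import_roms <path>` invocation shown above. It checks that the extracted folder exists before building the command.

```python
import sys
from pathlib import Path


def build_import_command(rom_dir):
    """Return the argv list that imports ROMs from an extracted folder.

    Hypothetical helper: it only wraps the documented
    `python -m atari_py.import_roms <path>` invocation.
    """
    rom_path = Path(rom_dir).expanduser()
    if not rom_path.is_dir():
        raise NotADirectoryError(f"ROM folder not found: {rom_path}")
    # Use the current interpreter so the ROMs are imported into the
    # atari_py installation of the active environment.
    return [sys.executable, "-m", "atari_py.import_roms", str(rom_path)]
```

To run it, pass the result to `subprocess.run(..., check=True)` with the path to your own extracted folder; a failed import then raises an error instead of passing silently.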
To reproduce the results for Atari (e.g., Pong):
sh experiments/start_atari.sh Pong
To reproduce the results for DeepSea (e.g., size 20):
sh experiments/start_deepsea.sh 20
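To queue several runs in sequence, the two launch commands above can be wrapped in a small driver. The sketch below is an assumption-laden convenience, not part of the repository: the helper names `build_run_command` and `run_all` are hypothetical, it assumes you are in the repository root, and it assumes each script takes the single positional argument shown above. By default it only prints the commands (dry run).

```python
import subprocess

# Script paths taken from the commands above; the mapping keys are illustrative.
SCRIPTS = {
    "atari": "experiments/start_atari.sh",
    "deepsea": "experiments/start_deepsea.sh",
}


def build_run_command(benchmark, arg):
    """Build the `sh <script> <arg>` command for one experiment."""
    if benchmark not in SCRIPTS:
        raise ValueError(f"Unknown benchmark: {benchmark!r}")
    return ["sh", SCRIPTS[benchmark], str(arg)]


def run_all(jobs, dry_run=True):
    """Run (or, with dry_run=True, just print) each job in order."""
    commands = [build_run_command(benchmark, arg) for benchmark, arg in jobs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the queue at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # The two examples from this README: Atari Pong and DeepSea size 20.
    run_all([("atari", "Pong"), ("deepsea", 20)])
```

Pass `dry_run=False` to launch the runs one after another.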
If you find this work useful in your research, please cite our paper:
@inproceedings{li2024hyperagent,
title = {{Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent}},
author = {Li, Yingru and Xu, Jiawei and Han, Lei and Luo, Zhi-Quan},
booktitle = {Forty-first International Conference on Machine Learning},
year = {2024},
series = {Proceedings of Machine Learning Research},
eprint = {2402.10228},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
url = {https://arxiv.org/abs/2402.10228}
}
This project is licensed under the MIT License - see the LICENSE file for details.