Code for the NeurIPS 2023 paper "Supported Value Regularization for Offline Reinforcement Learning" (SVR).
The paper's results were collected with MuJoCo 2.1.0 (mujoco-py 2.1.2.14) and OpenAI Gym 0.23.1 on the D4RL datasets. Networks were trained with PyTorch 1.11.0 and Python 3.7.
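A minimal setup sketch, assuming MuJoCo 2.1.0 has already been downloaded and unpacked to ~/.mujoco/mujoco210; the package pins follow the versions listed above, and the D4RL install source is the upstream Farama repository (adjust if your environment differs):
# core Python dependencies (versions used for the paper)
pip install torch==1.11.0 gym==0.23.1 mujoco-py==2.1.2.14
# D4RL datasets, installed from source
pip install git+https://github.com/Farama-Foundation/D4RL.git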
We have uploaded pretrained behavior models to SVR_bcmodels/ to make the experiments easier to reproduce.
You can also pretrain behavior models by running:
./run_pretrain.sh
You can train SVR on D4RL datasets by running:
./run_experiments.sh
This codebase logs training runs with TensorBoard. You can view saved runs with:
tensorboard --logdir <run_dir>
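For example, if training writes its logs under a results/ directory (the directory name here is an assumption; check run_experiments.sh for the actual output path):
tensorboard --logdir results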
If you find this work useful, please consider citing:
@article{mao2023supported,
title={Supported value regularization for offline reinforcement learning},
author={Mao, Yixiu and Zhang, Hongchang and Chen, Chen and Xu, Yi and Ji, Xiangyang},
journal={Advances in Neural Information Processing Systems},
volume={36},
pages={40587--40609},
year={2023}
}