Code for the NeurIPS 2024 paper: Doubly Mild Generalization for Offline Reinforcement Learning.
The paper's results were collected with MuJoCo 2.1.0 (mujoco-py 2.1.2.14) and OpenAI Gym 0.23.1 on the D4RL datasets. Networks are trained with PyTorch 1.11.0 under Python 3.7.
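The version pins above can be set up with pip along the lines of the following sketch. The D4RL install source is an assumption (D4RL is typically installed from its GitHub repository), and the MuJoCo binary location is the conventional default; adjust both to your setup.

```shell
# Environment sketch matching the versions noted above (Python 3.7).
# MuJoCo 2.1.0 binaries must be installed separately, conventionally
# under ~/.mujoco/mujoco210, before building mujoco-py.
pip install "torch==1.11.0" "gym==0.23.1" "mujoco-py==2.1.2.14"
# Assumed D4RL source; substitute the fork/revision you use.
pip install "git+https://github.com/Farama-Foundation/D4RL.git"
```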
Use the following commands to train offline RL agents on D4RL tasks (Gym locomotion and AntMaze) and save the models.
python train_offline.py --env halfcheetah-medium-v2 --lam 0.25 --nu 0.1 --save_model
python train_offline.py --env antmaze-large-diverse-v2 --lam 0.25 --nu 0.5 --no_normalize --save_model
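To sweep several datasets with the paper's Gym locomotion hyperparameters, the commands above can be wrapped in a loop. This sketch only prints the commands it would run (drop the echo to launch training); hopper-medium-v2 and walker2d-medium-v2 are assumed standard D4RL dataset names not listed in this README.

```shell
# Dry-run sweep over assumed D4RL locomotion datasets with the
# Gym hyperparameters from above (lam=0.25, nu=0.1).
for env in halfcheetah-medium-v2 hopper-medium-v2 walker2d-medium-v2; do
    echo python train_offline.py --env "$env" --lam 0.25 --nu 0.1 --save_model
done
```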
Use the following command to fine-tune the pretrained offline models online on AntMaze tasks.
python train_finetune.py --env antmaze-large-diverse-v2 --lam 0.25 --nu 0.5 --lam_end 0.5 --nu_end 0.005 --no_normalize
You can view saved runs using TensorBoard.
tensorboard --logdir <run_dir>
If you find this work useful, please consider citing:
@article{mao2024doubly,
title={Doubly mild generalization for offline reinforcement learning},
author={Mao, Yixiu and Wang, Qi and Qu, Yun and Jiang, Yuhang and Ji, Xiangyang},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={51436--51473},
year={2024}
}