This repo is an unofficial PyTorch implementation of Implicit Q-Learning (IQL), an in-sample Q-learning algorithm for offline RL, introduced in:
```bibtex
@inproceedings{
  kostrikov2022offline,
  title={Offline Reinforcement Learning with Implicit Q-Learning},
  author={Ilya Kostrikov and Ashvin Nair and Sergey Levine},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=68n2s9ZJWF8}
}
```
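For reference, the core of IQL is fitting the value function to an upper expectile of the Q-function via an asymmetric L2 loss. A minimal PyTorch sketch of that loss (the function name and signature are illustrative, not necessarily what this repo uses):

```python
import torch

def expectile_loss(diff: torch.Tensor, expectile: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss L_2^tau(u) = |tau - 1(u < 0)| * u^2.

    `diff` is Q(s, a) - V(s). With expectile > 0.5, positive errors are
    weighted more heavily than negative ones, so V is pushed toward an
    upper expectile of Q, approximating an in-sample maximum over actions.
    """
    weight = torch.abs(expectile - (diff < 0).float())
    return (weight * diff.pow(2)).mean()
```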
Note: the reward standardization used in the official implementation (MuJoCo locomotion rewards are divided by the difference between the returns of the best and worst trajectories in each dataset) is missing from this implementation. It is straightforward to add yourself, e.g. along the lines of the sketch below.
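A minimal sketch of that standardization, assuming the dataset is stored as flat reward/done arrays (as D4RL provides). The official implementation also splits trajectories on timeouts and rescales by 1000 afterwards, so treat the details here as an approximation:

```python
import numpy as np

def standardize_rewards(rewards: np.ndarray, dones: np.ndarray) -> np.ndarray:
    # Split the flat dataset into trajectories at episode boundaries and
    # compute each trajectory's undiscounted return.
    returns, ret, n = [], 0.0, 0
    for r, d in zip(rewards, dones):
        ret += float(r)
        n += 1
        if d:
            returns.append(ret)
            ret, n = 0.0, 0
    if n > 0:
        returns.append(ret)  # trailing partial trajectory, if any
    # Divide by the return gap between the best and worst trajectory;
    # the 1000x rescaling follows the official implementation.
    scale = max(returns) - min(returns)
    assert scale > 0, "dataset needs trajectories with different returns"
    return rewards / scale * 1000.0
```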
To run a MuJoCo locomotion task:
```bash
python main_iql.py --env halfcheetah-medium-v2 --expectile 0.7 --temperature 3.0 --eval_freq 5000 --eval_episodes 10 --normalize
```
To run an AntMaze task:
```bash
python main_iql.py --env antmaze-medium-play-v2 --expectile 0.9 --temperature 10.0 --eval_freq 50000 --eval_episodes 100
```
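These settings mirror the hyperparameters reported in the paper (expectile 0.7 and temperature 3.0 for locomotion, 0.9 and 10.0 for AntMaze). Note that state normalization (`--normalize`) is enabled for the locomotion command but not for AntMaze above.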
This repo borrows heavily from sfujim/TD3_BC and ikostrikov/implicit_q_learning.