The repository organisation is inspired by the CORL and ReBRAC repositories.
To set up a Python environment (with dev tools of your taste; in our workflow, we use conda and Python 3.8), just install all the requirements:

```bash
python3 -m pip install -r requirements.txt
```
However, in this setup, you must install the mujoco210 binaries by hand. Sometimes this is not super straightforward, but this recipe can help:

```bash
mkdir -p /root/.mujoco \
  && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
  && tar -xf mujoco.tar.gz -C /root/.mujoco \
  && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}
```
You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
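Before building mujoco_py, it can be worth checking that the binaries landed where the loader will look. A minimal sanity-check sketch (it assumes the default `/root/.mujoco` location used in the recipe above):

```python
import os

MUJOCO_BIN = "/root/.mujoco/mujoco210/bin"

def mujoco_on_ld_path(env: dict, bin_dir: str = MUJOCO_BIN) -> bool:
    """Return True if bin_dir appears as an entry on LD_LIBRARY_PATH."""
    return bin_dir in env.get("LD_LIBRARY_PATH", "").split(":")

if __name__ == "__main__":
    if not os.path.isdir(MUJOCO_BIN):
        print(f"missing: {MUJOCO_BIN} -- did the tar extraction succeed?")
    if not mujoco_on_ld_path(os.environ):
        print("LD_LIBRARY_PATH does not include the mujoco210 bin directory")
```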
We also provide a more straightforward way with a Dockerfile that is already set up to work. All you have to do is build and run it :)

```bash
docker build -t clorl .
```
To run, mount the current directory:

```bash
docker run -it \
  --gpus=all \
  --rm \
  --volume "<PATH_TO_THE_REPO>:/workspace/" \
  --name clorl \
  clorl bash
```
Configs for reproducing results of the original algorithms are stored in `configs/<algorithm_name>/<task_type>`. All available hyperparameters are listed in `src/algorithms/<algorithm_name>.py`. Implemented algorithms are: `rebrac`, `iql`, `lb-sac`.
Configs for reproducing results of algorithms with classification are stored in `configs/<algorithm_name>-ce/<task_type>`, `configs/<algorithm_name>-ce-ct/<task_type>`, and `configs/<algorithm_name>-ce-at/<task_type>`. The notation (the same as in the paper): `ce` denotes the replacement of MSE with cross-entropy, `ce-at` denotes cross-entropy with tuned algorithm parameters, and `ce-ct` denotes cross-entropy with tuned classification parameters. All available hyperparameters are listed in `src/algorithms/<algorithm_name>_cl.py`. Implemented algorithms are: `rebrac`, `iql`, `lb-sac`.
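For intuition on what the `ce` variants change: instead of regressing a scalar Q-value with MSE, the critic predicts a categorical distribution over a fixed grid of value bins, and the scalar TD target is converted to a soft label for cross-entropy. One standard scalar-to-categorical encoding from this literature is the two-hot encoding, sketched below (the function name, bin count, and value range are illustrative, not this repository's actual API; the repo's configs control the real classification parameters):

```python
def two_hot(target: float, v_min: float, v_max: float, num_bins: int) -> list:
    """Encode a scalar into a two-hot probability vector over uniform bins.

    The probability mass is split between the two bins adjacent to the
    target, proportionally to the target's distance from each bin center.
    """
    target = min(max(target, v_min), v_max)  # clip into the support
    step = (v_max - v_min) / (num_bins - 1)
    idx = (target - v_min) / step            # fractional bin index
    lo = int(idx)
    hi = min(lo + 1, num_bins - 1)
    frac = idx - lo
    probs = [0.0] * num_bins
    probs[lo] += 1.0 - frac
    probs[hi] += frac
    return probs
```

A target exactly on a bin center yields a one-hot vector; anything in between splits mass across the two neighbouring bins, so the label always sums to one and the cross-entropy loss stays well defined.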
For example, to start the ReBRAC+classification training process with the D4RL `halfcheetah-medium-expert-v2` dataset, run the following:

```bash
PYTHONPATH=. python3 src/algorithms/rebrac_cl.py --config_path="configs/rebrac-ce/halfcheetah/medium_expert_v2.yaml"
```
We provide Weights & Biases logs for all of our experiments here.
If you want to replicate results from our work, you can use the configs for Weights & Biases Sweeps provided in `configs/sweeps`.
| Paper element | Sweeps path (we omit the common prefix `configs/sweeps/`) |
|---|---|
| Tables 1, 2, 3, 16, 17, 18 | `eval/<algorithm_name>.yaml`, `eval/<algorithm_name>-ce.yaml`, `eval/<algorithm_name>-ce-at.yaml`, `eval/<algorithm_name>-ce-ct.yaml`, `eval/<algorithm_name>-ce-mt.yaml` |
| Figure 2 | All sweeps from `expand` |
| Figure 3 | All sweeps from `network_sizes` |
| Hyperparameters tuning | All sweeps from `tuning` |
We also provide a script and binary data for reconstructing the graphs and tables from our paper: `plotting/plotting.py`. We repacked the results into `.pickle` files, so you can reuse them for further research and head-to-head comparisons.
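Loading the repacked results for your own comparisons is standard pickle deserialization. A minimal sketch (the directory and the structure of the stored objects are illustrative; inspect a file to see the actual layout):

```python
import pickle
from pathlib import Path

def load_results(path):
    """Load one repacked results file; the stored object layout depends on the repo."""
    with open(path, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    # Hypothetical location: adjust to wherever the .pickle files live in the repo.
    for p in sorted(Path("plotting").glob("*.pickle")):
        results = load_results(p)
        print(p.name, type(results))
```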
If you use this code for your research, please consider citing our work:
```bibtex
@article{tarasov2024value,
  title={Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?},
  author={Tarasov, Denis and Brilliantov, Kirill and Kharlapenko, Dmitrii},
  journal={arXiv preprint arXiv:2406.06309},
  year={2024}
}
```