The repository organisation is inspired by the CORL and ReBRAC repositories.
To set up a Python environment (with dev tools of your taste; in our workflow, we use conda and Python 3.8), just install all the requirements:

```bash
python3 -m pip install -r requirements.txt
```
However, in this setup, you must install the mujoco210 binaries by hand. Sometimes this is not super straightforward, but this recipe can help:

```bash
mkdir -p /root/.mujoco \
  && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
  && tar -xf mujoco.tar.gz -C /root/.mujoco \
  && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}
```
You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
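Before building mujoco_py, it can be worth checking that the binaries landed where the loader will look. A minimal sanity-check sketch (it assumes the default `/root/.mujoco` location used in the recipe above):

```python
import os

MUJOCO_BIN = "/root/.mujoco/mujoco210/bin"

def mujoco_on_ld_path(env: dict, bin_dir: str = MUJOCO_BIN) -> bool:
    """Return True if bin_dir appears as an entry on LD_LIBRARY_PATH."""
    return bin_dir in env.get("LD_LIBRARY_PATH", "").split(":")

if __name__ == "__main__":
    if not os.path.isdir(MUJOCO_BIN):
        print(f"missing: {MUJOCO_BIN} -- did the tar extraction succeed?")
    if not mujoco_on_ld_path(os.environ):
        print("LD_LIBRARY_PATH does not include the mujoco210 bin directory")
```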
We also provide a more straightforward way with a Dockerfile that is already set up to work. All you have to do is build and run it :)

```bash
docker build -t clorl .
```
To run, mount the current directory:

```bash
docker run -it \
  --gpus=all \
  --rm \
  --volume "<PATH_TO_THE_REPO>:/workspace/" \
  --name clorl \
  clorl bash
```
Configs for reproducing results of the original algorithms are stored in `configs/<algorithm_name>/<task_type>`. All available hyperparameters are listed in `src/algorithms/<algorithm_name>.py`. Implemented algorithms are: `rebrac`, `iql`, `lb-sac`.
Configs for reproducing results of algorithms with classification are stored in `configs/<algorithm_name>-ce/<task_type>`, `configs/<algorithm_name>-ce-ct/<task_type>`, and `configs/<algorithm_name>-ce-at/<task_type>`. The notation (the same as in the paper): `ce` denotes the replacement of MSE with cross-entropy, `ce-at` denotes cross-entropy with tuned algorithm parameters, and `ce-ct` denotes cross-entropy with tuned classification parameters. All available hyperparameters are listed in `src/algorithms/<algorithm_name>_cl.py`. Implemented algorithms are: `rebrac`, `iql`, `lb-sac`.
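For intuition on what the `ce` variants change: instead of regressing a scalar Q-value with MSE, the critic predicts a categorical distribution over a fixed grid of value bins, and the scalar TD target is converted to a soft label for cross-entropy. One standard scalar-to-categorical encoding from this literature is the two-hot encoding, sketched below (the function name, bin count, and value range are illustrative, not this repository's actual API; the repo's configs control the real classification parameters):

```python
def two_hot(target: float, v_min: float, v_max: float, num_bins: int) -> list:
    """Encode a scalar into a two-hot probability vector over uniform bins.

    The probability mass is split between the two bins adjacent to the
    target, proportionally to the target's distance from each bin center.
    """
    target = min(max(target, v_min), v_max)  # clip into the support
    step = (v_max - v_min) / (num_bins - 1)
    idx = (target - v_min) / step            # fractional bin index
    lo = int(idx)
    hi = min(lo + 1, num_bins - 1)
    frac = idx - lo
    probs = [0.0] * num_bins
    probs[lo] += 1.0 - frac
    probs[hi] += frac
    return probs
```

A target exactly on a bin center yields a one-hot vector; anything in between splits mass across the two neighbouring bins, so the label always sums to one and the cross-entropy loss stays well defined.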
For example, to start the ReBRAC+classification training process with the D4RL `halfcheetah-medium-expert-v2` dataset, run the following:

```bash
PYTHONPATH=. python3 src/algorithms/rebrac_cl.py --config_path="configs/rebrac-ce/halfcheetah/medium_expert_v2.yaml"
```
We provide Weights & Biases logs for all of our experiments here.
If you want to replicate results from our work, you can use the configs for Weights & Biases Sweeps provided in `configs/sweeps`.
| Paper element | Sweeps path (we omit the common prefix `configs/sweeps/`) |
|---|---|
| Tables 1, 2, 3, 16, 17, 18 | `eval/<algorithm_name>.yaml`, `eval/<algorithm_name>-ce.yaml`, `eval/<algorithm_name>-ce-at.yaml`, `eval/<algorithm_name>-ce-ct.yaml`, `eval/<algorithm_name>-ce-mt.yaml` |
| Figure 2 | All sweeps from `expand` |
| Figure 3 | All sweeps from `network_sizes` |
| Hyperparameters tuning | All sweeps from `tuning` |
We also provide a script and binary data for reconstructing the graphs and tables from our paper: `plotting/plotting.py`. We repacked the results into `.pickle` files, so you can reuse them for further research and head-to-head comparisons.
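Loading the repacked results for your own comparisons is standard pickle deserialization. A minimal sketch (the directory and the structure of the stored objects are illustrative; inspect a file to see the actual layout):

```python
import pickle
from pathlib import Path

def load_results(path):
    """Load one repacked results file; the stored object layout depends on the repo."""
    with open(path, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    # Hypothetical location: adjust to wherever the .pickle files live in the repo.
    for p in sorted(Path("plotting").glob("*.pickle")):
        results = load_results(p)
        print(p.name, type(results))
```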
If you use this code for your research, please consider citing our work:
```bibtex
@article{tarasov2024value,
  title={Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?},
  author={Tarasov, Denis and Brilliantov, Kirill and Kharlapenko, Dmitrii},
  journal={arXiv preprint arXiv:2406.06309},
  year={2024}
}
```