Project Website · Paper · Platform · Datasets · Clean Offline RLHF
This is the official PyTorch implementation of the paper "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback". Clean-Offline-RLHF is an Offline Reinforcement Learning with Human Feedback codebase that provides implementations of offline RL algorithms trained from high-quality, realistic human feedback.
- [03-26-2024] 🔥 Update Mini-Uni-RLHF, a minimal out-of-the-box annotation tool for researchers, powered by streamlit.
- [03-24-2024] Release of the SMARTS environment training dataset, scripts, and labels. You can find them in the smarts branch.
- [03-20-2024] Updated the detailed setup bash files.
- [02-22-2024] Initial code release.
- Clone the repo
  ```bash
  git clone https://github.com/thomas475/Clean-Offline-RLHF.git
  cd Clean-Offline-RLHF
  ```
- Setup Anaconda environment
  ```bash
  conda create -n rlhf python==3.9
  conda activate rlhf
  ```
- Install Dependencies
  ```bash
  pip install -r requirements.txt
  ```
- Install hdf5
  ```bash
  conda install anaconda::hdf5
  ```
- Install Torch
  - For GPU Support (CUDA):
    ```bash
    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
    ```
  - For CPU Only:
    ```bash
    pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu
    ```
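To quickly confirm that the intended Torch build is active (and, for the CUDA variant, that a GPU is visible), you can run a short check like this:

```python
import torch

print(torch.__version__)          # expect 1.13.1+cu117 or 1.13.1+cpu
print(torch.cuda.is_available())  # True only for the CUDA build with a usable GPU
```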
Many of the datasets use MuJoCo as the environment, so it should be installed, too. See this for further details.
- Download the MuJoCo library:
  ```bash
  wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
  ```
- Create the MuJoCo folder:
  ```bash
  mkdir ~/.mujoco
  ```
- Extract the library to the MuJoCo folder:
  ```bash
  tar -xvf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco/
  ```
- Add environment variables (run `nano ~/.bashrc`):
  ```bash
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
  export MUJOCO_GL=egl
  ```
- Reload the .bashrc file to register the changes.
  ```bash
  source ~/.bashrc
  ```
- Install dependencies:
  ```bash
  conda install -c conda-forge patchelf fasteners cython==0.29.37 cffi pyglfw libllvm11 imageio glew glfw mesalib
  sudo apt-get install libglew-dev
  ```
- Test that the library is installed.
  ```bash
  cd ~/.mujoco/mujoco210/bin
  ./simulate ../model/humanoid.xml
  ```
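As an additional sanity check from Python (assuming mujoco-py, gym, and d4rl are pulled in by requirements.txt; if not, install them first), you can try building one of the MuJoCo-backed D4RL environments:

```python
import gym
import d4rl  # registers the D4RL offline environments (assumed to be installed)

# Creating a MuJoCo-backed environment verifies that mujoco210 and its bindings are found.
env = gym.make("hopper-medium-v2")
obs = env.reset()
print(env.observation_space, env.action_space)
```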
Before using the offline RLHF algorithms, you should annotate your dataset using human feedback. If you wish to collect labeled datasets for new tasks, we refer to the platform part for crowdsourced annotation. The already collected crowdsourced annotation datasets (~15M steps) are available here.
The processed crowdsourced (CS) and scripted teacher (ST) labels can be found in the crowdsource_human_labels and generated_fake_labels folders, respectively.
Note: for comparison and validation purposes, we provide a fast track for scripted teacher (ST) label generation in fast_track/generate_d4rl_fake_labels.py.
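For intuition, a scripted teacher label for comparative feedback usually just prefers the trajectory segment with the higher ground-truth return. The snippet below is a minimal sketch of that idea, not the exact logic in fast_track/generate_d4rl_fake_labels.py:

```python
import numpy as np

def scripted_teacher_label(rewards_a, rewards_b):
    """Prefer the segment with the higher summed ground-truth reward.

    Returns 0 if segment A is preferred, 1 if segment B is preferred,
    and 0.5 when the two segments tie.
    """
    return_a, return_b = np.sum(rewards_a), np.sum(rewards_b)
    if return_a == return_b:
        return 0.5
    return 0 if return_a > return_b else 1
```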
The exported labels from the Uni-RLHF-Platform have to be transformed into an appropriate format first. To do this you can use the following script (replace [dir_path] with the location of the raw labels):
```bash
cd scripts
python3 transform_raw_labels.py --data_dir [dir_path]
```
You can configure the training of the auxiliary models (reward model, attribute mapping model, keypoint prediction model) by creating a custom config.yaml file; the available parameters can be seen in the TrainConfig object in rlhf/train_model.py, and an illustrative sketch of such a file is shown below.
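The field names below are hypothetical and only meant to convey the shape of such a file; consult the TrainConfig object in rlhf/train_model.py for the options that actually exist:

```yaml
# config.yaml -- illustrative sketch only; the real parameter names are
# defined by the TrainConfig object in rlhf/train_model.py.
env: hopper-medium-v2        # task whose feedback labels are used
feedback_type: comparative   # comparative / attribute / evaluative / keypoint
label_type: CS               # CS (crowdsourced) or ST (scripted teacher)
model_type: MLP              # MLP / TFM / CNN auxiliary model architecture
n_epochs: 100
batch_size: 256
lr: 3.0e-4
seed: 0
```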
To then train these models, run the following command:
```bash
cd rlhf
python3 train_model.py --config config.yaml
```
Following the Uni-RLHF codebase implementation, we modified the IQL, CQL, and TD3BC algorithms. Furthermore, we added the DiffusionQL algorithm. You can adjust the details of the experiments in the TrainConfig objects in the algorithm implementations found in /algorithms/offline, as well as in the files in the /config directory.
Example: train with Implicit Q-Learning (IQL). The log will be uploaded to wandb.
```bash
python3 algorithms/offline/iql_p.py
```
These are the possible variations of algorithms, feedback types, label types, and auxiliary model types:
| Algorithm | Feedback Type | Label Type | Auxiliary Model Type |
|---|---|---|---|
| IQL | COMPARATIVE | CS | MLP |
| CQL | ATTRIBUTE | ST | TFM |
| TD3BC | EVALUATIVE | | CNN |
| DIFFUSIONQL | KEYPOINT | | |
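For context on how these pieces fit together: with COMPARATIVE feedback, the auxiliary reward model is typically fit with a Bradley-Terry style cross-entropy loss over pairs of segments, so that the preferred segment receives the higher predicted return. The sketch below illustrates that idea with a hypothetical MLP reward model; it is not the repository's exact implementation (see rlhf/train_model.py for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardMLP(nn.Module):
    """Illustrative MLP reward model r(s, a) -> scalar (hypothetical, not the repo's class)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim) -> per-step rewards (batch, T, 1)
        return self.net(torch.cat([obs, act], dim=-1))

def comparative_loss(model, seg_a, seg_b, labels):
    """Bradley-Terry cross-entropy over predicted segment returns.

    seg_a / seg_b: dicts with 'obs' and 'act' tensors of shape (batch, T, dim).
    labels: (batch,) tensor, 0 if segment A is preferred and 1 if segment B is preferred.
    """
    return_a = model(seg_a["obs"], seg_a["act"]).sum(dim=1).squeeze(-1)  # (batch,)
    return_b = model(seg_b["obs"], seg_b["act"]).sum(dim=1).squeeze(-1)
    logits = torch.stack([return_a, return_b], dim=-1)                   # (batch, 2)
    return F.cross_entropy(logits, labels.long())
```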
Distributed under the MIT License. See LICENSE.txt for more information.
For any questions, please feel free to email yuanyf@tju.edu.cn.
If you find our work useful, please consider citing:
```bibtex
@inproceedings{anonymous2023unirlhf,
  title={Uni-{RLHF}: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback},
  author={Yuan, Yifu and Hao, Jianye and Ma, Yi and Dong, Zibin and Liang, Hebin and Liu, Jinyi and Feng, Zhixin and Zhao, Kai and Zheng, Yan},
  booktitle={The Twelfth International Conference on Learning Representations, ICLR},
  year={2024},
  url={https://openreview.net/forum?id=WesY0H9ghM},
}
```
