
We extended the Clean-Offline-RLHF benchmarking framework to support learning from attribute, evaluative, and keypoint feedback, as well as learning from multiple feedback types simultaneously. We also added the Franka Kitchen domain and the Diffusion-QL algorithm for benchmarking.


Clean-Offline-RLHF

Project Website · Paper · Platform · Datasets · Clean Offline RLHF

This is the official PyTorch implementation of the paper "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback". Clean-Offline-RLHF is a codebase for offline reinforcement learning with human feedback that provides implementations of offline RL algorithms trained from high-quality, realistic human feedback.



💡 News

  • [03-26-2024] 🔥 Update Mini-Uni-RLHF, a minimal out-of-the-box annotation tool for researchers, powered by Streamlit.
  • [03-24-2024] Release of the SMARTS environment training dataset, scripts, and labels. You can find them in the smarts branch.
  • [03-20-2024] Update detailed setup bash files.
  • [02-22-2024] Initial code release.

🛠️ Getting Started

Installation on Linux (Ubuntu)

  1. Clone the repo
    git clone https://github.com/thomas475/Clean-Offline-RLHF.git
    cd Clean-Offline-RLHF
  2. Setup Anaconda environment
    conda create -n rlhf python==3.9
    conda activate rlhf
  3. Install Dependencies
    pip install -r requirements.txt
  4. Install hdf5
    conda install anaconda::hdf5
  5. Install Torch
    • For GPU Support (CUDA):

      pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

    • For CPU Only:

      pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu
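
After installing Torch, you can verify the build from inside the rlhf environment with a short Python check (a minimal sketch; the expected output depends on which of the two builds you chose):

    # Quick sanity check that the pinned Torch build is installed correctly.
    # Expect "1.13.1+cu117" and True for the CUDA build,
    # or "1.13.1+cpu" and False for the CPU-only build.
    import torch

    print(torch.__version__)
    print(torch.cuda.is_available())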

MuJoCo

Many of the datasets use MuJoCo as the simulation environment, so it needs to be installed as well. See this for further details.

  1. Download the MuJoCo library:
    wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
  2. Create the MuJoCo folder:
    mkdir ~/.mujoco
  3. Extract the library to the MuJoCo folder:
    tar -xvf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco/
  4. Add environment variables (run nano ~/.bashrc):
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
    export MUJOCO_GL=egl
  5. Reload the .bashrc file to register the changes.
    source ~/.bashrc
  6. Install dependencies:
    conda install -c conda-forge patchelf fasteners cython==0.29.37 cffi pyglfw libllvm11 imageio glew glfw mesalib
    sudo apt-get install libglew-dev
  7. Test that the library is installed.
    cd ~/.mujoco/mujoco210/bin
    ./simulate ../model/humanoid.xml
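
Optionally, you can also verify the Python bindings from the rlhf environment (a minimal sketch, assuming the mujoco-py bindings are installed via requirements.txt; the first import compiles the bindings and may take a while):

    # Check that mujoco_py finds MuJoCo 2.1.0, builds, and can step a simulation.
    import os
    import mujoco_py

    model_path = os.path.expanduser("~/.mujoco/mujoco210/model/humanoid.xml")
    model = mujoco_py.load_model_from_path(model_path)
    sim = mujoco_py.MjSim(model)
    sim.step()
    print("mujoco_py is working, qpos shape:", sim.data.qpos.shape)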

💻 Usage

Human Feedback

Before using the offline RLHF algorithms, you need to annotate your dataset with human feedback. If you wish to collect labeled datasets for new tasks, we refer to the Platform for crowdsourced annotation. The already collected crowdsourced annotation datasets (~15M steps) are available here.

The processed crowdsourced (CS) and scripted teacher (ST) labels can be found in the crowdsource_human_labels and generated_fake_labels folders, respectively.

Note: for comparison and validation purposes, we provide a fast track for scripted teacher (ST) label generation in fast_track/generate_d4rl_fake_labels.py.

Prepare Crowdsourced Data

The labels exported from the Uni-RLHF-Platform first have to be transformed into an appropriate format. To do this, you can use the following script (replace [dir_path] with the location of the raw labels):

cd scripts
python3 transform_raw_labels.py --data_dir [dir_path]

Pre-train Auxiliary Models

You can configure the training of the auxiliary models (reward model, attribute mapping model, keypoint prediction model) by creating a custom config.yaml file; the available parameters are defined in the TrainConfig object in rlhf/train_model.py. To train these models, run the following command:

cd rlhf
python3 train_model.py --config config.yaml
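
For intuition, reward learning from comparative feedback typically uses a Bradley-Terry-style preference objective, sketched below. This is a conceptual illustration only: RewardMLP and preference_loss are hypothetical names, and the actual model classes and hyperparameters are defined in rlhf/train_model.py and the config files.

    # Conceptual sketch: training a reward model from pairwise (comparative) labels
    # with a Bradley-Terry objective. Names and shapes are illustrative.
    import torch
    import torch.nn as nn

    class RewardMLP(nn.Module):
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            # obs: (batch, T, obs_dim), act: (batch, T, act_dim) -> per-step reward (batch, T)
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def preference_loss(reward_model, seg_a, seg_b, labels):
        # seg_a / seg_b are (obs, act) pairs for two trajectory segments;
        # labels (long tensor) are 0 if segment A is preferred, 1 if segment B is preferred.
        ret_a = reward_model(*seg_a).sum(dim=1)   # segment return under the learned reward
        ret_b = reward_model(*seg_b).sum(dim=1)
        logits = torch.stack([ret_a, ret_b], dim=1)
        return nn.functional.cross_entropy(logits, labels)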

Train Offline RL with Pre-trained Auxiliary Models

Following the Uni-RLHF codebase implementation, we modified the IQL, CQL, and TD3BC algorithms. Furthermore, we added the Diffusion-QL algorithm. You can adjust the details of the experiments in the TrainConfig objects of the algorithm implementations in /algorithms/offline, as well as in the files in the /config directory.

Example: train with Implicit Q-Learning (IQL). The logs are uploaded to wandb.

python3 algorithms/offline/iql_p.py
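
Conceptually, the modified algorithms train on the offline dataset with its original rewards replaced by predictions from the pre-trained reward model. The sketch below illustrates this relabeling step; relabel_rewards is a hypothetical helper, not the actual API of the scripts in /algorithms/offline.

    # Conceptual sketch: replacing dataset rewards with predictions from a
    # pre-trained reward model before running offline RL (IQL/CQL/TD3BC/Diffusion-QL).
    import numpy as np
    import torch

    @torch.no_grad()
    def relabel_rewards(dataset, reward_model, device="cpu", batch_size=4096):
        obs = torch.as_tensor(dataset["observations"], dtype=torch.float32, device=device)
        act = torch.as_tensor(dataset["actions"], dtype=torch.float32, device=device)
        preds = []
        for i in range(0, len(obs), batch_size):
            r = reward_model(obs[i:i + batch_size], act[i:i + batch_size])
            preds.append(r.cpu().numpy())
        relabeled = dict(dataset)
        relabeled["rewards"] = np.concatenate(preds)
        return relabeled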

These are the available options for algorithm, feedback type, label type, and auxiliary model type:

Algorithm      Feedback Type    Label Type    Auxiliary Model Type
IQL            COMPARATIVE      CS            MLP
CQL            ATTRIBUTE        ST            TFM
TD3BC          EVALUATIVE                     CNN
DIFFUSIONQL    KEYPOINT

🏷️ License

Distributed under the MIT License. See LICENSE.txt for more information.

✉️ Contact

For any questions, please feel free to email yuanyf@tju.edu.cn.

📝 Citation

If you find our work useful, please consider citing:

@inproceedings{anonymous2023unirlhf,
    title={Uni-{RLHF}: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback},
    author={Yuan, Yifu and Hao, Jianye and Ma, Yi and Dong, Zibin and Liang, Hebin and Liu, Jinyi and Feng, Zhixin and Zhao, Kai and Zheng, Yan},
    booktitle={The Twelfth International Conference on Learning Representations, ICLR},
    year={2024},
    url={https://openreview.net/forum?id=WesY0H9ghM},
}
