This is the codebase to reproduce the experiments in the paper *From Demonstrations to Rewards: Alignment Without Explicit Human Preferences*.
Install the dependencies:

```shell
pip install -r requirements.txt
pip install poetry
```
Demonstrations are generated by a well-trained policy model, [vwxyzjn/EleutherAI_pythia-6.9b-deduped__reward__tldr](https://huggingface.co/vwxyzjn/EleutherAI_pythia-6.9b-deduped__reward__tldr/tree/reward__44413__1706651113):

```shell
./bash/sft_data_generation.sh
```
After running this, the demonstrations are written to the `generated_data/` folder. We currently use a 2.8B model to generate demonstrations on an A40 GPU to avoid OOM issues; you can configure the demonstration-generation model in `bash/sft_data_generation.sh`.
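At a high level, demonstration generation samples one completion per TL;DR prompt from the policy model and saves the pairs to disk. The sketch below illustrates that shape only; the function name, the JSONL record layout, and the prompt format are assumptions for illustration, not code from this repository, and `generate_fn` stands in for the actual model's generate call:

```python
import json
from pathlib import Path


def generate_demonstrations(prompts, generate_fn, out_path):
    """Sample one completion per prompt and write (prompt, completion)
    records as JSON lines. `generate_fn` is a stand-in for the policy
    model's generation call; the JSONL layout here is an assumption."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as f:
        for prompt in prompts:
            record = {"prompt": prompt, "completion": generate_fn(prompt)}
            f.write(json.dumps(record) + "\n")


# Usage with a stub in place of the real 2.8B model:
generate_demonstrations(
    ["SUBREDDIT: r/test\nPOST: ...\nTL;DR:"],
    lambda p: " a placeholder summary",
    "generated_data/demo.jsonl",
)
```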
Then we run the IRL pipeline:

```shell
./bash/IRL_Pipeline.sh
```
Our IRL pipeline consists of four steps; see the bash script for details.
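For intuition only: one common way to estimate a reward from demonstrations (without explicit preference labels) is a Bradley-Terry-style objective that treats demonstration outputs as implicitly preferred over the current policy's samples. This is a generic sketch of that idea, not necessarily the exact objective used in this pipeline:

```python
import math


def reward_loss(demo_scores, sample_scores):
    """Bradley-Terry-style loss treating each demonstration as preferred
    over a paired policy sample: mean of -log sigmoid(r_demo - r_sample).
    Driving this loss down pushes demonstration rewards above sample
    rewards, which is the core of this (illustrative) IRL objective."""
    losses = [
        -math.log(1.0 / (1.0 + math.exp(-(d - s))))
        for d, s in zip(demo_scores, sample_scores)
    ]
    return sum(losses) / len(losses)
```

With equal scores the loss is log 2, and it shrinks as the reward model learns to score demonstrations higher than policy samples.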
We evaluate the proposed IRL method based on the quality of both the estimated reward models and the resulting policy models:
- For the reward model, we evaluate reward accuracy on a held-out TL;DR preference dataset.
- For the policy model, we evaluate performance via the reward score from a held-out 6.9B reward model, as well as the ChatGPT-evaluated win rate against a high-quality reference dataset generated by a public 6.9B PPO model (`vwxyzjn/EleutherAI_pythia-6.9b-deduped__ppo_left_padding_new_nowhiten_reward__tldr`).
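The two metrics above can be sketched as follows; the function names and the tie-handling convention are assumptions for illustration, not the repository's evaluation code:

```python
def preference_accuracy(reward_fn, pairs):
    """Fraction of (chosen, rejected) preference pairs where the reward
    model scores the human-preferred summary strictly higher."""
    correct = sum(
        reward_fn(chosen) > reward_fn(rejected) for chosen, rejected in pairs
    )
    return correct / len(pairs)


def win_rate(judgments):
    """Win rate of the policy against the reference generations, given
    per-example judge verdicts ("policy", "reference", or "tie").
    Counting a tie as half a win is a common convention, assumed here."""
    wins = sum(j == "policy" for j in judgments)
    ties = sum(j == "tie" for j in judgments)
    return (wins + 0.5 * ties) / len(judgments)
```

For example, with `len` as a toy reward function, `preference_accuracy` simply checks how often the chosen summary is longer than the rejected one.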