The official repository for the paper "Authorship Style Transfer with Policy Optimization".
Commands for environment setup with conda:

```shell
conda create --name astrapop python=3.8
conda activate astrapop
pip install -U pip
pip install -r requirements.txt
```

Please download the original Reddit Million User Dataset (MUD) from here and the original ETS Corpus of Non-Native Written English from here. We will publish the data preprocessing code soon.
To reproduce the results on the Reddit dataset, please run the scripts in `scripts/reddit` following the procedure below.
- Train the paraphrase model and the reference SFT model by running `00_train_paraphraser.sh` and `00_train_sft.sh`.
- Generate the data for DPO and CPO training by running `01_generate_dpo_cpo_data.sh`.
- Train the PO models using PPO/DPO/CPO by running `02_train_ppo.sh`/`02_train_dpo.sh`/`02_train_cpo.sh`.
- Transfer the texts in the test set by running `03_generate.sh`.
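The steps above can be chained into a single driver script. This is a sketch, not part of the repository: it assumes the script names listed above live in `scripts/reddit`, and it picks DPO as the example PO method. The actual invocations are left commented out so the sketch can be read (and run) outside the repo.

```shell
#!/usr/bin/env bash
# Hypothetical end-to-end driver for the Reddit pipeline (a sketch).
set -euo pipefail

# Pipeline stages in order, as listed in the README; 02_train_dpo.sh could be
# swapped for 02_train_ppo.sh or 02_train_cpo.sh.
steps=(
  00_train_paraphraser.sh
  00_train_sft.sh
  01_generate_dpo_cpo_data.sh
  02_train_dpo.sh
  03_generate.sh
)

for step in "${steps[@]}"; do
  echo "would run scripts/reddit/${step}"
  # bash "scripts/reddit/${step}"   # uncomment when running inside the repo
done
```

Because `set -e` is active, a failure in any stage would stop the run instead of silently continuing to later stages that depend on its outputs.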
To reproduce the results on the ETS dataset, please run the scripts in `scripts/ets`.
- Train the style reward model, the paraphrase model, and the reference SFT model by running `00_train_cls.sh`, `00_train_paraphraser.sh`, and `00_train_sft.sh`.
- Generate the data for DPO and CPO training by running `01_generate_dpo_cpo_data.sh`.
- Train the PO models using PPO/DPO/CPO by running `02_train_ppo.sh`/`02_train_dpo.sh`/`02_train_cpo.sh`.
- Transfer the texts in the test set by running `03_generate.sh`.