Text2Reward: Reward Shaping with Language Models for Reinforcement Learning


Code for the paper "Text2Reward: Reward Shaping with Language Models for Reinforcement Learning". Please refer to our project page for more demonstrations and up-to-date related resources.

Updates

  • 2023-10-09: We released our code.
  • 2023-09-20: We released the paper and website of Text2Reward.

Dependencies

To set up the environment, run the following commands in a shell:

# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0

Troubleshooting

  1. If you have not installed MuJoCo yet, follow the instructions here to install it. Then run the following commands to confirm that the installation succeeded:
$ python3
>>> import mujoco_py
  2. If you encounter any of the following errors when running ManiSkill2, please refer to the documentation here.
    • RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
    • Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
    • Segmentation fault (core dumped)
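
The import check for mujoco_py above can be extended to the other packages installed in the Dependencies section. The following is a minimal sketch and not part of the repository: `missing_modules` is a hypothetical helper, and the import names `mani_skill2` and `metaworld` are assumptions about how the two editable installs expose themselves. It only checks that each module can be located, so a package that is found but fails while loading still needs the manual `import` test.

```python
import importlib.util
from typing import Iterable, List

# Import names are assumptions; adjust them to match your installation.
EXPECTED_MODULES = [
    "mujoco_py",
    "stable_baselines3",
    "wandb",
    "tensorboard",
    "langchain",
    "chromadb",
    "mani_skill2",
    "metaworld",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

An empty result means every listed module is discoverable from the active conda environment; any name that is reported points at the install step to repeat.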

Usage

Reimplement

To reproduce our experimental results, run the following scripts:

ManiSkill2:

bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh

It's normal to encounter the following warnings:

[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.

MetaWorld:

bash run_oracle.sh
bash run_zero_shot.sh

Generate new reward code

First, add the following environment variable to your .bashrc (or .zshrc, etc.):

export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward
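
To confirm that the variable took effect in a new shell, a check like the one below can help. It is a sketch only: `on_pythonpath` is a hypothetical helper, and the repository path is the placeholder from the export line above, which you need to replace with the location of your own checkout.

```python
import os
from typing import Optional


def _normalize(path: str) -> str:
    """Expand "~" and return an absolute, normalized path."""
    return os.path.normpath(os.path.abspath(os.path.expanduser(path)))


def on_pythonpath(repo_root: str, pythonpath: Optional[str] = None) -> bool:
    """Check whether repo_root is one of the entries in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # Skip empty entries produced by leading, trailing, or doubled separators.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return _normalize(repo_root) in {_normalize(entry) for entry in entries}


if __name__ == "__main__":
    # Placeholder path copied from the export line; replace it with yours.
    print(on_pythonpath("~/path/to/text2reward"))
```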

Then navigate to the directory text2reward/code_generation/single_flow and run the following scripts:

# generate reward code for ManiSkill2
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh

Run new experiment

By default, the run_oracle.sh script above uses the expert-written rewards provided by the environment, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. To run a new experiment with your own reward, follow the bash scripts above and set the --reward_path parameter to the path of your own reward code.
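
Before launching a run with a custom reward, it can help to check that the file passed to --reward_path at least parses. The sketch below assumes the reward is a Python source file, which this README does not state explicitly. `top_level_functions` and the script name `check_reward.py` are hypothetical, and the function name the training scripts expect is not specified here, so the check only lists what the file defines.

```python
import ast
import sys
from pathlib import Path
from typing import List


def top_level_functions(reward_path: str) -> List[str]:
    """Parse a reward file and return the names of its top-level functions.

    Raises FileNotFoundError if the file is missing and SyntaxError if it
    is not valid Python. The file is parsed only, never executed.
    """
    source = Path(reward_path).read_text(encoding="utf-8")
    tree = ast.parse(source, filename=reward_path)
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(top_level_functions(sys.argv[1]))
    else:
        # check_reward.py is a hypothetical file name for this sketch.
        print("usage: python check_reward.py REWARD_PATH")
```

Because the file is parsed rather than imported, the check does not need the simulator or any of the reward's own dependencies to be installed.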

Citation

If you find our work helpful, please cite us:

@inproceedings{xietext2reward,
  title={Text2Reward: Reward Shaping with Language Models for Reinforcement Learning},
  author={Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  booktitle={The Twelfth International Conference on Learning Representations}
}
