OrbitGrasp: SE(3)-Equivariant Grasp Learning

Project Website | Paper | Video
Boce Hu¹, Xupeng Zhu*¹, Dian Wang*¹, Zihao Dong*¹, Haojie Huang*¹, Chenghao Wang*¹, Robin Walters¹, Robert Platt¹ ²

¹Northeastern University, ²Boston Dynamics AI Institute
Conference on Robot Learning 2024

Installation

  1. Install Mambaforge or Anaconda

  2. Clone this repo

    git clone git@github.com:BoceHu/orbitgrasp.git
    cd orbitgrasp
  3. Install the environment, using Mambaforge (recommended):

    mamba env create -f conda_environment.yaml
    conda activate orbitgrasp

    or use Anaconda:

    conda env create -f conda_environment.yaml
    conda activate orbitgrasp

    This procedure takes around 10 minutes to install all dependencies. (Since torch-cluster, torch-geometric, and torch-scatter are not available in the conda-forge channel, they are installed from source, which may take a while.)

  4. Check the Segment Anything website to download the checkpoint. In our implementation, we use ViT-H as the backbone, and the checkpoint is available here. Move the downloaded checkpoint to the ./pretrained folder.
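
    A minimal shell sketch of this step, assuming the ViT-H checkpoint filename sam_vit_h_4b8939.pth from the Segment Anything release (verify the current download link in the SAM repository):

     mkdir -p ./pretrained
     # download the ViT-H checkpoint (URL and filename assumed; check the SAM repository)
     wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
     mv sam_vit_h_4b8939.pth ./pretrained/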

Dataset

We support two camera settings and two ways to collect the data: the camera settings are single and multi, and the two collection modes are w/ mask and w/o mask.

For data collection w/ mask: to speed up data collection with multi-processing, we first use SAM to generate and save masks for all scenes and then reuse them directly during data collection. (This way, SAM does not need to be loaded in each data-collection process.)

  1. The scene argument can be set to pile or packed, and sample_num specifies the number of scenes.

    e.g., for the single-camera setting:

     python ./dataset/generate_raw_pose_mask_single.py --scene='pile' --sample_num=2500

    If you want to use the multi-camera setting, you can use the following command:

     python ./dataset/generate_raw_pose_mask_multi.py --scene='pile' --sample_num=800

    This step can be skipped if you don't need masks to generate grasp poses. Masks provide semantic and object-centric information for grasp pose generation.

  2. After generating the masks, you can start collecting data. The scene argument can be set to pile or packed. from_save=True means the saved masks are used for data collection (since the environment config was saved in the previous step, exactly the same scenes can be regenerated). start_scene indicates the starting scene number among the saved environments, and iteration_num indicates the number of scenes for which to generate grasp poses.

    e.g., if start_scene=100 and iteration_num=40, the scenes generated will be from 100 to 140.

     python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=40

    If you want to use the multi-camera setting, you can use the following command:

     python ./dataset/generate_pose_multi.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=40

    If you want to collect data w/o masks, you can use the following command:

     python ./dataset/generate_pose_single_wo_mask.py --scene='pile' --GUI=False --start_scene=0 --iteration_num=40

    To accelerate this procedure, you can open multiple terminals and run the same script (with a different start_scene) in each terminal to collect data in parallel. For example,

     python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=50

    ...

     python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=1000 --iteration_num=50

    In our implementation, we used 20 processes to collect data in parallel (a shell sketch for launching such parallel runs is given after this list).

  3. Filter the dataset to remove unreachable poses.

     python ./dataset/filter_dataset.py --scene='pile' --camera='single'
  4. Split the dataset into train and test sets.

     python ./dataset/split_dataset.py --camera_setting='single' --train_ratio=0.8

    You can change the output directory and root directory with --output_dir and --root_dir.
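
Referring back to step 2, here is a minimal shell sketch for launching the parallel collection processes, assuming the single-camera w/-mask setting with 20 processes of 50 scenes each (both numbers are illustrative):

     # launch 20 background processes, each covering a disjoint range of scenes
     SCENES_PER_PROC=50
     for i in $(seq 0 19); do
         python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False \
             --start_scene=$((i * SCENES_PER_PROC)) --iteration_num=$SCENES_PER_PROC &
     done
     wait   # block until all background processes have finished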

Training

Before training, we pre-calculate the harmonics for the test set to avoid repeated calculation during training. (Since data augmentation is applied to the training set, this pre-calculation can only be used for the test set.)

python ./scripts/save_harmonics.py --camera_setting='single' --use_mask=True

Then, you can start to train the model.

python ./scripts/train_single.py
python ./scripts/train_multi.py

The training configurations are stored in ./scripts/single_config.yaml and ./scripts/multi_config.yaml. You can change the training configuration in these files.

Testing

After training, you can test the model by running the following commands.

  • NOTE: We provide a pre-trained model for the single-camera setting. You can find it in ./scripts/output/store/single/.
python ./scripts/test_grasp_single.py
python ./scripts/test_grasp_multi.py

Remember to point ./scripts/single_config.yaml or ./scripts/multi_config.yaml at the correct checkpoint.
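
For example, a minimal sketch of evaluating the provided single-camera checkpoint (the exact config key that stores the checkpoint path depends on ./scripts/single_config.yaml and is not reproduced here):

ls ./scripts/output/store/single/        # locate the provided checkpoint
# edit ./scripts/single_config.yaml so it points at that checkpoint, then run:
python ./scripts/test_grasp_single.py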

License

This repository is released under the MIT license. See LICENSE for additional details.

Acknowledgement