Project Website | Paper | Video
Boce Hu1,
Xupeng Zhu*1,
Dian Wang*1,
Zihao Dong*1,
Haojie Huang*1,
Chenghao Wang*1,
Robin Walters1,
Robert Platt1,2
1Northeastern University, 2Boston Dynamics AI Institute
Conference on Robot Learning 2024
- Install Mambaforge or Anaconda
- Clone this repo
git clone git@github.com:BoceHu/orbitgrasp.git
cd orbitgrasp
- Install environment: Use Mambaforge (recommended):
mamba env create -f conda_environment.yaml
conda activate orbitgrasp
or use Anaconda:
conda env create -f conda_environment.yaml
conda activate orbitgrasp
This procedure will take around 10 minutes to install all dependencies. (Since torch-cluster, torch-geometric, and torch-scatter are not available in the conda-forge channel, they are installed from source, which may take a while.)
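A quick sanity check (optional, not part of the original instructions) is to confirm that the source-built PyTorch extensions import cleanly in the new environment:

```bash
# Optional sanity check: the source-built extensions should import without errors.
conda activate orbitgrasp
python -c "import torch, torch_cluster, torch_scatter, torch_geometric; print(torch.__version__)"
```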
- Check the Segment Anything website to download the checkpoint. In our implementation, we use ViT-H as the backbone, and the checkpoint is available here. Move the downloaded checkpoint to the ./pretrained folder.
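For reference, a minimal sketch of fetching the ViT-H checkpoint from the command line (the URL is the one published in the Segment Anything repository at the time of writing; adjust if it has changed):

```bash
# Download the SAM ViT-H checkpoint into ./pretrained (URL from the Segment Anything repo).
mkdir -p ./pretrained
wget -P ./pretrained https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```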
We have two camera settings and two ways to collect the data. The camera settings are single and multi, and the two ways are w/ mask and w/o mask.
For data collection w/ mask: To accelerate data collection with multi-processing, we first use SAM to generate and save the masks of all scenes and then use them directly in the later data collection. (Thus, we don't need to load SAM in each data collection process.)
- The scene can be changed to pile or packed, and sample_num indicates the number of scenes, e.g. (for the single-camera setting):
python ./dataset/generate_raw_pose_mask_single.py --scene='pile' --sample_num=2500
If you want to use the multi-camera setting, you can use the following command:
python ./dataset/generate_raw_pose_mask_multi.py --scene='pile' --sample_num=800
This procedure can be ignored if you don't need masks to generate grasp poses. Here, masks are used to provide semantic and object-centric information for the grasp pose generation.
- After generating the masks, you can start to collect the data. The scene can be changed to pile or packed. from_save=True indicates that we use the saved masks for data collection (since we save the environment config in the last step, we can generate exactly the same scenes as those used for generating the masks). start_scene indicates the starting scene number from the saved envs, and iteration_num indicates the number of scenes to generate grasp poses for. E.g., if start_scene=100 and iteration_num=40, the scenes generated will be from 100 to 140.
python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=40
If you want to use the multi-camera setting, you can use the following command:
python ./dataset/generate_pose_multi.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=40
If you want to collect data w/o masks, you can use the following command:
python ./dataset/generate_pose_single_wo_mask.py --scene='pile' --GUI=False --start_scene=0 --iteration_num=40
To accelerate this procedure, you can open multiple terminals and run the same file (with a different start_scene) in each terminal to collect data in parallel. For example,
python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=50
...
python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=1000 --iteration_num=50
In our implementation, we used 20 processes to collect data in parallel.
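The manual multi-terminal launch above can also be expressed as a small shell loop. This is only an illustrative sketch: the flag values mirror the example commands, while the loop bounds, nohup usage, and log paths are our own choices and not part of the repo scripts:

```bash
# Illustrative sketch: launch 20 background collection processes, 50 scenes each.
for i in $(seq 0 19); do
  start=$((i * 50))
  nohup python ./dataset/generate_pose_single.py --scene='pile' --from_save=True \
    --GUI=False --start_scene=$start --iteration_num=50 > collect_$start.log 2>&1 &
done
wait
```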
- Filter the dataset to remove unreachable poses.
python ./dataset/filter_dataset.py --scene='pile' --camera='single'
- Split the dataset into train and test sets.
python ./dataset/split_dataset.py --camera_setting='single' --train_ratio=0.8
You can change the save dir and root dir with --output_dir and --root_dir.
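For example, a hypothetical invocation with both paths overridden (the directory names below are placeholders, not paths defined by the repo):

```bash
# Placeholder paths; replace them with your own dataset locations.
OUTPUT_DIR=./dataset/split_single      # hypothetical output directory
ROOT_DIR=./dataset/collected_single    # hypothetical raw-data directory
python ./dataset/split_dataset.py --camera_setting='single' --train_ratio=0.8 \
  --output_dir=$OUTPUT_DIR --root_dir=$ROOT_DIR
```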
Before training, we pre-calculate the harmonics of the test set to avoid repeated calculations during training. (Since we use data augmentation for the training set, this pre-calculation can only be used for the test set.)
python ./scripts/save_harmonics.py --camera_setting='single' --use_mask=True
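For the multi-camera setting, the same pre-calculation should apply with the other camera setting (assuming --camera_setting accepts 'multi' here, as it does in the split script):

```bash
# Assumed multi-camera variant of the harmonics pre-calculation.
python ./scripts/save_harmonics.py --camera_setting='multi' --use_mask=True
```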
Then, you can start to train the model.
python ./scripts/train_single.py
python ./scripts/train_multi.py
The training configurations are saved in ./scripts/single_config.yaml and ./scripts/multi_config.yaml. You can change the training configuration in these files.
After training, you can test the model by running the following commands.
- NOTE: We provide the pre-trained model for the single-camera setting. You can find it in ./scripts/output/store/single/.
python ./scripts/test_grasp_single.py
python ./scripts/test_grasp_multi.py
Remember to set the correct checkpoint path in ./scripts/single_config.yaml and ./scripts/multi_config.yaml.
This repository is released under the MIT license. See LICENSE for additional details.