Project Website | Paper | Video
Boce Hu1,
Xupeng Zhu*1,
Dian Wang*1,
Zihao Dong*1,
Haojie Huang*1,
Chenghao Wang*1,
Robin Walters1,
Robert Platt1,2
1Northeastern University, 2Boston Dynamics AI Institute
Conference on Robot Learning 2024
- Install Mambaforge or Anaconda
- Clone this repo
git clone git@github.com:BoceHu/orbitgrasp.git
cd orbitgrasp
- Install environment: Use Mambaforge (recommended):
mamba env create -f conda_environment.yaml
conda activate orbitgrasp
or use Anaconda:
conda env create -f conda_environment.yaml
conda activate orbitgrasp
This procedure will take around 10 minutes to install all dependencies. (Since torch-cluster, torch-geometric, and torch-scatter are not available in the conda-forge channel, they are installed from source, which may take a while.)
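A quick sanity check (optional, not part of the original instructions) is to confirm that the source-built PyTorch extensions import cleanly in the new environment:

```bash
# Optional sanity check: the source-built extensions should import without errors.
conda activate orbitgrasp
python -c "import torch, torch_cluster, torch_scatter, torch_geometric; print(torch.__version__)"
```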
- Check the Segment Anything website to download the checkpoint. In our implementation, we use ViT-H as the backbone, and the checkpoint is available here. Move the downloaded checkpoint to the ./pretrained folder.
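For reference, a minimal sketch of fetching the ViT-H checkpoint from the command line (the URL is the one published in the Segment Anything repository at the time of writing; adjust if it has changed):

```bash
# Download the SAM ViT-H checkpoint into ./pretrained (URL from the Segment Anything repo).
mkdir -p ./pretrained
wget -P ./pretrained https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```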
We have two camera settings and two ways to collect the data. The camera settings are single and multi, and the two ways are w/ mask and w/o mask.
For data collection w/ mask: To accelerate data collection with multi-processing, we first use SAM to generate and save the masks of all scenes and then use them directly in the later data collection. (Thus, we don't need to load SAM in each data collection process.)
- The scene can be changed to pile or packed, and sample_num indicates the number of scenes, e.g. (for the single-camera setting):
python ./dataset/generate_raw_pose_mask_single.py --scene='pile' --sample_num=2500
If you want to use the multi-camera setting, you can use the following command:
python ./dataset/generate_raw_pose_mask_multi.py --scene='pile' --sample_num=800
This procedure can be ignored if you don't need masks to generate grasp poses. Here, masks are used to provide semantic and object-centric information for the grasp pose generation.
- After generating the masks, you can start to collect the data. The scene can be changed to pile or packed. from_save=True indicates that we use the saved masks for data collection (since we save the environment config in the last step, we can generate exactly the same scenes as those used for generating the masks). start_scene indicates the starting scene number from the saved envs, and iteration_num indicates the number of scenes to generate grasp poses for. E.g., if start_scene=100 and iteration_num=40, the scenes generated will be from 100 to 140.
python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=40
If you want to use the multi-camera setting, you can use the following command:
python ./dataset/generate_pose_multi.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=40
If you want to collect data w/o masks, you can use the following command:
python ./dataset/generate_pose_single_wo_mask.py --scene='pile' --GUI=False --start_scene=0 --iteration_num=40
To accelerate this procedure, you can open multiple terminals and run the same file (with a different start_scene) in each terminal to collect data in parallel. For example,
python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=0 --iteration_num=50
...
python ./dataset/generate_pose_single.py --scene='pile' --from_save=True --GUI=False --start_scene=1000 --iteration_num=50
In our implementation, we used 20 processes to collect data in parallel.
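The manual multi-terminal launch above can also be expressed as a small shell loop. This is only an illustrative sketch: the flag values mirror the example commands, while the loop bounds, nohup usage, and log paths are our own choices and not part of the repo scripts:

```bash
# Illustrative sketch: launch 20 background collection processes, 50 scenes each.
for i in $(seq 0 19); do
  start=$((i * 50))
  nohup python ./dataset/generate_pose_single.py --scene='pile' --from_save=True \
    --GUI=False --start_scene=$start --iteration_num=50 > collect_$start.log 2>&1 &
done
wait
```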
- Filter the dataset to remove unreachable poses.
python ./dataset/filter_dataset.py --scene='pile' --camera='single'
- Split the dataset into train and test sets.
python ./dataset/split_dataset.py --camera_setting='single' --train_ratio=0.8
You can change the save dir and root dir with --output_dir and --root_dir.
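For example, a hypothetical invocation with both paths overridden (the directory names below are placeholders, not paths defined by the repo):

```bash
# Placeholder paths; replace them with your own dataset locations.
OUTPUT_DIR=./dataset/split_single      # hypothetical output directory
ROOT_DIR=./dataset/collected_single    # hypothetical raw-data directory
python ./dataset/split_dataset.py --camera_setting='single' --train_ratio=0.8 \
  --output_dir=$OUTPUT_DIR --root_dir=$ROOT_DIR
```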
Before training, we pre-calculate the harmonics of the test set to avoid repeated calculations during training. (Since we use data augmentation for the training set, this pre-calculation can only be used for the test set.)
python ./scripts/save_harmonics.py --camera_setting='single' --use_mask=True
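For the multi-camera setting, the same pre-calculation should apply with the other camera setting (assuming --camera_setting accepts 'multi' here, as it does in the split script):

```bash
# Assumed multi-camera variant of the harmonics pre-calculation.
python ./scripts/save_harmonics.py --camera_setting='multi' --use_mask=True
```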
Then, you can start to train the model.
python ./scripts/train_single.py
python ./scripts/train_multi.py
The training configurations are saved in ./scripts/single_config.yaml and ./scripts/multi_config.yaml. You can change the training configuration in these files.
After training, you can test the model by running the following commands.
- NOTE: We provide the pre-trained model for the single-camera setting. You can find it in ./scripts/output/store/single/.
python ./scripts/test_grasp_single.py
python ./scripts/test_grasp_multi.py
Remember to set the correct checkpoint path in ./scripts/single_config.yaml and ./scripts/multi_config.yaml.
This repository is released under the MIT license. See LICENSE for additional details.