Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

This is official implementation of our CVPR 2024 paper "Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception"

paper

Updates

[2024-05-15] We are preparing the code and expect to release it before June 19.
[2024-06-11] Initialize the release code.

Requirements

python=3.9
pytorch=2.1.0
lightning=2.1.0

conda create -n py39_pyt210_cu118 python==3.9 -y
conda activate py39_pyt210_cu118

# install pytorch==2.1.0
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia -y
or
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

# install MinkowskiEngine following https://github.com/NVIDIA/MinkowskiEngine
# for example:
pip install ninja
git clone https://github.com/NVIDIA/MinkowskiEngine
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas

# install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu118.html

pip install -r requirements.txt

Datasets

Download nuScenes dataset from the official link, and put the dataset in {project_root}/datasets/nuscenes
Download the superpixels superpixels_dinov2_ade20k.zip from the BAIDU, and unzip the file under {project_root}/superpixels/nuscenes

the project structure should be like:

{project_root}
|--config
|--downstream
|--model
|--pretrain
|--utils
|--datasets
  |--nuscenes
    |--samples
    |--sweeps
    |--lidarseg
    |--nuScenes-panoptic-v1.0-all
    |--v1.0-trainval
|--superpixels
  |--nuscenes
    |--superpixels_dinov2_ade20k
|--...

Experiments

3D Semantic Segmentation

# 1. pre-train the 3d backbone MinkUNet
CUDA_VISIBLE_DEVICES=0,1 python pretrain_cluster_prototype.py --cfg config/pretrain/csc_minkunet_dinov2_g2b16.yaml
# the {pretrain_weights_path}   will be found in `{project_root}/output/pretrain/nuscenes/cp/v1_1/{year}_{month}_{day}_{hour}_{minute}/final_model_cp_v1_1.pt`

# 2. fine-tune the 3d backbone using our provided script
sh downstream_semseg_finetune.sh 0,1 {pretrain_weights_path} csc_sem_seg

3D Object Detection

#1. pre-train the 3D backbone VoxelNet
CUDA_VISIBLE_DEVICES=0,1 python pretrain_cluster_prototype.py --cfg  config/pretrain/csc_voxelnet_dinov2_g2b16.yaml

#2. fine-tune the VoxelNet using OpenPCDet, https://github.com/open-mmlab/OpenPCDet. 
# Please refer to the TriCC https://openaccess.thecvf.com/content/CVPR2023/html/Pang_Unsupervised_3D_Point_Cloud_Representation_Learning_by_Triangle_Constrained_Contrast_CVPR_2023_paper.html

3D Panoptic Segmentation

# 1. pre-train the 3d backbone Cylinder3D
CUDA_VISIBLE_DEVICES=0,1 python  pretrain_cluster_prototype.py --cfg_file config/pretrain/csc_cylinder3d_dinov2_g2b16.yaml
# the pre-training weights will be found in `{project_root}/output/pretrain/cp_V1_1/panoptic_polarnet_cylinder3d/dinov2_ade20k/{year}_{month}_{day}_{hour}_{minute}/model.pt`

# 2. fine-tune the 3d backbone using our provided script
sh downstream_panseg_finetune.sh 0,1 {pretrain_weights_path} csc_pan_seg

Acknowledgement

The codebase is adapted from SLidR.

Citation

@InProceedings{chen2024building,
   title={Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception},
   author={Chen, Haoming and Zhang, Zhizhong and Qu, Yanyun and Zhang, Ruixin and Tan, Xin and Xie, Yuan},
   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   month = {June},
   year= {2024}
}

Name	Name	Last commit message	Last commit date
Latest commit chenhaomingbob first commit Jun 11, 2024 50a7869 · Jun 11, 2024 History 6 Commits
config	config	first commit	Jun 11, 2024
downstream	downstream	first commit	Jun 11, 2024
model	model	first commit	Jun 11, 2024
preprocss	preprocss	first commit	Jun 11, 2024
pretrain	pretrain	first commit	Jun 11, 2024
utils	utils	first commit	Jun 11, 2024
.gitignore	.gitignore	first commit	Jun 11, 2024
LICENSE	LICENSE	first commit	Jun 11, 2024
README.md	README.md	first commit	Jun 11, 2024
downstream_panoptic.py	downstream_panoptic.py	first commit	Jun 11, 2024
downstream_panseg_finetune.sh	downstream_panseg_finetune.sh	first commit	Jun 11, 2024
downstream_semantic.py	downstream_semantic.py	first commit	Jun 11, 2024
downstream_semseg_finetune.sh	downstream_semseg_finetune.sh	first commit	Jun 11, 2024
evaluate_downstream.py	evaluate_downstream.py	first commit	Jun 11, 2024
pretrain_cluster_prototype.py	pretrain_cluster_prototype.py	first commit	Jun 11, 2024
requirements.txt	requirements.txt	first commit	Jun 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

Updates

Requirements

Datasets

Experiments

3D Semantic Segmentation

3D Object Detection

3D Panoptic Segmentation

Acknowledgement

Citation

About

Releases

Packages

Languages

License

chenhaomingbob/CSC

Folders and files

Latest commit

History

Repository files navigation

Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

Updates

Requirements

Datasets

Experiments

3D Semantic Segmentation

3D Object Detection

3D Panoptic Segmentation

Acknowledgement

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages