Dingkang Liang1 *, Xin Zhou1 *, Xinyu Wang1 *, Xingkui Zhu1 , Wei Xu1, Zhikang Zou2, Xiaoqing Ye2, and Xiang Bai1
1 Huazhong University of Science & Technology, 2 Baidu Inc.
(*) equal contribution
- [16/Mar/2024] The configurations and checkpoints for ModelNet40 are now accessible, check it out!
- [05/Mar/2024] Our paper DAPT (github) has been accepted by CVPR 2024! 🥳🥳🥳 Check it out and give it a star 🌟!
Transformers have become one of the foundational architectures in point cloud analysis tasks due to their excellent global modeling ability. However, the attention mechanism has quadratic complexity, which makes it difficult to extend to long sequences under limited computational resources. Recently, state space models (SSMs), a new family of deep sequence models, have shown great potential for sequence modeling in NLP tasks. In this paper, taking inspiration from the success of SSMs in NLP, we propose PointMamba, a framework with global modeling and linear complexity. Specifically, taking embedded point patches as input, we propose a reordering strategy that enhances the SSM's global modeling ability by providing a more logical geometric scanning order. The reordered point tokens are then sent to a series of Mamba blocks to causally capture the point cloud structure. Experimental results show that PointMamba outperforms its transformer-based counterparts on different point cloud analysis datasets while saving about 44.3% of parameters and 25% of FLOPs, demonstrating its potential as an option for constructing foundational 3D vision models. We hope PointMamba can provide a new perspective for point cloud analysis.
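To make the reordering idea concrete, here is a minimal pure-Python sketch. It assumes that "geometric scanning order" means sorting the patch centers along each coordinate axis and concatenating the three scans; that reading, the function names `reorder_by_axes` and `causal_scan`, and the scalar toy features are illustrative assumptions, not the repository's implementation. The recurrence is a stand-in for a state space layer that only shows the causal, one-step-per-token structure; it is not Mamba's selective scan.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def reorder_by_axes(centers: Sequence[Point]) -> List[int]:
    """Return token indices from scanning the patch centers along x, then y, then z.

    Assumption: each scan sorts all centers by one coordinate and the three
    scans are concatenated, so the result has 3 * len(centers) entries.
    """
    order: List[int] = []
    for axis in range(3):
        order.extend(sorted(range(len(centers)), key=lambda i: centers[i][axis]))
    return order


def causal_scan(tokens: Sequence[float], decay: float = 0.5) -> List[float]:
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t, one step per token.

    Each output depends only on the current and earlier tokens, and the cost
    grows linearly with the sequence length.
    """
    state = 0.0
    outputs: List[float] = []
    for x in tokens:
        state = decay * state + x
        outputs.append(state)
    return outputs


# Three hypothetical patch centers with one scalar feature each.
centers = [(0.9, 0.1, 0.5), (0.2, 0.8, 0.1), (0.5, 0.4, 0.9)]
features = [1.0, 2.0, 3.0]

order = reorder_by_axes(centers)
sequence = [features[i] for i in order]
outputs = causal_scan(sequence)
```

Because the recurrence is causal, the order in which tokens arrive determines what context each token can see, which is why the scanning order matters for a point cloud that has no natural sequence.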
Task | Dataset | Config | Acc. (Scratch) | Download (Scratch) | Acc. (Pre-train) | Download (Fine-tune) |
---|---|---|---|---|---|---|
Pre-training | ShapeNet | pretrain.yaml | N.A. | here | ||
Classification | ModelNet40 | finetune_modelnet.yaml | 92.4% | here | 93.6% | here |
Classification | ScanObjectNN | finetune_scan_objbg.yaml | 88.30% | here | 90.71% | here |
Classification | ScanObjectNN | finetune_scan_objonly.yaml | 87.78% | here | 88.47% | here |
Classification | ScanObjectNN | finetune_scan_hardest.yaml | 82.48% | here | 84.87% | here |
Part Segmentation | ShapeNetPart | part segmentation | 85.8% mIoU | here | 86.0% mIoU | here |
This codebase was tested with the following environment configurations. It may work with other versions.
- Ubuntu 20.04
- CUDA 11.7
- Python 3.9
- PyTorch 1.13.1 + cu117
We recommend using Anaconda for the installation process:
# Create virtual env and install PyTorch
$ conda create -n pointmamba python=3.9
$ conda activate pointmamba
(pointmamba) $ pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
# Install basic required packages
(pointmamba) $ pip install -r requirements.txt
# Chamfer Distance & emd
(pointmamba) $ cd ./extensions/chamfer_dist && python setup.py install --user && cd ../..
(pointmamba) $ cd ./extensions/emd && python setup.py install --user && cd ../..
# PointNet++
(pointmamba) $ pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# GPU kNN
(pointmamba) $ pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
# Mamba
(pointmamba) $ pip install causal-conv1d==1.1.1
(pointmamba) $ pip install mamba-ssm==1.1.1
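After installation, a quick check that the main dependencies can be found may save a failed training launch. The sketch below uses only the standard library; the import names in `ASSUMED_MODULES` are assumptions about how the packages above expose themselves, and the helper `check_modules` is hypothetical. The locally built Chamfer Distance and EMD extensions are left out because their import names depend on the extension build.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Assumed import names for the packages installed above; adjust if they differ.
ASSUMED_MODULES = ("torch", "pointnet2_ops", "knn_cuda", "causal_conv1d", "mamba_ssm")


def check_modules(names: Iterable[str] = ASSUMED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be located, without importing it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:15s} {'ok' if found else 'MISSING'}")
```

Locating a module does not prove that its CUDA kernels were compiled for the installed PyTorch and CUDA versions, so a short training run is still the real test.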
See DATASET.md for details.
CUDA_VISIBLE_DEVICES=<GPU> python main.py --config cfgs/pretrain.yaml --exp_name <name>
Training from scratch on ModelNet40.
CUDA_VISIBLE_DEVICES=<GPU> python main.py --scratch_model --config cfgs/finetune_modelnet.yaml --exp_name <name>
Fine-tuning from a pre-trained checkpoint on ModelNet40.
CUDA_VISIBLE_DEVICES=<GPU> python main.py --finetune_model --config cfgs/finetune_modelnet.yaml --ckpts <path/to/pre-trained/model> --exp_name <name>
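Training from scratch on ScanObjectNN (shown with the objbg config; substitute finetune_scan_objonly.yaml or finetune_scan_hardest.yaml for the other variants).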
CUDA_VISIBLE_DEVICES=<GPU> python main.py --scratch_model --config cfgs/finetune_scan_objbg.yaml --exp_name <name>
Fine-tuning from a pre-trained checkpoint on ScanObjectNN.
CUDA_VISIBLE_DEVICES=<GPU> python main.py --finetune_model --config cfgs/finetune_scan_objbg.yaml --ckpts <path/to/pre-trained/model> --exp_name <name>
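When running all three ScanObjectNN variants, it can be convenient to generate the command lines instead of editing them by hand. The sketch below only builds argument lists from the flags shown above; the helper `build_command` is hypothetical, and the `cfgs/` paths for the objonly and hardest configs are assumed by analogy with the objbg command.

```python
from typing import List, Optional

# Config paths assumed to live in cfgs/, like finetune_scan_objbg.yaml.
SCAN_CONFIGS = [
    "cfgs/finetune_scan_objbg.yaml",
    "cfgs/finetune_scan_objonly.yaml",
    "cfgs/finetune_scan_hardest.yaml",
]


def build_command(config: str, exp_name: str, ckpt: Optional[str] = None) -> List[str]:
    """Build the main.py argument list: fine-tune if a checkpoint is given, else train from scratch."""
    mode = "--finetune_model" if ckpt else "--scratch_model"
    cmd = ["python", "main.py", mode, "--config", config]
    if ckpt:
        cmd += ["--ckpts", ckpt]
    cmd += ["--exp_name", exp_name]
    return cmd


if __name__ == "__main__":
    for cfg in SCAN_CONFIGS:
        # Derive a placeholder experiment name from the config file name.
        name = cfg.split("/")[-1].replace(".yaml", "")
        print(" ".join(build_command(cfg, name)))
```

The printed lines can be pasted into a shell after setting `CUDA_VISIBLE_DEVICES`, or each list can be passed to `subprocess.run` with that variable set in the `env` argument.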
Training from scratch for part segmentation on ShapeNetPart.
cd part_segmentation
CUDA_VISIBLE_DEVICES=<GPU> python main.py --config cfgs/config.yaml --log_dir <name>
Fine-tuning from a pre-trained checkpoint for part segmentation on ShapeNetPart.
cd part_segmentation
CUDA_VISIBLE_DEVICES=<GPU> python main.py --config cfgs/config.yaml --ckpts <path/to/pre-trained/model> --log_dir <name>
- Release code.
- Release checkpoints.
- ModelNet40.
- Semantic segmentation.
This project is based on Point-BERT (paper, code), Point-MAE (paper, code), Mamba (paper, code), and Causal-Conv1d (code). We thank the authors for their wonderful work.
If you find this repository useful in your research, please consider giving it a star ⭐ and citing our paper:
@article{liang2024pointmamba,
title={PointMamba: A Simple State Space Model for Point Cloud Analysis},
author={Dingkang Liang and Xin Zhou and Xinyu Wang and Xingkui Zhu and Wei Xu and Zhikang Zou and Xiaoqing Ye and Xiang Bai},
journal={arXiv preprint arXiv:2402.10739},
year={2024}
}