Rethinking Masked Representation Learning for 3D Point Cloud Understanding, TIP 2024
In this work, we revisit which grouping strategies and pretext tasks are best suited to self-supervised point cloud representation learning, and propose a novel hierarchical masked representation learning method that comprises an optimal transport-based hierarchical grouping strategy, a prototype-based part modeling module, and a hierarchical attention encoder. The proposed method has several merits. First, the grouping strategy partitions the point cloud into non-overlapping groups, which prevents the structural information of masked groups from leaking early (with overlapping groups, points of a masked group can still appear in visible groups). Second, the prototype-based part modeling module dynamically models different object components, ensuring feature consistency across parts with the same semantics.
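A minimal sketch of why non-overlapping groups avoid this leakage, assuming a single-level grouping: group centers come from farthest point sampling, the cost is the squared distance from each point to each center, and points are assigned by entropic optimal transport (Sinkhorn iterations with uniform marginals) followed by a hard argmax. This is not the paper's hierarchical, multi-level scheme. All function names (`ot_grouping`, `knn_grouping`, `leaked_points` and the helpers) and parameter values (`eps`, `n_iters`, the masking ratio) are hypothetical. For comparison, the kNN grouping lets groups share points, so a masked group's points can remain visible through an unmasked one.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_sampling(points, k):
    """Pick k well-spread center indices (hypothetical helper; starts at index 0)."""
    centers = [0]
    dist = [sq_dist(p, points[0]) for p in points]
    for _ in range(1, k):
        idx = max(range(len(points)), key=lambda i: dist[i])
        centers.append(idx)
        dist = [min(d, sq_dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers


def sinkhorn(cost, eps=0.05, n_iters=200):
    """Entropic OT plan between uniform mass on points (rows) and on groups (columns)."""
    n, k = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / k  # uniform marginals: each group should absorb equal mass
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        u = [a / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


def ot_grouping(points, num_groups):
    """Partition points into num_groups disjoint, roughly balanced groups."""
    centers = [points[i] for i in farthest_point_sampling(points, num_groups)]
    cost = [[sq_dist(p, c) for c in centers] for p in points]
    scale = max(max(row) for row in cost) or 1.0  # normalise costs to [0, 1] for stability
    plan = sinkhorn([[c / scale for c in row] for row in cost])
    # Hard assignment: each point goes to exactly one group, so groups cannot overlap.
    labels = [max(range(num_groups), key=lambda j: row[j]) for row in plan]
    return [[i for i, g in enumerate(labels) if g == j] for j in range(num_groups)]


def knn_grouping(points, num_groups, group_size):
    """kNN grouping around FPS centers (as in Point-MAE); groups may share points."""
    centers = [points[i] for i in farthest_point_sampling(points, num_groups)]
    return [sorted(range(len(points)), key=lambda i: sq_dist(points[i], c))[:group_size]
            for c in centers]


def leaked_points(groups, masked):
    """Indices that belong to a masked group but are still visible via an unmasked group."""
    hidden = {i for g in masked for i in groups[g]}
    visible = {i for g in range(len(groups)) if g not in masked for i in groups[g]}
    return hidden & visible


if __name__ == "__main__":
    random.seed(0)
    cloud = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(64)]
    ot_groups = ot_grouping(cloud, num_groups=8)
    knn_groups = knn_grouping(cloud, num_groups=8, group_size=16)
    masked = set(range(5))  # mask 5 of 8 groups (hypothetical ratio)
    print("OT group sizes:", [len(g) for g in ot_groups])
    print("leaked points (OT):", len(leaked_points(ot_groups, masked)))
    print("leaked points (kNN):", len(leaked_points(knn_groups, masked)))
```

The uniform column marginal pushes group sizes toward balance, while the argmax rounding guarantees that every point lands in exactly one group, so the OT leak count is zero by construction. With overlapping kNN groups the count is typically nonzero.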
The code will be open-sourced soon.
We use ShapeNet, ScanObjectNN, ModelNet40 and ShapeNetPart in this work. See DATASET.md for details.
| Task | Dataset | Config | Acc. / mIoU | Download |
|---|---|---|---|---|
| Pre-training | ShapeNet | pretrain.yaml | N.A. | todo |
| Classification | ScanObjectNN | finetune_scan_hardest.yaml | 89.0% | todo |
| Classification | ScanObjectNN | finetune_scan_objbg.yaml | 92.9% | todo |
| Classification | ScanObjectNN | finetune_scan_objonly.yaml | 92.3% | todo |
| Classification | ModelNet40 (1k points) | finetune_modelnet.yaml | 94.5% | todo |
| Part segmentation | ShapeNetPart | segmentation | 86.8% (instance mIoU) | todo |
| Part segmentation | ShapeNetPart | segmentation | 85.1% (class mIoU) | todo |

| Task | Dataset | Config | 5-way 10-shot Acc. (%) | 5-way 20-shot Acc. (%) | 10-way 10-shot Acc. (%) | 10-way 20-shot Acc. (%) |
|---|---|---|---|---|---|---|
| Few-shot learning | ModelNet40 | fewshot.yaml | 97.2 ± 2.3 | 98.7 ± 1.2 | 93.2 ± 3.4 | 95.6 ± 2.6 |
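
The few-shot columns follow the usual N-way K-shot notation (for example, 5w10s means 5 classes with 10 labelled samples each). A minimal sketch of how one such episode can be drawn and how per-run accuracies can be reported as "mean ± std", assuming the ± values are standard deviations over independent runs. The query-set size (20 per class), the run count, and the names `sample_episode` and `summarize` are hypothetical placeholders, not the settings in `fewshot.yaml`.

```python
import random
import statistics
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode from per-sample class labels.

    Returns (support, query), each a list of (sample_index, episode_label),
    with episode labels re-indexed to 0..n_way-1.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for new_label, c in enumerate(classes):
        picked = rng.sample(by_class[c], k_shot + n_query)  # one draw keeps support/query disjoint
        support += [(i, new_label) for i in picked[:k_shot]]
        query += [(i, new_label) for i in picked[k_shot:]]
    return support, query


def summarize(accuracies):
    """Format per-run accuracies (in %) as 'mean ± std' using the sample std."""
    return f"{statistics.mean(accuracies):.1f} ± {statistics.stdev(accuracies):.1f}"


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [c for c in range(40) for _ in range(50)]  # toy stand-in for 40 classes
    support, query = sample_episode(labels, n_way=5, k_shot=10, n_query=20, rng=rng)
    print(len(support), len(query))
```

Each episode would be used to fit a classifier on the support features and score it on the query set; repeating this over several seeds and passing the accuracies to `summarize` yields entries in the same format as the table above.
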
@ARTICLE{10815033,
author={Wang, Chuxin and Zha, Yixin and He, Jianfeng and Yang, Wenfei and Zhang, Tianzhu},
journal={IEEE Transactions on Image Processing},
title={Rethinking Masked Representation Learning for 3D Point Cloud Understanding},
year={2025},
volume={34},
number={},
pages={247-262},
keywords={Point cloud compression;Semantics;Feature extraction;Three-dimensional displays;Representation learning;Solid modeling;Prototypes;Shape;Nearest neighbor methods;Image reconstruction;Self-supervised point cloud representation learning;optimal transport;part modeling},
doi={10.1109/TIP.2024.3520008}
}
