Authors: Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang.
Please follow our installation guide to prepare the dependencies.
After downloading and processing the data, place it in the ./data/ directory.
For ScanNet and ScanNet200 dataset preprocessing, please follow the instructions.
We provide the DINO-X features required for training and evaluation, which are available for download on Hugging Face. After downloading, please place them in the ./data/features_2d/ directory.
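As a minimal sketch of this step (the Hugging Face repository id is a placeholder, and the `huggingface-cli` invocation is an assumption, not part of this repo):

```shell
#!/bin/sh
# Create the target directory for the provided 2D features.
mkdir -p data/features_2d
# Download the features from Hugging Face and unpack them here, e.g.:
# huggingface-cli download <repo_id> --local-dir data/features_2d
# (replace <repo_id> with the actual repository; this line is illustrative only)
```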
The directory structure after data preparation should be as follows:
```
data
├── features_2d/
│   ├── scannet/
│   └── scannet200/
├── scannet/
├── scannet200/
└── readme.md
```
First, download our provided checkpoints and put them in ./checkpoint.
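Before running training or evaluation, the expected data layout can be sanity-checked with a short shell sketch (paths taken from the tree above; this helper is not part of the codebase):

```shell
#!/bin/sh
# Report any expected data directory that is still missing.
for d in data/features_2d/scannet data/features_2d/scannet200 \
         data/scannet data/scannet200; do
  [ -d "$d" ] || echo "missing: $d"
done
```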
```shell
# Select the dataset you want to evaluate in eval.sh manually.
bash scripts/eval.sh
```
For training on ScanNet200, please prepare the pretrained backbone "mask3d_scannet200_aligned.pth" and put it in ./pretrained_backbone before training. The backbone is initialized from the Mask3D checkpoint and can be downloaded here.
For training on ScanNet, please prepare the pretrained backbone "aligned_sstnet_scannet.pth" and put it in ./pretrained_backbone before training. The backbone is initialized from the SSTNet checkpoint and can be downloaded here.
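The placement steps above can be sketched as follows (directory names are taken from the instructions; the .pth files themselves must still be downloaded manually from the links above):

```shell
#!/bin/sh
# Create the directories that the checkpoints and pretrained backbones go into.
mkdir -p checkpoint pretrained_backbone
# The weight files are then placed manually, e.g.:
#   pretrained_backbone/mask3d_scannet200_aligned.pth   (for ScanNet200 training)
#   pretrained_backbone/aligned_sstnet_scannet.pth      (for ScanNet training)
```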
```shell
# Select the dataset used for training in train.sh manually.
bash scripts/train.sh
```
We provide the configuration files and checkpoints for the ScanNet and ScanNet200 benchmarks (validation set), using DINO-X as the 2D detection model to provide 2D features.
| Dataset | mAP | mAP50 | mAP25 | Model | Config |
|---|---|---|---|---|---|
| ScanNet (val) | 64.0 | 81.5 | 88.9 | model | config |
| ScanNet200 (val) | 40.2 | 52.4 | 58.6 | model | config |
Additionally, our performance on the ScanNet200 hidden test set is shown below:
| Dataset | mAP | mAP50 | mAP25 | Details |
|---|---|---|---|---|
| ScanNet200 (test) | 34.6 | 45.4 | 51.1 | details |
If you find this work helpful for your research, please cite:
```bibtex
@inproceedings{qu2025segdino3d,
  title={{SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features}},
  author={Qu, Jinyuan and Li, Hongyang and Chen, Xingyu and Liu, Shilong and Shi, Yukai and Ren, Tianhe and Jing, Ruitao and Zhang, Lei},
  booktitle={Association for the Advancement of Artificial Intelligence (AAAI)},
  year={2026},
}
```
We would like to thank the authors of the following projects for their excellent work:
- DINO-X - DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
- Grounding DINO - Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- OneFormer3D - OneFormer3D: One Transformer for Unified Point Cloud Segmentation
- DAB-DETR - DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
- SPFormer - Superpoint Transformer for 3D Scene Instance Segmentation
- Mask3D - Mask3D: Mask Transformer for 3D Instance Segmentation
- MAFT - Mask-Attention-Free Transformer for 3D Instance Segmentation
- 3DETR - 3DETR: An End-to-End Transformer Model for 3D Object Detection

