[AAAI 2026] Official implementation of the paper "SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features"

IDEA-Research/SegDINO3D

SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features

Authors: Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang.

Installation

Please follow our installation guide to set up the dependencies. After downloading and processing the data, place it in the ./data/ directory.

Data Preparation

  1. To preprocess the ScanNet and ScanNet200 datasets, please follow the provided instructions.

  2. We provide the DINO-X features required for training and evaluation, which can be downloaded from Hugging Face. After downloading, place them in the ./data/features_2d/ directory.

After data preparation, the directory structure should look as follows:

data
├── features_2d/
│   ├── scannet/
│   ├── scannet200/
├── scannet/
├── scannet200/
├── readme.md
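Before running evaluation or training, a quick shell check can confirm that the layout above is in place. This is a minimal sketch, assuming that the four directories shown in the tree are the ones the scripts expect; the function name check_data_layout is hypothetical and not part of the SegDINO3D repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the SegDINO3D repository): report every
# directory from the expected data layout that is missing under a root path.
check_data_layout() {
  local root="${1:-./data}"
  local missing=0
  # Directories taken from the expected layout shown above.
  for d in features_2d/scannet features_2d/scannet200 scannet scannet200; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d" >&2
      missing=1
    fi
  done
  # 0 when all directories exist, 1 otherwise.
  return "$missing"
}
```

For example, check_data_layout ./data && echo "data layout OK" prints the message only when every directory is present. Otherwise, each missing path is listed on stderr.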

Evaluation

First, download the provided checkpoints and place them in "./checkpoint".

# Select the dataset you want to evaluate in eval.sh manually.
bash scripts/eval.sh
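An empty or misplaced ./checkpoint directory is an easy mistake to make. The following pre-flight check is a sketch that only assumes the downloaded checkpoints are regular files somewhere under that directory. It does not know their exact names. The function check_checkpoints is hypothetical and is not provided by the repository.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository): succeed only if
# the checkpoint directory exists and contains at least one regular file.
check_checkpoints() {
  local dir="${1:-./checkpoint}"
  if [ ! -d "$dir" ]; then
    echo "checkpoint directory not found: $dir" >&2
    return 1
  fi
  # find prints the first regular file it encounters; empty output means none.
  if [ -z "$(find "$dir" -type f -print -quit)" ]; then
    echo "no checkpoint files in: $dir" >&2
    return 1
  fi
  return 0
}
```

It can guard the evaluation command, for example: check_checkpoints ./checkpoint && bash scripts/eval.sh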

Training

For training on ScanNet200, please prepare the pretrained backbone "mask3d_scannet200_aligned.pth" and place it in ./pretrained_backbone before training. This backbone is initialized from the Mask3D checkpoint and can be downloaded here.

For training on ScanNet, please prepare the pretrained backbone "aligned_sstnet_scannet.pth" and place it in ./pretrained_backbone before training. This backbone is initialized from the SSTNet checkpoint and can be downloaded here.
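Each dataset needs a different backbone file, so a small lookup can catch a missing or misnamed file before train.sh starts. This is a sketch built only from the two file names given above. The helper names backbone_for and check_backbone are hypothetical, and the sketch does not read the training configs.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).
# backbone_for: print the pretrained backbone file this README names for a dataset.
backbone_for() {
  case "$1" in
    scannet200) echo "mask3d_scannet200_aligned.pth" ;;  # initialized from Mask3D
    scannet)    echo "aligned_sstnet_scannet.pth" ;;     # initialized from SSTNet
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# check_backbone: verify that the backbone for a dataset exists, then print its path.
check_backbone() {
  local dataset="$1"
  local dir="${2:-./pretrained_backbone}"
  local name
  name="$(backbone_for "$dataset")" || return 1
  if [ ! -f "$dir/$name" ]; then
    echo "missing pretrained backbone: $dir/$name" >&2
    return 1
  fi
  echo "$dir/$name"
}
```

For example, check_backbone scannet200 prints ./pretrained_backbone/mask3d_scannet200_aligned.pth when the file is in place and fails with a message otherwise.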

# Select the dataset used for training in train.sh manually.
bash scripts/train.sh

Models

We provide configuration files and checkpoints for the ScanNet and ScanNet200 benchmarks (validation sets), using DINO-X as the 2D detection model that supplies the 2D features.

Dataset            mAP    mAP50   mAP25   Download
ScanNet (val)      64.0   81.5    88.9    model | config
ScanNet200 (val)   40.2   52.4    58.6    model | config

Additionally, our performance on the ScanNet200 hidden test set is shown below:

Dataset             mAP    mAP50   mAP25   Details
ScanNet200 (test)   34.6   45.4    51.1    details

Qualitative Performance

Citation

If you find this work helpful for your research, please cite:

@inproceedings{qu2025segdino3d,
  title={{SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features}},
  author={Qu, Jinyuan and Li, Hongyang and Chen, Xingyu and Liu, Shilong and Shi, Yukai and Ren, Tianhe and Jing, Ruitao and Zhang, Lei},
  booktitle={Association for the Advancement of Artificial Intelligence (AAAI)},
  year={2026},
}

Acknowledgement

We would like to thank the authors of the following projects for their excellent work:

  • DINO-X - DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
  • Grounding DINO - Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
  • OneFormer3D - OneFormer3D: One Transformer for Unified Point Cloud Segmentation
  • DAB-DETR - DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
  • SPFormer - Superpoint Transformer for 3D Scene Instance Segmentation
  • Mask3D - Mask3D: Mask Transformer for 3D Instance Segmentation
  • MAFT - Mask-Attention-Free Transformer for 3D Instance Segmentation
  • 3DETR - 3DETR: An End-to-End Transformer Model for 3D Object Detection