CVPR, 2024
Xiangtai Li
·
Haobo Yuan
·
Wei Li
·
Henghui Ding
·
Size Wu
·
Wenwei Zhang
·
Yining Li
·
Kai Chen
·
Chen Change Loy
S-Lab, MMlab@NTU, Shanghai AI Laboratory
Xiangtai is the project leader and corresponding author.
In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the Segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open-vocabulary settings, prompt-driven interactive segmentation (as in SAM), and video object segmentation. To our knowledge, this is the first model to handle all of these tasks in a single model and achieve good enough performance.
We show that OMG-Seg, a transformer-based encoder-decoder architecture with task-specific queries and outputs, can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead across various tasks and datasets. We rigorously evaluate the inter-task influences and correlations during co-training. Both the code and models will be publicly available.
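For a concrete picture, below is a minimal, hypothetical PyTorch sketch of the shared encoder-decoder idea: one transformer decoder consumes both learnable object queries and prompt-derived queries, and emits mask and class embeddings. All module names, sizes, and the prompt handling are illustrative assumptions, not the actual OMG-Seg implementation.

```python
import torch
import torch.nn as nn

class SharedSegDecoder(nn.Module):
    """Illustrative sketch of a shared transformer decoder in the spirit of OMG-Seg.
    It decodes learnable object queries (for image/video segmentation) and optional
    prompt queries (for SAM-like interactive segmentation) against pixel features."""

    def __init__(self, embed_dim=256, num_object_queries=300, num_layers=6):
        super().__init__()
        self.object_queries = nn.Embedding(num_object_queries, embed_dim)
        layer = nn.TransformerDecoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.mask_head = nn.Linear(embed_dim, embed_dim)  # dot product with pixel features -> masks
        self.cls_head = nn.Linear(embed_dim, embed_dim)   # compared with CLIP text embeddings -> labels

    def forward(self, pixel_features, prompt_queries=None):
        # pixel_features: (B, HW, C) flattened features, e.g. from a frozen CLIP backbone
        batch = pixel_features.size(0)
        queries = self.object_queries.weight.unsqueeze(0).expand(batch, -1, -1)
        if prompt_queries is not None:
            # interactive mode: append visual prompt queries of shape (B, P, C)
            queries = torch.cat([queries, prompt_queries], dim=1)
        decoded = self.decoder(queries, pixel_features)
        return self.mask_head(decoded), self.cls_head(decoded)
```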
A short introduction to OMG-Seg and related work, presented at VALSE, can be found here (in Chinese).
See DATASET.md for dataset preparation.
Our codebase is built on MMDetection 3.0 tools.
See INSTALL.md for installation instructions.
- First, set up the dataset and environment. Make sure you have installed the fixed, corresponding versions.
- Download the pre-trained CLIP backbone. The scripts will automatically download the pre-trained CLIP models.
- Generate CLIP text embeddings for each dataset and for the jointly merged dataset used in co-training. See the embedding generation (a minimal sketch follows this list).
- Run the train/test scripts below to carry out model training and testing experiments.
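For reference, here is a minimal, hypothetical sketch of generating CLIP text embeddings for a class vocabulary with the open_clip package; the model name, pretrained tag, prompt template, and output path are assumptions, and the actual embedding-generation script in this repository may differ.

```python
# Hypothetical sketch only; the model name, pretrained tag, prompt template,
# and output path are assumptions and may differ from the repo's script.
import torch
import open_clip

model_name = 'convnext_large_d_320'
model, _, _ = open_clip.create_model_and_transforms(
    model_name, pretrained='laion2b_s29b_b131k_ft_soup')
tokenizer = open_clip.get_tokenizer(model_name)

class_names = ['person', 'car', 'dog']  # replace with the merged dataset vocabulary
with torch.no_grad():
    tokens = tokenizer([f'a photo of a {name}' for name in class_names])
    text_embeddings = model.encode_text(tokens)
    text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)

torch.save(text_embeddings, 'text_embeddings.pth')  # path is an assumption
```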
See the configs under seg/configs/m2ov_train.
./tools/dist.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py 8 --checkpoint pre_trained_model_path
Note that you can also train from the CLIP pre-trained models alone, without loading an extra checkpoint, by running the following command.
./tools/dist.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py 8
We use Slurm to train our model on 32 A100 GPUs.
PARTITION=YOUR_PARTITION JOB_NAME=YOUR_JOB_NAME GPUS=32 GPUS_PER_NODE=8 ./tools/slurm.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py
Run the visualization script on COCO:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_coco.py 1 --checkpoint model_path --show-dir vis
Run the visualization script on VIPSeg:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_vipseg.py 1 --checkpoint model_path --show-dir vis
The color maps are dumped into the vis sub-folder under work_dir.
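To quickly inspect the dumped visualizations, here is a small sketch (assuming the color maps are saved as PNG files under a work_dirs root; adjust the path to your actual work_dir):

```python
# Minimal sketch: open a few dumped color maps for manual inspection.
# The 'work_dirs' root and PNG format are assumptions; adjust to your setup.
from pathlib import Path
from PIL import Image

vis_images = sorted(p for p in Path('work_dirs').rglob('*.png') if 'vis' in p.parts)
for image_path in vis_images[:5]:
    Image.open(image_path).show()
```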
See the configs under seg/configs/m2ov_val. Make sure you have set up the classification embeddings for testing.
Test the Cityscapes dataset. We observe about 0.3% noise for Cityscapes panoptic segmentation:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_cityscapes.py 4 --checkpoint model_path
Test the COCO dataset. We observe about 0.5% noise for COCO panoptic segmentation:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_coco.py 4 --checkpoint model_path
Test the open-vocabulary ADE dataset. We observe about 0.8% noise for ADE panoptic segmentation:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_ade.py 4 --checkpoint model_path
Test Interactive COCO segmentation:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_ov_coco_pan_point.py 4 --checkpoint model_path
Test the YouTube-VIS 2019 dataset:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_y19.py 4 --checkpoint model_path
Test the VIPSeg dataset:
./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_vipseg.py 1 --checkpoint model_path
ConvNeXt-Large backbone: model
ConvNeXt-XXLarge backbone: model
The Objects365 pre-trained models can be found here.
Re-running our codebase on a single machine with the ConvNeXt-Large backbone: model, log.
If you find the OMG-Seg codebase and models useful for your research, please consider citing us:
@inproceedings{OMGSeg,
author = {Xiangtai Li and
Haobo Yuan and
Wei Li and
Henghui Ding and
Size Wu and
Wenwei Zhang and
Yining Li and
Kai Chen and
Chen Change Loy},
title = {OMG-Seg: Is One Model Good Enough For All Segmentation?},
booktitle = {CVPR},
year = {2024}
}
This project is released under the S-Lab LICENSE.