OMG-Seg: Is One Model Good Enough For All Segmentation?

CVPR, 2024
Xiangtai Li · Haobo Yuan · Wei Li · Henghui Ding · Size Wu · Wenwei Zhang ·
Yining Li · Kai Chen · Chen Change Loy

S-Lab, MMlab@NTU, Shanghai AI Laboratory

Xiangtai is the project leader and corresponding author.

arXiv PDF · Project Page · HuggingFace Model



Short Introduction

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the Segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open-vocabulary settings, prompt-driven interactive segmentation (as in SAM), and video object segmentation. To our knowledge, this is the first model to unify all these tasks in a single model and achieve good enough performance.

We show that OMG-Seg, a transformer-based encoder-decoder architecture with task-specific queries and outputs, can support over ten distinct segmentation tasks while significantly reducing computational and parameter overhead across various tasks and datasets. We rigorously evaluate the inter-task influences and correlations during co-training. Both the code and models will be publicly available.
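To make the shared-decoder idea concrete, here is a minimal PyTorch sketch of one backbone and one transformer decoder driven by task-specific query embeddings. It only illustrates the design described above and is not the OMG-Seg implementation; all module names, shapes, and layer choices are placeholders.

```python
# Minimal conceptual sketch of a shared encoder-decoder with task-specific queries.
# NOT the OMG-Seg implementation; names, shapes, and layers are placeholders.
import torch
import torch.nn as nn


class SharedSegSketch(nn.Module):
    def __init__(self, embed_dim=256, num_object_queries=300, num_prompt_queries=1):
        super().__init__()
        # Stand-in for a frozen CLIP-style visual backbone.
        self.backbone = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        # One shared transformer decoder for all tasks.
        layer = nn.TransformerDecoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # Task-specific queries: object queries for (video) semantic/instance/panoptic
        # segmentation, prompt queries for SAM-like interactive segmentation.
        self.object_queries = nn.Embedding(num_object_queries, embed_dim)
        self.prompt_queries = nn.Embedding(num_prompt_queries, embed_dim)
        # Shared output heads: mask embeddings and class embeddings (the latter would be
        # matched against CLIP text embeddings for open-vocabulary classification).
        self.mask_embed = nn.Linear(embed_dim, embed_dim)
        self.cls_embed = nn.Linear(embed_dim, embed_dim)

    def forward(self, images, task="panoptic"):
        feats = self.backbone(images)                      # (B, C, H/16, W/16)
        memory = feats.flatten(2).transpose(1, 2)          # (B, HW, C)
        queries = self.prompt_queries if task == "interactive" else self.object_queries
        queries = queries.weight.unsqueeze(0).expand(images.shape[0], -1, -1)
        hidden = self.decoder(queries, memory)             # (B, Q, C)
        masks = torch.einsum("bqc,bchw->bqhw", self.mask_embed(hidden), feats)
        cls_feats = self.cls_embed(hidden)
        return masks, cls_feats


# The same model serves both panoptic and interactive queries.
masks, cls_feats = SharedSegSketch()(torch.randn(1, 3, 224, 224), task="panoptic")
```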

A short introduction to OMG-Seg and related work, presented at VALSE, can be found here (in Chinese).

Experiment Setup

Dataset

See DATASET.md

Install

Our codebase is built on MMDetection 3.0.

See INSTALL.md

Quick Start

Experiment Preparation

  1. First, set up the dataset and environment. Make sure you install the pinned, matching dependency versions.

  2. Download the pre-trained CLIP backbone. The scripts will download the pre-trained CLIP models automatically.

  3. Generate CLIP text embeddings for each dataset and for the merged dataset used in co-training. See the embedding generation; a minimal sketch is shown after this list.

  4. Run the train/test scripts below to train and evaluate the models.
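As a rough sketch of what step 3 involves (not the repository's actual embedding-generation script), the snippet below encodes one prompt per class name with an open_clip text encoder and saves the normalized embeddings. The model tag, prompt template, class names, and output path are placeholder assumptions.

```python
# Rough sketch of CLIP text-embedding generation for a dataset's class names.
# NOT the repository's script; model tag, prompt template, and paths are placeholders.
import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms(
    "convnext_large_d_320", pretrained="laion2b_s29b_b131k_ft_soup")
tokenizer = open_clip.get_tokenizer("convnext_large_d_320")

class_names = ["person", "car", "traffic light"]  # replace with the dataset's categories
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    text_feats = model.encode_text(tokenizer(prompts))        # (num_classes, dim)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

torch.save(text_feats, "text_embeddings_example.pth")          # example output path
```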

Train

See the configs under seg/configs/m2ov_train.

./tools/dist.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py  8 --checkpoint pre_trained_model_path

Note that you can also train directly from the CLIP pre-trained weights (without loading an extra pre-trained checkpoint) by running the following command.

./tools/dist.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py  8 

We use Slurm to train our model on 32 A100 GPUs.

PARTITION=YOUR_PARTITION JOB_NAME=YOUR_JOB_NAME GPUS=32 GPUS_PER_NODE=8 ./tools/slurm.sh train seg/configs/m2ov_train/omg_convl_vlm_fix_12e_ov_coco_vid_yt19_vip_city_cocopansam.py 

Demo Scripts

Run the visualization scripts on COCO

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_coco.py 1 --checkpoint model_path --show-dir vis

Run the visualization scripts on VIPSeg

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_vipseg.py 1 --checkpoint model_path --show-dir vis

The color maps are dumped into the vis sub-folder of the work directory.
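If you just want to flip through the dumped color maps, something like the following works; the work-dir path below is only an example, so adjust it to your run.

```python
# Browse a few dumped color maps; the directory below is an example path.
from pathlib import Path
from PIL import Image

vis_dir = Path("work_dirs/eval_m2_convl_300q_ov_coco/vis")
for img_path in sorted(vis_dir.glob("**/*.png"))[:5]:
    Image.open(img_path).show()
```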

Test

See the configs under seg/configs/m2ov_val. Make sure you have generated the classification embeddings before testing.

Test the Cityscapes dataset. We observe about 0.3% noise (run-to-run variation) in Cityscapes panoptic segmentation results.

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_cityscapes.py 4 --checkpoint model_path

Test the COCO dataset. We observe about 0.5% noise in COCO panoptic segmentation results.

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_coco.py 4 --checkpoint model_path

Test the open-vocabulary ADE dataset. We observe about 0.8% noise in ADE panoptic segmentation results.

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_ade.py 4 --checkpoint model_path

Test Interactive COCO segmentation:

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_ov_coco_pan_point.py 4 --checkpoint model_path

Test the YouTube-VIS 2019 dataset

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_y19.py 4 --checkpoint model_path

Test the VIPSeg dataset

./tools/dist.sh test seg/configs/m2ov_val/eval_m2_convl_300q_ov_vipseg.py 1 --checkpoint model_path

Trained Model

ConvNeXt-large backbone. model

ConvNeXt-XX-large backbone. model

The Object-365 pretrained models can be found here.

Re-running our codebase on a single machine with the ConvNeXt-large backbone: model, log.
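Before running the test scripts, you can sanity-check a downloaded checkpoint with plain PyTorch; the file name below is a placeholder for whichever model you downloaded.

```python
# Sanity-check a downloaded checkpoint (generic PyTorch; the file name is a placeholder).
import torch

ckpt = torch.load("omg_seg_convl.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # MMDetection-style checkpoints wrap weights in "state_dict"
print(f"{len(state_dict)} parameter tensors")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```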

Citation

If you find the OMG-Seg codebase and models useful for your research, please consider citing us:

@inproceedings{OMGSeg,
  author    = {Xiangtai Li and Haobo Yuan and Wei Li and Henghui Ding and Size Wu and
               Wenwei Zhang and Yining Li and Kai Chen and Chen Change Loy},
  title     = {OMG-Seg: Is One Model Good Enough For All Segmentation?},
  booktitle = {CVPR},
  year      = {2024}
}

License

S-Lab LICENSE.