Add Mask2Former to SMP #1044

Open
caxel-ap opened this issue Jan 24, 2025 · 4 comments · May be fixed by #1059
@caxel-ap

The Mask2Former model was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: predicting a set of masks and corresponding labels. Hence, all three tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
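For intuition, here is a minimal PyTorch sketch (not the paper's implementation; shapes and values are illustrative) of how a set of predicted masks plus per-query class logits can be combined into a semantic segmentation map:

```python
import torch

# Hypothetical decoder outputs: N query masks and N class distributions.
num_queries, num_classes, H, W = 100, 150, 128, 128
class_logits = torch.randn(num_queries, num_classes + 1)  # extra slot = "no object"
mask_logits = torch.randn(num_queries, H, W)              # one mask logit map per query

# Weight each mask by its class probabilities (dropping "no object"),
# sum over queries, and take the per-pixel argmax to get a semantic map.
class_probs = class_logits.softmax(dim=-1)[:, :-1]  # (N, C)
mask_probs = mask_logits.sigmoid()                   # (N, H, W)
semantic_map = torch.einsum("nc,nhw->chw", class_probs, mask_probs).argmax(dim=0)  # (H, W)
```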

Papers with Code
https://paperswithcode.com/paper/masked-attention-mask-transformer-for

Paper:
https://arxiv.org/abs/2112.01527

HF Reference implementation:
https://huggingface.co/docs/transformers/main/en/model_doc/mask2former
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mask2former/modeling_mask2former.py

@qubvel
Collaborator

qubvel commented Jan 25, 2025

Thanks for opening an issue @caxel-ap 🤗 It might be the first instance segmentation model in the library. Let's see if anyone is eager to contribute; I suppose it will be super impactful 👍

@caxel-ap
Author

Even just semantic segmentation would be great to have in here someday. I've had good results using it in transformers for semantic segmentation with https://huggingface.co/facebook/mask2former-swin-large-ade-semantic
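For reference, this is roughly how that checkpoint can be used through transformers (a sketch based on the docs linked in the issue description, not an SMP API; the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "facebook/mask2former-swin-large-ade-semantic"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

image = Image.open("example.jpg")  # any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Post-process into an (H, W) tensor of ADE20K class ids at the original resolution.
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
```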

@ariG23498

Hey @qubvel !

I would like to work on it. Is there a guideline on how to contribute to this repo? My todos would be:

  1. Read the resources shared
  2. Read the contribution guidelines, if available
  3. Start a draft PR and iterate on it.

Thanks!

@qubvel
Collaborator

qubvel commented Feb 7, 2025

Hey @ariG23498! That's super cool, thanks for your interest 🤗

At the moment there are no guidelines, but you can get inspiration from any of the existing models. The code for existing models is relatively small, so you can just copy decoders/unet and start from that point.

  • There is no need to implement an encoder; as far as I understand, Swin should be compatible with timm models.
  • As suggested above, we can start with a semantic segmentation decoder and see if we can extend it to instance/panoptic as well.
  • The main idea is to have decoder.py and model.py files under decoders/mask2former.

Just let me know what questions you run into and I will try to answer them, and then add it to the docs 🤗
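To make the layout above concrete, here is a rough skeleton of what decoders/mask2former/model.py could look like, mirroring decoders/unet. The Mask2FormerDecoder class, its arguments, the tu-swin encoder name, and the head wiring are assumptions for illustration only; the pixel decoder and masked-attention Transformer decoder internals are omitted.

```python
# decoders/mask2former/model.py -- rough skeleton, following the layout of decoders/unet.
from typing import Optional

from segmentation_models_pytorch.base import SegmentationHead, SegmentationModel
from segmentation_models_pytorch.encoders import get_encoder

from .decoder import Mask2FormerDecoder  # to be implemented: pixel decoder + masked-attention Transformer decoder


class Mask2Former(SegmentationModel):
    def __init__(
        self,
        encoder_name: str = "tu-swin_base_patch4_window12_384",  # a timm Swin via the tu- prefix (assumption)
        encoder_depth: int = 5,
        encoder_weights: Optional[str] = "imagenet",
        in_channels: int = 3,
        classes: int = 1,
        num_queries: int = 100,
    ):
        super().__init__()

        # Reuse an existing encoder; no new backbone code needed.
        self.encoder = get_encoder(
            encoder_name,
            in_channels=in_channels,
            depth=encoder_depth,
            weights=encoder_weights,
        )

        # Placeholder decoder: would produce per-query masks and class logits.
        self.decoder = Mask2FormerDecoder(
            encoder_channels=self.encoder.out_channels,
            num_queries=num_queries,
            num_classes=classes,
        )

        # Placeholder head: for plain semantic segmentation the mask-classification
        # output could be reduced to dense per-pixel logits before this point.
        self.segmentation_head = SegmentationHead(
            in_channels=classes, out_channels=classes, kernel_size=1
        )
        self.classification_head = None

        self.name = "mask2former-{}".format(encoder_name)
        self.initialize()
```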

@ariG23498 linked a pull request on Feb 11, 2025 that will close this issue.