Fix missing initializations for models created in 2023 #39239
Cyrilvallez merged 38 commits into huggingface:main
Conversation
cc @Cyrilvallez as we are also working on putting the default in the PreTrainedModel
Super happy to see you're following again on a new batch of models @bvantuan! 🚀🤗 Let me know when this is ready!
Yes, of course! Really excited to keep contributing whenever I have time.
          module.weight.data.normal_(mean=0.0, std=factor * 0.02)

-     elif isinstance(module, nn.LayerNorm):
+     elif isinstance(module, (nn.LayerNorm, nn.BatchNorm2d)):
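To illustrate why widening the `isinstance` check matters, here is a minimal sketch of a normalization-layer branch that covers both `nn.LayerNorm` and `nn.BatchNorm2d`. The helper name `init_norm_layer` and the ones/zeros body are assumptions for illustration; the diff above only shows the changed condition, not the body of the branch.

```python
import torch
from torch import nn


def init_norm_layer(module: nn.Module) -> None:
    """Hypothetical helper: reset a normalization layer to scale 1 and shift 0."""
    if isinstance(module, (nn.LayerNorm, nn.BatchNorm2d)):
        # Both layer types hold `weight` (scale) and `bias` (shift) when they
        # are affine; either attribute is None otherwise, hence the checks.
        if module.weight is not None:
            module.weight.data.fill_(1.0)
        if module.bias is not None:
            module.bias.data.zero_()


model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))

# Perturb the BatchNorm parameters to stand in for uninitialized values.
with torch.no_grad():
    model[1].weight.add_(5.0)
    model[1].bias.add_(5.0)

# `apply` visits every submodule, the same way `_init_weights` is applied.
model.apply(init_norm_layer)
```

With a check against `nn.LayerNorm` only, the `nn.BatchNorm2d` submodule in this sketch would be skipped and keep its perturbed values.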
    elif isinstance(module, nn.Linear):
        shape = module.weight.data.shape
        gain = 1.0
        scale = 1.0  # extra scale for gain
        if module.bias is not None:
            module.bias.data.zero_()
        if shape[0] > shape[1]:
            gain = math.sqrt(shape[0] / shape[1])
        if shape[0] == self.config.vocab_size and shape[1] == self.config.hidden_size:  # final projection?
            scale = 0.5

        gain *= scale
        nn.init.orthogonal_(module.weight, gain=gain)
    elif isinstance(module, nn.Embedding):
        shape = module.weight.data.shape
        gain = 1e-4 * math.sqrt(max(shape[0], shape[1]))
        nn.init.orthogonal_(module.weight, gain=gain)
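The gain arithmetic in this snippet can be checked in isolation. The sketch below pulls it out into two plain functions; the names `linear_gain` and `embedding_gain` and the example sizes are hypothetical, and only the formulas come from the diff.

```python
import math


def linear_gain(shape, vocab_size, hidden_size):
    """Gain for the orthogonal init of a Linear weight of shape (out, in)."""
    out_features, in_features = shape
    gain = 1.0
    scale = 1.0  # extra scale for gain
    if out_features > in_features:
        # Layers that expand the dimension get a proportionally larger gain.
        gain = math.sqrt(out_features / in_features)
    if out_features == vocab_size and in_features == hidden_size:
        # Shape matches (vocab_size, hidden_size): treated as the final projection.
        scale = 0.5
    return gain * scale


def embedding_gain(shape):
    """Gain for the orthogonal init of an Embedding weight."""
    return 1e-4 * math.sqrt(max(shape[0], shape[1]))


# Example sizes are made up for illustration.
square = linear_gain((64, 64), vocab_size=1024, hidden_size=64)
expanding = linear_gain((256, 64), vocab_size=1024, hidden_size=64)
final_projection = linear_gain((1024, 64), vocab_size=1024, hidden_size=64)
embedding = embedding_gain((10000, 64))
```

Here a square layer keeps a gain of 1.0, the 256x64 layer gets sqrt(4) = 2.0, and the 1024x64 layer gets sqrt(16) * 0.5 = 2.0 because it matches the vocabulary projection shape. The embedding gain comes out at roughly 0.01.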
Cc @Cyrilvallez! The PR is now ready and awaiting your review 😊.
Cyrilvallez left a comment
All right, nice work! 🤗 Very happy to see this! Just a few comments to simplify!
In particular, the pattern copy_(torch.randn(...)) is an anti-pattern: we should use normal_ directly instead, which is much simpler and means we don't have to worry about the shapes.
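A minimal sketch of the two patterns side by side, assuming a standalone parameter rather than any specific model from this PR. The function names `init_with_copy` and `init_in_place` are hypothetical.

```python
import torch
from torch import nn


def init_with_copy(param: nn.Parameter, std: float) -> None:
    # Anti-pattern: allocates a temporary tensor whose shape has to be
    # spelled out to match the parameter, then copies it over.
    param.data.copy_(torch.randn(param.shape) * std)


def init_in_place(param: nn.Parameter, std: float) -> None:
    # Preferred: fills the existing storage in place, so there is no
    # temporary tensor and no shape to get wrong.
    param.data.normal_(mean=0.0, std=std)


torch.manual_seed(0)
param = nn.Parameter(torch.empty(256, 64))

init_with_copy(param, std=0.02)
std_after_copy = param.data.std().item()

init_in_place(param, std=0.02)
std_after_in_place = param.data.std().item()
```

Both calls leave the parameter with values drawn from a normal distribution with standard deviation close to 0.02. The in-place version simply has fewer places where a shape can be mistyped.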
src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py
Thanks a lot for reviewing, @Cyrilvallez! I’m really glad you’re happy with it. I completely agree: using …
[For maintainers] Suggested jobs to run (before merge) run-slow: align, autoformer, bridgetower, bros, clap, clvp, efficientnet, fastspeech2_conformer, informer, kosmos2, mgp_str, mobilevit, mobilevitv2, mra, nllb_moe, omdet_turbo
Cyrilvallez left a comment
Alright, thanks a lot for this new batch! Always greatly appreciated! 🤗
…9239)
* fix SwiftFormer
* fix Kosmos2
* fix Owlv2
* fix Sam
* fix Vits
* fix Pvt
* fix MobileViTV2
* fix PatchTST
* fix Bros
* fix Informer
* fix BridgeTower
* fix Mra and Yoso
* fix Rwkv
* fix EfficientNet
* fix NllbMoe
* fix Tvp
* fix Clap
* fix Autoformer
* fix SwiftFormer
* fix Mgpstr
* fix Align
* fix VitMatte
* fix SpeechT5
* add conditional check for parameters
* fix SpeechT5
* fix TimmBackbone and Clvp
* fix SwiftFormer
* fix SeamlessM4T and SeamlessM4Tv2
* fix Align
* fix Owlv2 and OwlViT
* add reviewed changes
* add reviewed changes
* fix typo
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Hi @bvantuan, thanks for the PR. Could you check which tests were skipped before this PR but are now failing since this PR?
Hi @ydshieh @Cyrilvallez! I think the …
What does this PR do?
Fixes missing weight initializations for models created in 2023.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@Cyrilvallez