
Fix missing initializations for models created in 2023 #39239

Merged
Cyrilvallez merged 38 commits into huggingface:main from bvantuan:fix-missing-initializations-2023
Jul 21, 2025
Conversation

@bvantuan
Contributor

@bvantuan bvantuan commented Jul 6, 2025

What does this PR do?

Fixes missing weight initializations for models created in 2023.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@Cyrilvallez

@bvantuan bvantuan marked this pull request as draft July 6, 2025 15:14
@ArthurZucker
Collaborator

cc @Cyrilvallez as we are also working on putting the default in the PreTrainedModel

@Cyrilvallez
Member

Super happy to see you're following again on a new batch of models @bvantuan! 🚀🤗 Let me know when this is ready!

@bvantuan
Contributor Author

bvantuan commented Jul 7, 2025

Yes, of course! Really excited to keep contributing whenever I have time.

        module.weight.data.normal_(mean=0.0, std=factor * 0.02)

    - elif isinstance(module, nn.LayerNorm):
    + elif isinstance(module, (nn.LayerNorm, nn.BatchNorm2d)):
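The two `elif` lines above are the before and after of a single branch: the check is widened so that `nn.BatchNorm2d` modules are initialized alongside `nn.LayerNorm`. A minimal sketch of such a branch in isolation follows. It assumes the common convention of unit weight and zero bias for normalization layers, and the helper name `init_norm` is hypothetical, not something taken from the PR.

```python
import torch
from torch import nn


def init_norm(module: nn.Module) -> None:
    """Hypothetical helper: reset affine norm parameters to weight=1, bias=0."""
    if isinstance(module, (nn.LayerNorm, nn.BatchNorm2d)):
        # Affine parameters are None when the layer is built without them.
        if module.weight is not None:
            module.weight.data.fill_(1.0)
        if module.bias is not None:
            module.bias.data.zero_()


model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Flatten(), nn.LayerNorm(8))

# Scramble every parameter first so the effect of the init is visible.
for p in model.parameters():
    nn.init.normal_(p)

# `apply` visits every submodule, so both norm layers hit the widened branch.
model.apply(init_norm)
```

With the narrower `isinstance(module, nn.LayerNorm)` check, the `BatchNorm2d` layer in this sketch would keep its scrambled values.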
Contributor Author


Comment on lines +445 to +461
    elif isinstance(module, nn.Linear):
        shape = module.weight.data.shape
        gain = 1.0
        scale = 1.0  # extra scale for gain
        if module.bias is not None:
            module.bias.data.zero_()
        if shape[0] > shape[1]:
            gain = math.sqrt(shape[0] / shape[1])
        if shape[0] == self.config.vocab_size and shape[1] == self.config.hidden_size:  # final projection?
            scale = 0.5

        gain *= scale
        nn.init.orthogonal_(module.weight, gain=gain)
    elif isinstance(module, nn.Embedding):
        shape = module.weight.data.shape
        gain = 1e-4 * math.sqrt(max(shape[0], shape[1]))
        nn.init.orthogonal_(module.weight, gain=gain)
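To make the gain logic in the snippet above concrete, the sketch below applies the same rules to a standalone layer. The sizes `vocab_size` and `hidden_size` are made-up values standing in for `self.config`, and `linear_gain` is a hypothetical helper introduced only for this illustration.

```python
import math

import torch
from torch import nn

vocab_size, hidden_size = 64, 16  # illustrative values, not from any real config


def linear_gain(weight: torch.Tensor) -> float:
    """Hypothetical helper mirroring the gain rules for nn.Linear shown above."""
    out_features, in_features = weight.shape
    gain = 1.0
    if out_features > in_features:
        gain = math.sqrt(out_features / in_features)
    if out_features == vocab_size and in_features == hidden_size:
        gain *= 0.5  # extra scale for what looks like the final projection
    return gain


head = nn.Linear(hidden_size, vocab_size, bias=False)  # weight shape: (64, 16)
gain = linear_gain(head.weight)  # sqrt(64 / 16) * 0.5
nn.init.orthogonal_(head.weight, gain=gain)

# For a tall matrix, orthogonal_ yields orthonormal columns scaled by the gain,
# so W^T W should be close to gain^2 * I.
gram = (head.weight.T @ head.weight).detach()
```

Here the two adjustments cancel out: the tall shape raises the gain to 2.0 and the final-projection scale halves it again.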

@bvantuan
Contributor Author

bvantuan commented Jul 9, 2025

Cc @Cyrilvallez! The PR is now ready and awaiting your review 😊.

@bvantuan bvantuan marked this pull request as ready for review July 9, 2025 11:08
Member

@Cyrilvallez Cyrilvallez left a comment


All right, nice work! 🤗 Very happy to see this! Just a few comments to simplify!
In particular, the pattern copy_(torch.randn(...)) is an anti-pattern: we should call normal_ directly instead, which is much simpler and means we don't have to worry about the shapes.
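The difference between the two patterns can be sketched on a single parameter. The tensor shape and the `std` value below are arbitrary choices for illustration and are not taken from any model touched by this PR.

```python
import torch
from torch import nn

std = 0.02  # illustrative value
param = nn.Parameter(torch.empty(4, 8))

# Pattern discouraged in the review: allocate a temporary tensor, then copy it in.
# The shape has to be spelled out again and kept in sync with the parameter.
param.data.copy_(torch.randn(4, 8) * std)

# Preferred: sample in place. Shape, dtype and device all come from the parameter.
param.data.normal_(mean=0.0, std=std)
```

Both calls draw from the same distribution; the in-place version avoids the temporary tensor and the duplicated shape.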

@bvantuan
Contributor Author

Thanks a lot for reviewing, @Cyrilvallez ! I’m really glad you’re happy with it. I completely agree—using normal_ instead of copy_(torch.randn(...)) is much cleaner and more elegant. I’ve simplified that part and addressed the other comments as well.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: align, autoformer, bridgetower, bros, clap, clvp, efficientnet, fastspeech2_conformer, informer, kosmos2, mgp_str, mobilevit, mobilevitv2, mra, nllb_moe, omdet_turbo

Member

@Cyrilvallez Cyrilvallez left a comment


Alright, thanks a lot for this new batch! Always greatly appreciated! 🤗

@Cyrilvallez Cyrilvallez merged commit 6b3a1f2 into huggingface:main Jul 21, 2025
23 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 22, 2025
…9239)

* fix SwiftFormer

* fix Kosmos2

* fix Owlv2

* fix Sam

* fix Vits

* fix Pvt

* fix MobileViTV2

* fix PatchTST

* fix Bros

* fix Informer

* fix BridgeTower

* fix Mra and Yoso

* fix Rwkv

* fix EfficientNet

* fix NllbMoe

* fix Tvp

* fix Clap

* fix Autoformer

* fix SwiftFormer

* fix Mgpstr

* fix Align

* fix VitMatte

* fix SpeechT5

* add conditional check for parameters

* fix SpeechT5

* fix TimmBackbone and Clvp

* fix SwiftFormer

* fix SeamlessM4T and SeamlessM4Tv2

* fix Align

* fix Owlv2 and OwlViT

* add reviewed changes

* add reviewed changes

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
@ydshieh
Collaborator

ydshieh commented Jul 25, 2025

Hi @bvantuan

Thanks for the PR. Could you check

    "models_fastspeech2_conformer": {
        "job_link": "https://github.com/huggingface/transformers/actions/runs/16433389284/job/46439076372",
        "single": [
            {
                "line": "tests/models/fastspeech2_conformer/test_modeling_fastspeech2_conformer.py::FastSpeech2ConformerModelTest::test_can_init_all_missing_weights",
                "trace": "(line 687)  AssertionError: False is not true : The following keys are not properly handled by `_init_weights()`:"
            },
            {
                "line": "tests/models/fastspeech2_conformer/test_modeling_fastspeech2_conformer.py::FastSpeech2ConformerWithHifiGanTest::test_can_init_all_missing_weights",
                "trace": "(line 687)  AssertionError: False is not true : The following keys are not properly handled by `_init_weights()`:"
            }
        ]
    },

which were skipped before this PR but have been failing since it was merged?
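For readers unfamiliar with this kind of check, the idea behind an assertion like "keys are not properly handled by `_init_weights()`" can be sketched as follows. This is a simplified illustration and not the actual implementation of `test_can_init_all_missing_weights`; the `Tiny` model and the `unhandled_keys` helper are hypothetical.

```python
import torch
from torch import nn


class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.scale = nn.Parameter(torch.empty(4))  # custom parameter, easy to forget

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=0.02)
            module.bias.data.zero_()
        # No branch touches `self.scale`, so it is never initialized.


def unhandled_keys(model: nn.Module) -> list:
    """Fill every parameter with a NaN sentinel, run the init, report survivors."""
    with torch.no_grad():
        for p in model.parameters():
            p.fill_(float("nan"))
    model.apply(model._init_weights)
    return [name for name, p in model.named_parameters() if torch.isnan(p).any()]


missing = unhandled_keys(Tiny())
```

In this sketch `missing` contains only `scale`, the one parameter that no branch of `_init_weights` covers.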

@bvantuan
Contributor Author

Hi @ydshieh @Cyrilvallez! I think the FastSpeech2Conformer tests are decorated with @require_torch_accelerator, which might explain why they didn't run on this CI. I opened PR #39689 to fix this.

@Cyrilvallez
Member

Indeed @bvantuan, the tests should not have that decorator as it prevents them being run by the CI! Thanks @ydshieh for raising the issue, the PR linked by @bvantuan will fix it! 🤗 I'm reactivating the tests as well!

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
…9239)

* fix SwiftFormer

* fix Kosmos2

* fix Owlv2

* fix Sam

* fix Vits

* fix Pvt

* fix MobileViTV2

* fix PatchTST

* fix Bros

* fix Informer

* fix BridgeTower

* fix Mra and Yoso

* fix Rwkv

* fix EfficientNet

* fix NllbMoe

* fix Tvp

* fix Clap

* fix Autoformer

* fix SwiftFormer

* fix Mgpstr

* fix Align

* fix VitMatte

* fix SpeechT5

* add conditional check for parameters

* fix SpeechT5

* fix TimmBackbone and Clvp

* fix SwiftFormer

* fix SeamlessM4T and SeamlessM4Tv2

* fix Align

* fix Owlv2 and OwlViT

* add reviewed changes

* add reviewed changes

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
4 participants