Question on Default init_values in SwinTransformerV2CrBlock #2233
Answered by rwightman. zhaohm14 asked this question in Contributing.
Hi @rwightman, I have a question regarding the default init_values in SwinTransformerV2CrBlock. Thank you for your time!
Answered by rwightman, Jul 23, 2024
@zhaohm14 yes, it's intentional.
@zhaohm14 the norm is at the end of the residual path, so the norm's weight is the last scaling layer before the branch merges with the shortcut. That makes it similar to layer-scale, skip-init, and ResNet zero-init BN, which all scale the residual by a single scalar or one value per channel and typically start at 0 or a very small value.
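
To make the point above concrete, here is a minimal, dependency-free sketch. It is not the timm implementation: `layer_norm`, `toy_branch`, and `post_norm_residual` are hypothetical names invented for illustration, and the branch is a stand-in for the real attention/MLP. It only shows that when the norm is the last op on the residual path, initializing its weight (and bias) to 0 makes the block start out as an identity mapping.

```python
import math


def layer_norm(x, weight, bias, eps=1e-5):
    """Normalize a vector, then apply a per-channel affine (weight, bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [w * (v - mean) * inv_std + b for v, w, b in zip(x, weight, bias)]


def toy_branch(x):
    """Hypothetical stand-in for the attention or MLP branch (assumption)."""
    return [2.0 * v + 1.0 for v in x]


def post_norm_residual(x, norm_weight, norm_bias):
    """Compute x + norm(branch(x)): the norm sits at the end of the residual path."""
    y = layer_norm(toy_branch(x), norm_weight, norm_bias)
    return [a + b for a, b in zip(x, y)]


x = [0.5, -1.0, 2.0, 3.5]
zeros = [0.0] * len(x)
ones = [1.0] * len(x)

# Norm weight initialized to 0: the residual branch contributes nothing,
# so the block output equals its input.
out_zero_init = post_norm_residual(x, zeros, zeros)

# Norm weight initialized to 1: the branch contributes at full scale from step 0.
out_ones_init = post_norm_residual(x, ones, zeros)

print(out_zero_init == x)
print(out_ones_init == x)
```

With the zero-initialized weight the first comparison holds and the second does not, which is the same starting behaviour that layer-scale, skip-init, and zero-init BN aim for. The weight is still trainable, so the branch can grow from zero during training.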