Remove zero initialization of to_attn_bias weights, since these layers have bias=False #302

Merged · 1 commit merged on Oct 3, 2024

Conversation

amorehead (Contributor) commented:

Line 8 of Algorithm 24 applies a LinearNoBias layer after LayerNorm to project the pairwise representation into the attention bias, as follows:

[image: Algorithm 24, line 8 from the AlphaFold 3 supplementary information]

However, the code also initializes the LinearNoBias weights to zero. Since these modules are constructed with bias=False, their weights start out zero and their biases are absent (null), so they output exactly zero at initialization, which leads to these weights receiving no gradients throughout training.
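For illustration, here is a minimal sketch of the pattern in question, assuming a stand-alone `to_attn_bias` module; the names and dimensions are hypothetical, not the repository's exact code:

```python
import torch
from torch import nn

dim_pairwise, heads = 16, 4  # illustrative dimensions, not the repo's

# Algorithm 24, line 8: LinearNoBias(LayerNorm(z_ij)) -> per-head attention bias
to_attn_bias = nn.Sequential(
    nn.LayerNorm(dim_pairwise),
    nn.Linear(dim_pairwise, heads, bias = False)  # LinearNoBias
)

# Before this PR, the projection weight was additionally zeroed, e.g.:
#   nn.init.zeros_(to_attn_bias[1].weight)
# With bias = False there is no bias parameter to offset it, so the module
# would output exactly zero at initialization. Removing the zero init keeps
# PyTorch's default (Kaiming-uniform) weight initialization instead.

z = torch.randn(2, 8, 8, dim_pairwise)   # (batch, i, j, dim_pairwise)
attn_bias = to_attn_bias(z)              # (batch, i, j, heads)
assert attn_bias.abs().sum() > 0         # nonzero with default init
```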

@lucidrains merged commit 665ffef into lucidrains:main on Oct 3, 2024 · 11 checks passed
@amorehead deleted the patch-2 branch on October 3, 2024 20:55