quadratic_linear_attn implementation #4

Open
Kiet0712 opened this issue Sep 5, 2024 · 2 comments

Comments

Kiet0712 commented Sep 5, 2024

I think you should add an epsilon to the denominator of the output of the quadratic_linear_attn function to prevent NaN values when training the HedgeHog MLP:
qk / (qk.sum(dim=-1, keepdim=True) + epsilon)
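A minimal pure-Python sketch of the suggested fix, for illustration only. It assumes quadratic_linear_attn takes already feature-mapped queries and keys, builds the score matrix qk, and row-normalizes it. The list-based signature, the name eps, and its default of 1e-6 are hypothetical choices, not the repository's code. In PyTorch, a row of qk that sums to zero gives 0/0 = NaN, which then propagates into the training loss; plain Python raises ZeroDivisionError instead, which makes the same failure easy to see.

```python
def quadratic_linear_attn(q_feat, k_feat, eps=1e-6):
    """Hypothetical sketch: row-normalized linear-attention weights.

    q_feat, k_feat: lists of feature-mapped vectors (one per position).
    eps: assumed small constant added to each row's denominator.
    """
    # Score matrix: qk[i][j] = dot(phi(q_i), phi(k_j)).
    qk = [[sum(a * b for a, b in zip(qi, kj)) for kj in k_feat] for qi in q_feat]
    weights = []
    for row in qk:
        # eps keeps the denominator nonzero when a row of scores sums to 0
        # (e.g. all-zero features or underflow); with eps=0.0 this is 0 / 0.
        denom = sum(row) + eps
        weights.append([score / denom for score in row])
    return weights


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 0.0]]  # second query has all-zero features
    k = [[1.0, 0.0], [1.0, 0.0]]
    print(quadratic_linear_attn(q, k))
```

Adding eps means each row sums to slightly less than 1, so it should stay small relative to typical row sums to keep that bias negligible.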

github-actions bot commented Sep 5, 2024

Hello there, thank you for opening an Issue ! 🙏🏻 The team was notified and they will get back to you asap.

kyegomez (Owner) commented Sep 8, 2024

@Kiet0712 If you could open a PR, that would be nice :)

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants