quadratic_linear_attn implementation #4

Kiet0712 · 2024-09-05T11:18:50Z

I think you should add an epsilon to the denominator of the output of the quadratic_linear_attn function to prevent NaN values when training the HedgeHog MLP:
qk / (qk.sum(dim=-1, keepdim=True) + epsilon)
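To illustrate the failure mode, here is a minimal sketch of the proposed change. It assumes `qk` is a tensor of non-negative attention scores; the helper name `normalize_rows`, the `eps` default, and the sample values are hypothetical and are not taken from the repository's `quadratic_linear_attn`.

```python
import torch


def normalize_rows(qk: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Normalize scores along the last dim, guarding against a zero row sum.

    Hypothetical helper for illustration; `eps` is an assumed default.
    """
    return qk / (qk.sum(dim=-1, keepdim=True) + eps)


# First row sums to zero, e.g. if the feature-map outputs underflow to 0.
qk = torch.tensor([[0.0, 0.0, 0.0],
                   [1.0, 2.0, 1.0]])

unsafe = qk / qk.sum(dim=-1, keepdim=True)  # 0 / 0 in the first row
safe = normalize_rows(qk)                   # 0 / eps in the first row

print(torch.isnan(unsafe).any().item())  # whether the unguarded version has NaN
print(torch.isnan(safe).any().item())    # whether the guarded version has NaN
```

Note that an additive epsilon only guards against row sums at or very near zero. If the scores can be negative, the sum could still cancel to `-eps`, so this relies on the feature map producing non-negative values.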


github-actions · 2024-09-05T11:19:15Z

Hello there, thank you for opening an issue! 🙏🏻 The team was notified and they will get back to you ASAP.

kyegomez · 2024-09-08T22:56:50Z

@Kiet0712 if you could open a PR, that would be nice :)

Kiet0712 assigned kyegomez Sep 5, 2024
