Training Hyena-based Models with FlashFFTConv + Safari #16
Comments
You should squeeze the kernel once before you pass it in, so the shape is
(H, L). Then it should work! (All Hyena experiments in the paper were on a
private fork of safari).
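A minimal sketch of the squeeze, assuming the filter is stored with a leading singleton dimension, i.e. shape (1, H, L), as the gradient error in the report suggests. The names `k_param` and `reference_fft_conv` are hypothetical, and the plain `torch.fft` convolution is only a CPU stand-in for the fused FlashFFTConv kernel, not its API:

```python
import torch

# Small stand-ins for the 864 channels and sequence length 4096 in the report.
H, L = 4, 16

# Hypothetical filter parameter stored with a leading singleton dim: (1, H, L).
k_param = torch.randn(1, H, L, requires_grad=True)
u = torch.randn(2, H, L)  # input of shape (B, H, L)

# Squeeze once, outside the convolution call, so the kernel is (H, L).
k = k_param.squeeze(0)


def reference_fft_conv(u, k):
    """Plain PyTorch FFT long convolution (assumed stand-in for the fused kernel)."""
    seqlen = u.shape[-1]
    fft_size = 2 * seqlen  # zero-pad to avoid circular wrap-around
    u_f = torch.fft.rfft(u, n=fft_size)
    k_f = torch.fft.rfft(k, n=fft_size)
    return torch.fft.irfft(u_f * k_f, n=fft_size)[..., :seqlen]


y = reference_fft_conv(u, k)
y.sum().backward()

kernel_shape = tuple(k.shape)            # shape the convolution sees
grad_shape = tuple(k_param.grad.shape)   # shape of the parameter's gradient
print(kernel_shape, grad_shape)
```

Because the squeeze happens outside the convolution, autograd restores the leading dimension when it propagates the gradient back to the (1, H, L) parameter, while the convolution itself only ever sees an (H, L) kernel.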
On Mon, Jan 15, 2024 at 8:33 AM Guy Jacob wrote:
I saw in #9 that it should be possible to run training with FlashFFTConv. I
integrated the library into the Safari codebase (based on the Hyena example
in this repo: I followed the README there and also diffed the code). Trying
to run The Pile experiment, I'm seeing some issues:
- With sequence length 4096 I get the following error:
RuntimeError: Function FlashFFTConvFuncBackward returned an invalid gradient at index 1 - got [864, 4096] but expected shape compatible with [1, 864, 4096]
Same thing happens with sequence length 2048 (with shape [864, 2048]
of course).
- With sequence length 1024 I get:
File "/work/venvs/safari/lib/python3.10/site-packages/flashfftconv-0.0.0-py3.10.egg/flashfftconv/conv.py", line 608, in forward
return monarch_conv_forward_r2r(
RuntimeError: k_f must have shape (H, fftsize + 1, 2)
*Note that this error also happens if I try to run
benchmark_fwd.py in the Hyena folder in this repo with sequence length
1024.* So at least this one seems unrelated to any mistakes I might have
made integrating the code.
Should I expect this combo of Safari + Hyena + FlashFFTConv to work for
training? If so, any ideas how to address the errors above?
Thank you!
Thanks for the quick reply. Funny, I actually tried your suggestion before opening the issue, and indeed it bypasses the errors I mentioned above, but then other issues come up. Since I wasn't sure it was a valid fix to begin with, I didn't want to divert the discussion unnecessarily. So after adding the squeeze op, it only runs if I pass
(pasted only part of the error message for brevity) In addition, when running specifically with sequence length 4096, this error shows up repeatedly (it doesn't crash because of this, just keeps dumping it over and over):
Does any of this make sense?