Release of pretraining and fine tuning code #20
Code here:
https://github.com/HazyResearch/safari
We don’t have a config for fine-tuning yet, but we'll look into adding one soon!
On Wed, Mar 8, 2023 at 2:10 PM ksrinivs64 <***@***.***> wrote:
Hi, thanks for a very nice piece of work. Do you also plan to release pretraining and fine-tuning code for this model? Also, is there a way to apply the model to long sequences, such as those from the LRA benchmark? Thanks!
Thanks!
Hello, I am also interested in the fine-tuning code. If you don't have time to put it together, could you give some hints on how to build a fine-tuning harness? I really want to push this model to its limits. Also a few points.
Thanks for all the hard work. You're a rock star.
Thank you for the kind words!
In safari, we'll need to add some hooks for loading a pre-trained model: https://github.com/HazyResearch/safari/blob/main/train.py#L170. The H3 model definition in that repo uses slightly different parameter names from the one in this repo, so we may need some custom code to rename the model parameters when loading the state dict. After that, it should work. We'll try to get this implemented soon!
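A minimal sketch of the renaming step, assuming the checkpoint has already been loaded into a plain key-to-tensor mapping (for example with `torch.load(path, map_location="cpu")`, taking the `"state_dict"` entry if it is a PyTorch Lightning checkpoint). The function `remap_state_dict_keys` and the regex rules in `EXAMPLE_RULES` are hypothetical: the thread does not say which H3 parameter names differ between the two repos, so the patterns below are placeholders, not the real mapping.

```python
import re
from typing import Dict, Iterable, Mapping, Tuple, TypeVar

V = TypeVar("V")


def remap_state_dict_keys(
    state_dict: Mapping[str, V],
    rules: Iterable[Tuple[str, str]],
) -> Dict[str, V]:
    """Return a new dict whose keys are rewritten by (regex, replacement) rules.

    Rules are applied in order to every key; values (e.g. tensors) are
    passed through untouched. Raises KeyError if two keys collide.
    """
    compiled = [(re.compile(pattern), repl) for pattern, repl in rules]
    remapped: Dict[str, V] = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, repl in compiled:
            new_key = pattern.sub(repl, new_key)
        if new_key in remapped:
            raise KeyError(f"two checkpoint keys map to {new_key!r}")
        remapped[new_key] = value
    return remapped


# HYPOTHETICAL rename rules: placeholders only, since the actual parameter
# name differences between this repo and safari are not given in the thread.
EXAMPLE_RULES = [
    (r"^model\.", ""),          # e.g. strip a wrapper-module prefix
    (r"\.mixer\.", ".layer."),  # e.g. rename a submodule
]

if __name__ == "__main__":
    ckpt = {
        "model.backbone.layers.0.mixer.q_proj.weight": 1,
        "model.lm_head.weight": 2,
    }
    print(remap_state_dict_keys(ckpt, EXAMPLE_RULES))
```

The remapped dict can then be passed to `model.load_state_dict(remapped, strict=False)`; in PyTorch that call reports `missing_keys` and `unexpected_keys`, which are worth inspecting to confirm every pre-trained weight actually landed. Raising on collisions guards against two rules silently mapping different weights onto the same name.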
Just came here to say I'm interested in the fine-tuning code as well! Great work! I was listening to the Deep Papers podcast and found out about you. I've been looking for a solution to the long-context-window problem for a while now, so I'm excited to start training it on custom data!