
Release of pretraining and fine tuning code #20

Open · ksrinivs64 opened this issue Mar 8, 2023 · 5 comments

@ksrinivs64
Hi, thanks for a very nice piece of work. Do you also plan to release the pretraining and fine-tuning code for this model? Also, is there a way to apply the model to long sequences, such as those from the LRA benchmark? Thanks!

@DanFu09
Contributor

DanFu09 commented Mar 8, 2023 via email

Code here: https://github.com/HazyResearch/safari. We don't have a config for fine-tuning, but will look to add it soon!

@ksrinivs64
Author

thanks!

@KurtFeynmanGodel

Hello, I am also interested in the fine-tuning code. If you do not have the time to cobble it together, can you provide some hints on how a fine-tuning harness can be constructed? I really want to push this to its limits.

Also a few points.

  1. Great work. I honestly believe this is the path to true coherent multi-modality.
  2. I tried out the models and I am impressed with the context length.
  3. I did notice a higher propensity to hallucinate gibberish at extreme lengths, but I assume that is due to model size.

Thanks for all the hard work. You're a rock star.

@DanFu09
Contributor

DanFu09 commented Mar 17, 2023

Thank you for the kind words!

> If you do not have the time to cobble it together, can you provide some hints on how a fine-tuning harness can be constructed?

In safari, we'll need to put in some hooks for loading a pre-trained model: https://github.com/HazyResearch/safari/blob/main/train.py#L170. The H3 model definition in that repo has slightly different parameter names than the one in this repo, so we may need to have some custom code to rename the model parameters upon loading the state dict. Then it should work.
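
Roughly, that renaming step could look like the sketch below. This is just an illustration, not the final implementation: the function name, the nested `"model"` key, and the prefixes in `rename_rules` are placeholders, since the exact parameter-name differences need to be read off both repos' state dicts.

```python
import torch

def remap_h3_state_dict(checkpoint_path, rename_rules):
    """Load an H3 checkpoint and rename parameters to match safari's H3 module."""
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # Some training checkpoints nest the weights under a top-level key.
    if isinstance(state_dict, dict) and "model" in state_dict:
        state_dict = state_dict["model"]

    remapped = {}
    for name, tensor in state_dict.items():
        new_name = name
        # Rewrite the first matching prefix, e.g. this repo's module names
        # to safari's module names.
        for old_prefix, new_prefix in rename_rules.items():
            if new_name.startswith(old_prefix):
                new_name = new_prefix + new_name[len(old_prefix):]
                break
        remapped[new_name] = tensor
    return remapped

# Hypothetical usage -- the rules below are placeholders, not the real mapping:
# rules = {"attn.": "mixer.", "backbone.": "model."}
# model.load_state_dict(remap_h3_state_dict("h3-125m.pt", rules), strict=False)
```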

We'll try to get this implemented soon!

@jordancole21

Just came here to say I'm interested in the fine-tuning code as well! Great work! I was listening to the Deep Papers podcast and found out about you guys!

I've been looking for a solution to the long context window for a little while now, so I'm excited to start training it on custom data!
