
Release of pretraining and fine tuning code #20

Open · ksrinivs64 opened this issue Mar 8, 2023 · 5 comments

@ksrinivs64
Hi, thanks for a very nice piece of work. Do you also plan to release the pretraining and fine-tuning code for this model? Also, is there a way to apply the model to long sequences, such as those from the LRA benchmark? Thanks!

@DanFu09
Contributor

DanFu09 commented Mar 8, 2023 via email

Code here: https://github.com/HazyResearch/safari. We don't have a config for fine-tuning, but will look to add it soon!

@ksrinivs64
Author

thanks!

@KurtFeynmanGodel

Hello, I am also interested in the fine-tuning code. If you do not have the time to cobble it together, can you provide some hints on how a fine-tuning harness can be constructed? I really want to push this to its limits.

Also a few points.

  1. Great work. I honestly believe this is the path to true coherent multi-modality.
  2. I tried out the models and I am impressed with the context length.
  3. I did notice a higher propensity to hallucinate gibberish at extreme lengths, but I assume that is due to model size.

Thanks for all the hard work. You're a rock star.

@DanFu09
Contributor

DanFu09 commented Mar 17, 2023

Thank you for the kind words!

> If you do not have the time to cobble it together, can you provide some hints on how a fine-tuning harness can be constructed?

In safari, we'll need to put in some hooks for loading a pre-trained model: https://github.com/HazyResearch/safari/blob/main/train.py#L170. The H3 model definition in that repo has slightly different parameter names than the one in this repo, so we may need to have some custom code to rename the model parameters upon loading the state dict. Then it should work.
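
Roughly, that renaming step could look like the sketch below. This is just an illustration, not the final implementation: the function name, the nested `"model"` key, and the prefixes in `rename_rules` are placeholders, since the exact parameter-name differences need to be read off both repos' state dicts.

```python
import torch

def remap_h3_state_dict(checkpoint_path, rename_rules):
    """Load an H3 checkpoint and rename parameters to match safari's H3 module."""
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # Some training checkpoints nest the weights under a top-level key.
    if isinstance(state_dict, dict) and "model" in state_dict:
        state_dict = state_dict["model"]

    remapped = {}
    for name, tensor in state_dict.items():
        new_name = name
        # Rewrite the first matching prefix, e.g. this repo's module names
        # to safari's module names.
        for old_prefix, new_prefix in rename_rules.items():
            if new_name.startswith(old_prefix):
                new_name = new_prefix + new_name[len(old_prefix):]
                break
        remapped[new_name] = tensor
    return remapped

# Hypothetical usage -- the rules below are placeholders, not the real mapping:
# rules = {"attn.": "mixer.", "backbone.": "model."}
# model.load_state_dict(remap_h3_state_dict("h3-125m.pt", rules), strict=False)
```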

We'll try to get this implemented soon!

@jordancole21

Just came here to say I'm interested in the fine-tuning code as well! Great work! I was listening to the Deep Papers podcast and found out about you guys!

I've been looking for a solution to the long context window for a little while now, so I'm excited to start training it on custom data!
