Welcome to CALM-pytorch Discussions! #1
-
Hey everyone! Curious about everyone's thoughts on using this framework for fine-tuning and augmenting CV models. My initial idea was to augment pre-trained CLIP models with a smaller ViT fine-tuned on application-specific data. The hypothesis is that by keeping the pre-trained model's weights frozen, the composed model would stay robust to distribution shifts in the data while gaining performance from the smaller, specialized augmenting model. This framework might be easier and more performant than other methods like weight-space ensembling.

Another idea was to use this framework to augment video transformer models like ViViT, with a pre-trained CLIP ViT as the augmenting model. The appeal of this approach would be the much lower number of parameters needed to train a video transformer, while leveraging the learned feature-extraction capabilities of the pre-trained CLIP model. Let me know if anybody thinks this idea makes sense or not, would love to get your thoughts on it! Thanks!
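For anyone wondering what the composition would look like in code, here is a minimal PyTorch sketch of the general CALM-style idea: a frozen "anchor" model (e.g. a pre-trained CLIP ViT) is fused with a smaller model via a trainable cross-attention bridge, so only the bridge (and, depending on the setup, the small model) receives gradients. This is not the actual CALM-pytorch API — the module names, dimensions, and the `CrossAttentionBridge` class are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Trainable cross-attention: anchor hidden states attend to
    the augmenting model's hidden states (illustrative, not the repo's API)."""
    def __init__(self, anchor_dim, aug_dim, heads=4):
        super().__init__()
        self.proj = nn.Linear(aug_dim, anchor_dim)   # match hidden sizes
        self.attn = nn.MultiheadAttention(anchor_dim, heads, batch_first=True)

    def forward(self, anchor_h, aug_h):
        kv = self.proj(aug_h)                        # keys/values from small model
        out, _ = self.attn(anchor_h, kv, kv)         # anchor queries attend to them
        return anchor_h + out                        # residual fusion

# Toy stand-ins for a frozen pre-trained backbone and a small augmenting model.
anchor = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
augment = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
for p in anchor.parameters():
    p.requires_grad = False                          # anchor weights stay frozen

bridge = CrossAttentionBridge(anchor_dim=64, aug_dim=32)

x_anchor = torch.randn(2, 16, 64)   # e.g. patch tokens fed to the frozen model
x_aug = torch.randn(2, 16, 32)      # tokens fed to the small specialized model

fused = bridge(anchor(x_anchor), augment(x_aug))
print(fused.shape)  # torch.Size([2, 16, 64])
```

The point of the sketch is just that the gradient only flows into the bridge (and the small model), which is what should give the robustness-under-distribution-shift property the comment above hypothesizes.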
-
Extremely excited for this repo!! This might really replace LoRA! By the way, do you think this method could also be used to update all the weights of the bigger model? So instead of creating an adapter, we'd use this method to perform a low-VRAM full fine-tune of the bigger model after first doing it on the smaller model?
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.