Welcome to CALM-pytorch Discussions! #1
-
Hey everyone! Curious about everyone's thoughts on using this framework for fine-tuning and augmenting CV models. My initial idea was to augment pre-trained CLIP models with a smaller ViT fine-tuned on application-specific data. The hypothesis is that by keeping the pre-trained model's weights frozen, the composed model would stay robust to distribution shifts in the data while gaining performance from the smaller, specialized augmenting model. This framework might be easier and more performant than other methods like weight-space ensembling.

Another idea was to use this framework to augment video transformer models like ViViT, with a pre-trained CLIP ViT as the augmenting model. The appeal of this approach would be the much lower number of parameters needed to train a video transformer, while leveraging the learned feature-extraction capabilities of the pre-trained CLIP model. Let me know if anybody thinks this idea makes sense or not, would love to get your thoughts on it! Thanks!
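For anyone wondering what the composition would look like in code, here is a minimal PyTorch sketch of the general CALM-style idea: a frozen "anchor" model (e.g. a pre-trained CLIP ViT) is fused with a smaller model via a trainable cross-attention bridge, so only the bridge (and, depending on the setup, the small model) receives gradients. This is not the actual CALM-pytorch API — the module names, dimensions, and the `CrossAttentionBridge` class are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Trainable cross-attention: anchor hidden states attend to
    the augmenting model's hidden states (illustrative, not the repo's API)."""
    def __init__(self, anchor_dim, aug_dim, heads=4):
        super().__init__()
        self.proj = nn.Linear(aug_dim, anchor_dim)   # match hidden sizes
        self.attn = nn.MultiheadAttention(anchor_dim, heads, batch_first=True)

    def forward(self, anchor_h, aug_h):
        kv = self.proj(aug_h)                        # keys/values from small model
        out, _ = self.attn(anchor_h, kv, kv)         # anchor queries attend to them
        return anchor_h + out                        # residual fusion

# Toy stand-ins for a frozen pre-trained backbone and a small augmenting model.
anchor = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
augment = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
for p in anchor.parameters():
    p.requires_grad = False                          # anchor weights stay frozen

bridge = CrossAttentionBridge(anchor_dim=64, aug_dim=32)

x_anchor = torch.randn(2, 16, 64)   # e.g. patch tokens fed to the frozen model
x_aug = torch.randn(2, 16, 32)      # tokens fed to the small specialized model

fused = bridge(anchor(x_anchor), augment(x_aug))
print(fused.shape)  # torch.Size([2, 16, 64])
```

The point of the sketch is just that the gradient only flows into the bridge (and the small model), which is what should give the robustness-under-distribution-shift property the comment above hypothesizes.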
-
Extremely excited for this repo!! This might really replace LoRA! By the way, do you think this method could also be used to update all the weights of the bigger model? So instead of creating an adapter, we'd use this method to perform a low-VRAM full fine-tune of the bigger model after first doing it on the smaller model?
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.