Regarding the first question, we conducted the fine-tuning with the same seed and the same base model, so the initial weights are identical before fine-tuning.
As for the second question, weighted averaging methods certainly differ from the normal training process. But since they are less frequently adopted in practice when fine-tuning LLMs, we didn't run experiments on them. To draw insights from weighted averaging methods, we think at least two experiments are needed if anyone is interested in this aspect:
1. Verify that the weighted averaging method can reproduce performance comparable to normal fine-tuning techniques.
2. If so, check the layer-wise weight norms of the weighted-average fine-tuned models.
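For anyone who wants to try this, here is a minimal sketch of the two steps above: averaging the parameters of several fine-tuned checkpoints (model-soup style) and then inspecting the layer-wise weight norms of the averaged model. The layer names and toy arrays below are hypothetical stand-ins for real model state dicts, not part of our setup.

```python
import numpy as np

def weighted_average(state_dicts, weights):
    """Combine several checkpoints layer by layer with the given mixing weights."""
    assert abs(sum(weights) - 1.0) < 1e-8, "mixing weights should sum to 1"
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }

def layerwise_norms(state_dict):
    """L2 norm of each layer's parameters, for comparing models layer by layer."""
    return {name: float(np.linalg.norm(p)) for name, p in state_dict.items()}

# Toy "checkpoints": two models fine-tuned from the same initialization
# (layer names here are purely illustrative).
rng = np.random.default_rng(0)
m1 = {"layer0.weight": rng.normal(size=(4, 4)),
      "layer1.weight": rng.normal(size=(4, 4))}
m2 = {name: p + 0.1 for name, p in m1.items()}

soup = weighted_average([m1, m2], [0.5, 0.5])
norms = layerwise_norms(soup)
```

With real LLM checkpoints the same pattern applies to `model.state_dict()` tensors; one would then compare `layerwise_norms(soup)` against the norms of a normally fine-tuned model.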
In the paper, we only provide the layer-wise comparison of average weight norms during LoRA fine-tuning.