I am training multiple adapters, using BertModel as a base. I have a question about the following process and how it works with adapters:

(1) I initiate a BertModel outside of a loop (mostly to save time).

(2) I then set a new adapter to train with the following code:
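A minimal sketch of this step, assuming the adapter-transformers API on `BertModel` (`task_name` stands in for the current task's name):

```python
model.add_adapter(task_name)    # adds fresh, randomly initialized adapter weights
model.train_adapter(task_name)  # freezes the base model and activates this adapter for training
```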
(3) At the end of the training loop, I disable and delete the adapter as follows:
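Again a minimal sketch, assuming the same API:

```python
model.set_active_adapters(None)  # deactivate the current adapter
model.delete_adapter(task_name)  # remove its weights from the model
```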
Now, the question I have is: once this loop starts again with adding a fresh adapter:

(1) Is it training a brand new, randomly initialised head and, therefore, one that only encodes the knowledge of the task I am currently training?
(2) Or do I have to call something like `delete_head` (or just initialise the BertModel at the start of each iteration) as well, if I am running these in a loop, so that there is no information leak between each of the subsequent tasks?
Hey @pugantsov, to make sure each adapter training starts with a new, randomly initialized adapter and a new head, you need to make sure that you (see the sketch after this list):

- use the `BertAdapterModel` class, which allows you to have a new head for each adapter;
- add an adapter (`add_adapter`) and a head (`add_classification_head`, or another head type depending on the task) with the same name at the beginning of each fresh adapter training;
- activate and train the adapter and head with `train_adapter`; if they have the same name, the head is automatically activated and set to training as well;
- reset the optimizer and gradients at the beginning of each training, to avoid information passing between the trainings of different adapters.
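A minimal sketch of such a loop, assuming the adapter-transformers v3 import path and sequence-classification heads (the task names, `num_labels`, and learning rate are placeholders):

```python
import torch
from transformers.adapters import BertAdapterModel  # adapter-transformers v3 import path

model = BertAdapterModel.from_pretrained("bert-base-uncased")

for task_name in ["task_a", "task_b", "task_c"]:  # placeholder task names
    # A fresh, randomly initialized adapter and head sharing the same name.
    model.add_adapter(task_name)
    model.add_classification_head(task_name, num_labels=2)  # num_labels is task-specific

    # Freezes the base model and activates the adapter for training; since
    # the head has the same name, it is activated and set to training too.
    model.train_adapter(task_name)

    # A brand-new optimizer per task, so that no optimizer state (e.g. Adam
    # moment estimates) or stale gradients carry over from the previous task.
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4
    )
    optimizer.zero_grad()

    # ... run this task's training loop with `model` and `optimizer` here ...
```

Creating the optimizer inside the loop, rather than reusing one across tasks, is what implements the last point above.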
You don't have to delete the adapters that you have trained. You can keep them loaded but not activated in the model. If you do want to delete them, you have to delete the adapter and the head separately (see the sketch below).
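A sketch of the deletion, assuming the adapter and head share the name `task_name`:

```python
model.delete_adapter(task_name)  # removes the adapter modules from the model
model.delete_head(task_name)     # the prediction head must be deleted separately
```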
I hope this helps.
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.