
Confusion about training adapters sequentially in a single script #519

Closed

pugantsov opened this issue Mar 15, 2023 · 3 comments
Labels
question Further information is requested Stale

Comments

@pugantsov

pugantsov commented Mar 15, 2023

I am training multiple adapters, using BertModel as a base. I have a question about the following process and how it works with adapters:

(1) I instantiate a BertModel outside of the loop (mostly to save time).
(2) I then set a new adapter to train with the following code:

model.add_adapter(model_name, config=transformers.adapters.PfeifferConfig())
model.train_adapter(model_name)
model = model.to(device)

(3) At the end of the training loop, I disable and delete the adapter as follows:

model.set_active_adapters(None)
model.delete_adapter(model_name)
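
Concretely, each iteration of my loop looks roughly like this (task_names and train_one_task are stand-ins for my actual task list and per-task training code):

import torch
import transformers

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# (1) BertModel is instantiated once, outside the loop
model = transformers.BertModel.from_pretrained("bert-base-uncased")

for model_name in task_names:
    # (2) add and activate a fresh adapter for the current task
    model.add_adapter(model_name, config=transformers.adapters.PfeifferConfig())
    model.train_adapter(model_name)
    model = model.to(device)

    train_one_task(model, model_name)  # stand-in for the actual training loop

    # (3) disable and delete the adapter at the end of the task
    model.set_active_adapters(None)
    model.delete_adapter(model_name)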

Now, my question is: once this loop starts again and adds a fresh adapter:
(1) Does it train a brand-new, randomly initialised head that therefore only encodes knowledge of the task I am currently training?
(2) Or do I also have to call something like delete_head (or simply re-initialise the BertModel at the start of each iteration) so that there is no information leak between subsequent tasks?

@pugantsov pugantsov added the question Further information is requested label Mar 15, 2023
@hSterz
Member

hSterz commented Mar 16, 2023

Hey @pugantsov, to ensure that each adapter training starts with a fresh, randomly initialized adapter and a new head, you need to (see the sketch below):

  • use BertAdapterModel, which lets you attach a separate head for each adapter.
  • add an adapter (add_adapter) and a head (e.g. add_classification_head, depending on the task) with the same name at the beginning of each training run.
  • activate and train the adapter with train_adapter; if the head has the same name, it is automatically activated and set to training as well.
  • reset the optimizer and gradients at the beginning of each training run so that no optimizer state is carried over between adapters.

You don't have to delete the adapters that you have trained; you can keep them loaded but not activated in the model. If you do want to delete one, you have to delete the adapter and the head separately.
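Putting these steps together, a rough sketch could look like this (task_names, num_labels, and make_dataloader are placeholders for your own task list, label counts, and data loading):

import torch
from transformers.adapters import BertAdapterModel, PfeifferConfig

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# BertAdapterModel supports a separate prediction head per adapter
model = BertAdapterModel.from_pretrained("bert-base-uncased")
model.to(device)

for task_name in task_names:
    # fresh, randomly initialized adapter and head for this task
    model.add_adapter(task_name, config=PfeifferConfig())
    model.add_classification_head(task_name, num_labels=num_labels[task_name])

    # activates adapter + head with the same name and freezes the base model
    model.train_adapter(task_name)

    # new optimizer per task, so no optimizer state leaks between tasks
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for batch in make_dataloader(task_name):
        batch = {k: v.to(device) for k, v in batch.items()}
        optimizer.zero_grad()
        loss = model(**batch).loss  # batch must contain "labels" for the loss
        loss.backward()
        optimizer.step()

    # optional: deactivate, or delete adapter and head entirely
    model.set_active_adapters(None)
    model.delete_adapter(task_name)
    model.delete_head(task_name)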
I hope this helps.

@adapter-hub-bert
Member

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.

@adapter-hub-bert
Member

This issue was closed because it was stale for 14 days without any activity.

@adapter-hub-bert adapter-hub-bert closed this as not planned on Jun 29, 2023