
multiple adapter model inference #577

Closed
StephennFernandes opened this issue Aug 13, 2023 · 2 comments
@StephennFernandes

Hey folks in the adapter community,
I am looking for a way to run inference for multiple NLP services (e.g. NER, POS tagging, QA, summarization, chatbot) on top of a single base model such as LLaMA-2, where concurrent users access different services but the base model is shared across all adapters.

I am hoping such an implementation would reduce GPU utilization significantly, since instead of serving multiple fully fine-tuned models in parallel, one base model with multiple adapters could be used.

If anyone has information on such a setup, please reach out; any help would be highly appreciated.

@StephennFernandes StephennFernandes added the question Further information is requested label Aug 13, 2023
@HallerPatrick

@StephennFernandes I am also interested in this use case. Did you find a solution?

@TimoImhof
Contributor

TimoImhof commented Sep 7, 2023

Hi,

I'm not sure if I understand you correctly, but this sounds like a case for the Parallel composition block.
This block can be used to load and use multiple adapters in parallel, each with its own prediction head.
I took the example from the docs:

from adapters import AutoAdapterModel
from transformers import AutoTokenizer
import adapters.composition as ac

model = AutoAdapterModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Load two adapters from the Hub; each comes with its own prediction head.
adapter1 = model.load_adapter("sts/sts-b@ukp")
adapter2 = model.load_adapter("sts/mrpc@ukp")

# Activate both adapters in parallel; the base model weights are shared.
model.active_adapters = ac.Parallel(adapter1, adapter2)

input_ids = tokenizer("Adapters are great!", "Adapters are awesome!", return_tensors="pt")

# One forward pass returns one output per parallel adapter/head.
output1, output2 = model(**input_ids)

(Short warning: this example already uses imports from the adapters package, the new version of adapter-transformers; you can find more information on that in #584.)
Each adapter generates its own output. Depending on which service is required, you can then process the corresponding output further.
The section in the docs can be found here if you want more information about the available composition blocks.
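
If, instead of running every adapter on each request, you want to route each request to a single service, a minimal sketch (not from the docs) could look like the following; the adapter paths and the services dict are hypothetical placeholders standing in for adapters you would train or download yourself:

from adapters import AutoAdapterModel
from transformers import AutoTokenizer

model = AutoAdapterModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# One adapter (with its own prediction head) per service.
# The paths below are hypothetical placeholders, not real Hub adapters.
services = {
    "ner": model.load_adapter("path/to/ner-adapter"),
    "pos": model.load_adapter("path/to/pos-adapter"),
    "qa": model.load_adapter("path/to/qa-adapter"),
}

def predict(service, *texts):
    # Activate only the adapter (and head) of the requested service;
    # the frozen base model stays in memory and is shared by all services.
    model.set_active_adapters(services[service])
    # QA takes a (question, context) pair, the other services a single text.
    batch = tokenizer(*texts, return_tensors="pt")
    return model(**batch)

outputs = predict("ner", "Berlin is the capital of Germany.")

Note that set_active_adapters mutates shared model state, so concurrent requests would have to be serialized or batched per service, which is exactly where the Parallel block helps.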

Hope this helps!
Best,
Timo

@TimoImhof TimoImhof self-assigned this Sep 28, 2023