Environment info
adapter-transformers version: 3.1.0

Details
Is the Parallel composition [1] supported for IA3 / Prefix Tuning [2] as well?

Many language models use the generate API to generate text. When using Parallel adapters, is use of generate still possible?
When using Parallel adapters, what do latency, GPU memory, and GPU utilization look like?
Would a Parallel composition with N adapters be equivalent to a batch size of N?
Is there a benchmark study comparing latency for a Parallel adapter setup vs. a single adapter?

[1]: https://docs.adapterhub.ml/adapter_composition.html#parallel
[2]: https://docs.adapterhub.ml/classes/adapter_config.html?highlight=ia3#prefix-tuning
Hi @hchoi-moveworks,
with our new release of adapter-transformers v3.2, we solved some of your issues:

- We support parallel composition for prefix tuning, but not for IA3.
- We added support for generation with parallel composition via the generate method (see the sketch below).
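A minimal sketch of what this can look like, assuming adapter-transformers >= 3.2 with a GPT-2 backbone; the adapter names adapter_a and adapter_b are hypothetical placeholders:

```python
# Minimal sketch: text generation with two adapters composed in Parallel.
# Assumes adapter-transformers >= 3.2; adapter names are placeholders.
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from transformers.adapters.composition import Parallel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Two (untrained, randomly initialized) adapters, for illustration only.
model.add_adapter("adapter_a")
model.add_adapter("adapter_b")
model.set_active_adapters(Parallel("adapter_a", "adapter_b"))

inputs = tokenizer("Adapters are", return_tensors="pt")
# The batch is replicated once per parallel branch, so generate() is
# expected to return one sequence per adapter for each input sample.
outputs = model.generate(**inputs, max_new_tokens=20)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```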
Unfortunately, we don't have statistics on latency, GPU memory, or GPU utilization. If you gather some, feel free to share them with us :) Using parallel composition is not equivalent to a batch size of N: the batch is replicated at the first parallel block (see this illustration), so up to that block the batch size stays the original one, and only afterwards is it multiplied by the number of parallel branches.
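Continuing the sketch above, one hedged way to observe this expansion is to compare the input and output batch dimensions; the expected factor of 2 comes from the two parallel branches:

```python
# Continuation of the sketch above: with Parallel("adapter_a", "adapter_b")
# active (two branches), an input batch of size 2 should come back with an
# output batch dimension of 2 * 2 = 4.
import torch

tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
inputs = tokenizer(["Adapters are", "Parallel composition is"],
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits

print(inputs["input_ids"].shape[0])  # 2 (original batch size)
print(logits.shape[0])               # expected 4 (batch size x branches)
```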
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.