
How to enable multiple LoRA adapters? #576

Closed
Jaja612 opened this issue Aug 13, 2023 · 3 comments
Labels: question (Further information is requested)

Comments


Jaja612 commented Aug 13, 2023

Hi, thanks for this great project! I am wondering what I should revise to support a single forward pass with multiple LoRA adapters. It seems like a straightforward extension, but the current version doesn't support stacking (activating) multiple LoRA adapters.

Any guidance and help would be appreciated, thanks! @calpt

Jaja612 added the question label on Aug 13, 2023

Jaja612 commented Aug 13, 2023

Hi @StephennFernandes, thanks for the reply. However, what I need is more like the "Stack" composition rather than the "Parallel" composition for LoRA, see this doc.
Let me clarify my question. Suppose we have a model M and a LoRA adapter A. I want to first merge their weights and then train another LoRA adapter B on top of M+A. I think this merging property is one advantage of LoRA, but it seems this repo only supports merging for inference, not for training.

When I tried to activate two LoRA adapters A and B and set only B as trainable, the forward pass failed because LoRA doesn't support the Stack operation.

Any guidance on how to modify this package to support the above feature would be greatly appreciated, thanks!
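For context, the intended "merge A, then train B" workflow might look roughly like the sketch below with the adapter-transformers API. The model name, adapter path, and LoRA hyperparameters are placeholders, and whether training can actually proceed after merging is exactly what this issue is asking about, so this illustrates the intent rather than behavior guaranteed by the current release.

```python
# Sketch of the intended "merge A, then train B" workflow.
# Paths, model name, and hyperparameters are placeholders.
from transformers.adapters import AutoAdapterModel, LoRAConfig

model = AutoAdapterModel.from_pretrained("roberta-base")

# Load a previously trained LoRA adapter A and fold it into the base weights.
model.load_adapter("./lora_adapter_A", load_as="A")
model.merge_adapter("A")

# Add a fresh LoRA adapter B and train it on top of the merged model M + A.
model.add_adapter("B", config=LoRAConfig(r=8, alpha=16))
model.train_adapter("B")
# ... run a regular training loop that only updates B's parameters ...
```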

lenglaender (Member) commented

Hi @Jaja612,

The LoRA paper suggests merging LoRA adapters so that inference time is unchanged: "Our simple linear design allows us to merge the trainable matrices with the frozen weights when deployed, introducing no inference latency compared to a fully fine-tuned model, by construction." (LoRA: Low-Rank Adaptation of Large Language Models, Hu et al.)

Other modular combinations, such as stacking, are not in line with the paper's original idea, so we have not implemented them yet.
To support stacking and similar approaches, you would need to change: https://github.com/adapter-hub/adapter-transformers/blob/master/src/transformers/adapters/lora.py
For the implementation, you can use the prefix tuning stacking as a reference:
https://github.com/adapter-hub/adapter-transformers/blob/master/src/transformers/adapters/prefix_tuning.py#L399-L464
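For orientation only, here is a stripped-down sketch of the math behind stacked LoRA: each active adapter contributes an additive low-rank update on top of the frozen base projection. The names (StackedLoRALinear, add_lora) are invented for illustration and are not taken from lora.py.

```python
import torch

# Hypothetical, simplified linear layer that sums the low-rank updates of
# every active adapter in a stack: h = W x + sum_i B_i A_i x * scaling_i.
# This only illustrates the math, not the library's actual implementation.
class StackedLoRALinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear):
        super().__init__()
        self.base = base                    # frozen pretrained projection W
        self.loras = torch.nn.ModuleDict()  # adapter name -> {"A", "B"} pair
        self.scalings = {}

    def add_lora(self, name, r=8, alpha=16):
        in_f, out_f = self.base.in_features, self.base.out_features
        self.loras[name] = torch.nn.ModuleDict({
            "A": torch.nn.Linear(in_f, r, bias=False),
            "B": torch.nn.Linear(r, out_f, bias=False),
        })
        torch.nn.init.zeros_(self.loras[name]["B"].weight)  # standard LoRA init
        self.scalings[name] = alpha / r

    def forward(self, x, active):
        h = self.base(x)
        for name in active:                 # stacking = additive composition
            lora = self.loras[name]
            h = h + lora["B"](lora["A"](x)) * self.scalings[name]
        return h
```

Running a stacked forward would then be `layer(x, active=["A", "B"])`, and making only B trainable amounts to freezing A's parameters before training.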
