
Add mt5 support #568

Closed
3 tasks done
prafiny opened this issue Jul 13, 2023 · 3 comments · Fixed by #629
Labels
enhancement New feature or request

Comments

@prafiny

prafiny commented Jul 13, 2023

🌟 New adapter setup

Model description

mT5-based models are the multilingual equivalent of T5. They are therefore extremely useful for applying state-of-the-art few-shot PEFT methods (as of summer 2023), such as LoRA or (IA)³, in multilingual contexts.

From https://huggingface.co/docs/transformers/model_doc/mt5:

The mT5 model was presented in mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.

The abstract from the paper is the following:

The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

Open source status

Note

I am willing to help with this

@prafiny prafiny added the enhancement New feature or request label Jul 13, 2023
@prafiny
Author

prafiny commented Jul 13, 2023

From what I can read, there is no structural difference between T5 and mT5:
https://arxiv.org/abs/2010.11934

The model architecture and training procedure that we use for mT5 closely follows that of T5. Specifically, we base mT5 on the “T5.1.1” recipe [1],

Would the implementation then be a copy of the T5* mixins, or aliases to them?

Footnotes

  1. https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511

@calpt
Member

calpt commented Nov 19, 2023

Would the implementation then be a copy of the T5* mixins, or aliases to them?

Yes, mT5 can basically re-use the mixins of T5 and copy its model integration. With the switch to the new codebase (see #584), we're open to new model integrations again (see updated guide). mT5 definitely would be a great addition!
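
For illustration, here is a minimal sketch of what reusing the T5 mixins could look like. The module path and mixin class names are assumptions based on how the T5 integration is organized, not the layout of the eventual mT5 PR:

```python
# Hypothetical sketch of reusing the T5 adapter mixins for mT5.
# Module path and class names are assumptions for illustration only;
# the actual adapters codebase may organize these differently.
from adapters.models.t5.mixin_t5 import (
    T5BlockAdaptersMixin,
    T5ModelAdaptersMixin,
)


class MT5BlockAdaptersMixin(T5BlockAdaptersMixin):
    """mT5 blocks are structurally identical to T5 blocks, so the mixin is reused as-is."""


class MT5ModelAdaptersMixin(T5ModelAdaptersMixin):
    """Subclassing (rather than aliasing) leaves room to override mT5-specific behavior later."""
```

Subclassing versus plain aliasing is a design choice here: an alias is shorter, while empty subclasses keep a hook for divergence without touching the T5 code.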

@sotwi
Contributor

sotwi commented Jan 5, 2024

Hello! I am very interested in this. Is someone actively working on it? If not, I would like to help get the implementation started.

Edit: I just followed the updated guide and did a very quick port. I took the same approach as the mBART implementation (it reuses the BART mixins, so I reused the T5 mixins), and the changes were very minimal. I will submit a pull request. I hope it works.
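
For reference, a rough usage sketch of what mT5 with adapters could look like once such a port is merged, assuming the library's documented workflow (adapters.init, add_adapter, train_adapter) carries over unchanged; the checkpoint and adapter names are placeholders and the snippet is untested against the port itself:

```python
# Rough usage sketch, assuming mT5 support is merged into the adapters library.
import adapters
from adapters import LoRAConfig
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Attach adapter support to the plain Transformers model.
adapters.init(model)

# Add a LoRA adapter and train only its parameters (the base model stays frozen).
model.add_adapter("my_lora", config=LoRAConfig(r=8, alpha=16))
model.train_adapter("my_lora")
```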

@calpt calpt linked a pull request Jan 5, 2024 that will close this issue