
Add mt5 support #568

Closed
3 tasks done
prafiny opened this issue Jul 13, 2023 · 3 comments · Fixed by #629
Labels
enhancement New feature or request

Comments

@prafiny

prafiny commented Jul 13, 2023

🌟 New adapter setup

Model description

mT5-based models are the multilingual equivalent of T5. They are therefore extremely useful for applying state-of-the-art few-shot PEFT methods (as of summer 2023), such as LoRA or (IA)³, in multilingual contexts.

From https://huggingface.co/docs/transformers/model_doc/mt5:

The mT5 model was presented in mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.

The abstract from the paper is the following:

The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

Open source status

Note

I am willing to help with this

@prafiny prafiny added the enhancement New feature or request label Jul 13, 2023
@prafiny
Author

prafiny commented Jul 13, 2023

From what I can read, there is no structural difference between T5 and mT5:
https://arxiv.org/abs/2010.11934

The model architecture and training procedure that we use for mT5 closely follows that of T5. Specifically, we base mT5 on the “T5.1.1” recipe [1],

Would the implementation then be a copy of the T5* mixins, or aliases to them?

Footnotes

  1. https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511

@calpt
Member

calpt commented Nov 19, 2023

Would the implementation then be a copy of the T5* mixins, or aliases to them?

Yes, mT5 can basically re-use the mixins of T5 and copy its model integration. With the switch to the new codebase (see #584), we're open to new model integrations again (see updated guide). mT5 definitely would be a great addition!
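
For illustration, here is a minimal sketch of what reusing the T5 mixins could look like. The module path and mixin class names are assumptions based on how the T5 integration is organized, not the layout of the eventual mT5 PR:

```python
# Hypothetical sketch of reusing the T5 adapter mixins for mT5.
# Module path and class names are assumptions for illustration only;
# the actual adapters codebase may organize these differently.
from adapters.models.t5.mixin_t5 import (
    T5BlockAdaptersMixin,
    T5ModelAdaptersMixin,
)


class MT5BlockAdaptersMixin(T5BlockAdaptersMixin):
    """mT5 blocks are structurally identical to T5 blocks, so the mixin is reused as-is."""


class MT5ModelAdaptersMixin(T5ModelAdaptersMixin):
    """Subclassing (rather than aliasing) leaves room to override mT5-specific behavior later."""
```

Subclassing versus plain aliasing is a design choice here: an alias is shorter, while empty subclasses keep a hook for divergence without touching the T5 code.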

@sotwi
Contributor

sotwi commented Jan 5, 2024

Hello! I am very interested in this. Is someone actively working on it? If not, I would like to help get the implementation started.

Edit: I just followed the updated guide and did a very quick port. I took the same approach as the mBART implementation (it reuses the BART mixins, so I reused the T5 mixins), and the changes were very minimal. I will submit a pull request. I hope it works.
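
For reference, a rough usage sketch of what mT5 with adapters could look like once such a port is merged, assuming the library's documented workflow (adapters.init, add_adapter, train_adapter) carries over unchanged; the checkpoint and adapter names are placeholders and the snippet is untested against the port itself:

```python
# Rough usage sketch, assuming mT5 support is merged into the adapters library.
import adapters
from adapters import LoRAConfig
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Attach adapter support to the plain Transformers model.
adapters.init(model)

# Add a LoRA adapter and train only its parameters (the base model stays frozen).
model.add_adapter("my_lora", config=LoRAConfig(r=8, alpha=16))
model.train_adapter("my_lora")
```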

@calpt calpt linked a pull request Jan 5, 2024 that will close this issue