Hello @adarob
I want to train a T5 model from scratch in a new language, but I am new to this framework. I went through the documentation, issues, and discussions; however, I have some doubts about the proper installation and use of the library.
I created a TPU VM with the tpu_vm_base version and then followed these steps: https://github.com/google-research/t5x/tree/main#installation. Am I missing any dependencies (e.g., do I also have to install the t5 library)? Is there a requirements.txt I can use to install the proper library versions?
I read in this issue, Pretraining from scratch #24, that seqio Tasks are used to set up the data pipelines. I understood that I have to create a new TaskRegistry and MixtureRegistry for my datasets. However, it is not clear to me where I have to put this code. Does it go in the gin file?
I already have my own SentencePiece tokenizer. Is it enough to change the name or path of the tokenizer in this gin file https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin?
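To make the Task question concrete, the kind of registration code being asked about might look roughly like the sketch below. It is only an illustration under stated assumptions: the task name, file paths, and the helper `register_task` are hypothetical, and the span-corruption preprocessor comes from the `t5` library, so this version assumes `t5` is installed alongside `seqio`. One arrangement, stated here as an assumption rather than something the t5x docs in this thread confirm, is to keep this code in an ordinary Python module and have the gin file import that module, since gin files support `import` statements.

```python
"""Hypothetical tasks module (e.g. my_project/tasks.py); all names are placeholders."""

# Hypothetical values: replace with your own task name, vocabulary, and data.
TASK_NAME = "my_lang_span_corruption"
VOCAB_PATH = "gs://my-bucket/spm/my_lang.model"
TRAIN_FILES = "gs://my-bucket/data/train-*.txt"


def register_task():
    """Registers a span-corruption pretraining Task over plain-text lines."""
    # Imported lazily so the sketch can be loaded without seqio/t5 installed.
    import seqio
    from t5.data import preprocessors as t5_preprocessors

    # Custom SentencePiece model; 100 extra ids serve as sentinel tokens.
    vocab = seqio.SentencePieceVocabulary(VOCAB_PATH, extra_ids=100)
    output_features = {
        "inputs": seqio.Feature(vocabulary=vocab, add_eos=True, required=False),
        "targets": seqio.Feature(vocabulary=vocab, add_eos=True),
    }

    @seqio.map_over_dataset
    def line_to_example(line):
        # Each raw text line becomes the target; inputs start out empty
        # and are filled in later by span corruption.
        return {"inputs": "", "targets": line}

    seqio.TaskRegistry.add(
        TASK_NAME,
        source=seqio.TextLineDataSource(
            split_to_filepattern={"train": TRAIN_FILES}),
        preprocessors=[
            line_to_example,
            seqio.preprocessors.tokenize,
            seqio.CacheDatasetPlaceholder(),
            t5_preprocessors.span_corruption,
            seqio.preprocessors.append_eos_after_trim,
        ],
        output_features=output_features,
        metric_fns=[],
    )
```

In real use, `register_task()` would be called at module import time, so that importing the module is enough to make `TASK_NAME` visible to the training script.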
I would really appreciate your help with this. Thanks for contributing this library to the community.