A recently published paper introduced a strategy called "trans-tokenization", which "focuses on adapting a high-resource monolingual LLM to an unseen target language by initializing the token embeddings of the target language using a weighted average of semantically similar token embeddings from the source language." We should investigate whether initializing new tokens this way improves model performance when we add trained tokens to NLLB.
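To make the quoted idea concrete, here is a minimal sketch of the weighted-average initialization on toy data. It is not the paper's implementation and does not touch NLLB: the function name `init_target_embedding`, the `alignments` weights, and the two-dimensional embeddings are all assumptions made up for illustration, and how the similarity weights are obtained is left open.

```python
from typing import Dict, List

Vector = List[float]


def init_target_embedding(
    alignments: Dict[str, float],
    source_embeddings: Dict[str, Vector],
) -> Vector:
    """Initialize one new target-token embedding (hypothetical helper).

    alignments maps source tokens to non-negative similarity weights.
    The weights are normalized to sum to 1, then used to average the
    corresponding source embeddings.
    """
    # Keep only source tokens that have an embedding and a positive weight.
    known = {
        tok: w
        for tok, w in alignments.items()
        if tok in source_embeddings and w > 0
    }
    if not known:
        raise ValueError("no aligned source token has an embedding")

    total = sum(known.values())
    dim = len(next(iter(source_embeddings.values())))
    result = [0.0] * dim
    for tok, w in known.items():
        vec = source_embeddings[tok]
        for i in range(dim):
            result[i] += (w / total) * vec[i]
    return result


# Toy data (assumed): two source tokens with 2-dimensional embeddings.
source_embeddings = {"dog": [1.0, 0.0], "hound": [0.0, 1.0]}
# Assumed similarity weights for one new target-language token.
alignments = {"dog": 3.0, "hound": 1.0}

new_vec = init_target_embedding(alignments, source_embeddings)
print(new_vec)
```

In an experiment, vectors produced this way would replace the default initialization of the newly added rows in the embedding matrix before training, so the comparison for this issue would be between the two initializations under the same training setup.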