- Built EN-MR and MR-EN translation models, then merged them into a single bidirectional model by extracting the encoder and decoder components from each and creating a new EncoderDecoderModel with the Hugging Face transformers library (see the second sketch after this list).
- Leveraging Hugging Face's pretrained multilingual translation models, I developed English-to-Marathi and Marathi-to-English translation models by tuning hyperparameters and using the AutoModelForSeq2SeqLM, AutoTokenizer, and AutoConfig classes (see the first sketch after this list).
- Compared three different base models: Helsinki-NLP, Mbart, and AI4Bharat.
- The fine-tuned Helsinki-NLP model achieved the best English-to-Marathi result, with a loss of 0.5174, surpassing the other two models.
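A minimal sketch of the model-loading step described in the second bullet, assuming the public Helsinki-NLP EN-MR checkpoint as the fine-tuning base:

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

# Public Helsinki-NLP EN-MR checkpoint; treating it as the fine-tuning
# base here is an assumption consistent with the comparison below.
checkpoint = "Helsinki-NLP/opus-mt-en-mr"

config = AutoConfig.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, config=config)
```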
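And a sketch of the encoder/decoder merge from the first bullet. The checkpoint ids are placeholders for the two fine-tuned directional models, and the sketch assumes both are seq2seq models exposing get_encoder() and get_decoder():

```python
from transformers import AutoModelForSeq2SeqLM, EncoderDecoderModel

# Placeholder ids for the fine-tuned EN-MR and MR-EN models.
en_mr = AutoModelForSeq2SeqLM.from_pretrained("<en-mr-checkpoint>")
mr_en = AutoModelForSeq2SeqLM.from_pretrained("<mr-en-checkpoint>")

# Pair the EN-MR encoder with the MR-EN decoder in a new EncoderDecoderModel.
bidirectional = EncoderDecoderModel(
    encoder=en_mr.get_encoder(),
    decoder=mr_en.get_decoder(),
)
```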
The dataset I collected is a comprehensive set of English-Marathi sentence pairs drawn from various publicly available resources. It contains 3,517,283 rows (approximately 451 MB), making it a substantial corpus for translation tasks. It can be downloaded and loaded with the Hugging Face datasets library:
```python
from datasets import load_dataset

dataset = load_dataset("anujsahani01/English-Marathi")
```
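A quick way to inspect what was loaded; the split and column names here are assumptions, not guaranteed by the dataset card:

```python
print(dataset)              # available splits and row counts
print(dataset["train"][0])  # one English-Marathi sentence pair (names assumed)
```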
The models can be accessed and tested on Hugging Face using the links below.
- Mbart[EN-MR], Mbart[MR-EN]
- Helsinki-NLP[EN-MR], Helsinki-NLP[MR-EN]
- AI4Bharat[EN-MR], AI4Bharat[MR-EN]
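For a quick test, any of the checkpoints above can be dropped into a translation pipeline; the model id below is a placeholder for one of the linked EN-MR models:

```python
from transformers import pipeline

# "<en-mr-checkpoint>" is a placeholder; substitute a model id linked above.
translator = pipeline("translation", model="<en-mr-checkpoint>")
print(translator("How are you?")[0]["translation_text"])
```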
After a series of experiments and trials, I found the following set of hyperparameters on which my models performed best.
| Hyperparameter | Helsinki-NLP | Mbart | AI4Bharat |
| --- | --- | --- | --- |
| learning_rate | 0.0005 | 0.0005 | 0.0005 |
| max_steps | 10000 | 10000 | 8000 |
| warmup_steps | 50 | 50 | 50 |
| weight_decay | 0.01 | 0.01 | 0.01 |
| per_device_train_batch_size | 64 | 12 | 12 |
| per_device_eval_batch_size | 64 | 12 | 12 |
| evaluation_strategy | "no" | "no" | "no" |
| num_train_epochs | 1 | 1 | 1 |
| remove_unused_columns | False | False | False |
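As a sketch, the Helsinki-NLP column maps directly onto Hugging Face Seq2SeqTrainingArguments; output_dir is a placeholder, while every other value comes from the table above:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="helsinki-en-mr",  # placeholder, not from the table
    learning_rate=0.0005,
    max_steps=10000,
    warmup_steps=50,
    weight_decay=0.01,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    evaluation_strategy="no",
    num_train_epochs=1,
    remove_unused_columns=False,
)
```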
The following losses were obtained for the English-to-Marathi translation model. The best result was obtained with the fine-tuned Helsinki-NLP model.
| Helsinki-NLP | Mbart | AI4Bharat |
| --- | --- | --- |
| 0.5174 | 0.8225 | 0.9779 |
The following losses were obtained for the Marathi-to-English translation model. The best result was obtained with the fine-tuned Mbart model.
| Helsinki-NLP | Mbart | AI4Bharat |
| --- | --- | --- |
| 0.6818 | 0.6712 | 0.7775 |
If you have any feedback, please reach out to me at:
Author: @anujsahani01