This repository contains the code for fine-tuning Meta's Llama-2 model for Neural Machine Translation from Bengali to English.
Chosen Task:
Neural Machine Translation (NMT) from Bengali to English Language
Why did I choose it?
I have been working with Neural Machine Translation for a while, and for my research I am exploring different machine translation models. Any LLM can be fine-tuned for a specific task, and since I work on machine translation, I wanted to see how an LLM performs on it. I also have a good dataset available, so this task was a natural choice for me.
Base Dataset: BUET-BanglaNMT Dataset (2.5 million pairs)
Preprocessed Dataset: Preprocessed Dataset (2.1 million pairs)
Small Dataset: Small Dataset (200k pairs)
Why did I choose this dataset?
This is one of the largest Bengali-to-English parallel corpora available. I formatted the dataset for my task according to the model's expected input. I started working with the large dataset, but due to limited compute and time, I also created a small dataset and fine-tuned the model on that instead.
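The formatting and subsampling described above could look roughly like the sketch below. The field names, the Llama-2 instruction template, and the sampling helper are my assumptions for illustration, not the repository's exact code:

```python
import random

# Hypothetical sketch: wrap each Bengali-English pair in a Llama-2 style
# instruction prompt for fine-tuning. The template is an assumption.
def format_pair(bn: str, en: str) -> str:
    """Turn one parallel sentence pair into a Llama-2 style training example."""
    return (
        "<s>[INST] Translate the following Bengali sentence to English:\n"
        f"{bn} [/INST] {en} </s>"
    )

def make_small_subset(pairs, size=200_000, seed=42):
    """Randomly sample a smaller training set (e.g. 200k out of 2.1M pairs)."""
    rng = random.Random(seed)
    if len(pairs) <= size:
        sampled = pairs
    else:
        sampled = rng.sample(pairs, size)
    return [format_pair(bn, en) for bn, en in sampled]
```

With the full preprocessed corpus as `pairs`, `make_small_subset(pairs)` would yield the 200k-example small dataset in prompt form.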
I used the BUET-BanglaNMT dataset from HuggingFace. It contains around 2.5 million pairs of Bengali and English sentences.
I used Meta's Llama-2 model as the base. This is my fine-tuned adapter: Fine-Tuned Llama-2
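Loading the base model with the fine-tuned adapter attached might look like this sketch using the `peft` library. The base model ID and the adapter path are placeholders, not the repository's actual Hub paths, and the inference prompt template is an assumption:

```python
def load_translator(base_id="meta-llama/Llama-2-7b-hf",
                    adapter_id="<your-adapter-repo>"):
    """Hypothetical: attach the fine-tuned LoRA adapter to the base Llama-2."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(model, adapter_id)  # load LoRA weights
    return tokenizer, model

def build_prompt(bn_sentence: str) -> str:
    """Inference-time prompt matching the assumed training format."""
    return (
        "<s>[INST] Translate the following Bengali sentence to English:\n"
        f"{bn_sentence} [/INST]"
    )
```

At inference time you would tokenize `build_prompt(...)`, call `model.generate`, and decode the text after `[/INST]` as the English translation.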