Telegram Bot for English-Arabic Neural Machine Translation
The bot can be found here -> https://t.me/english_arabic_translator_bot
It is deployed on Oracle Cloud.
The English-Arabic portion of the OpenSubtitles dataset is used to train the Seq2Seq model (link to download).
To download the dataset and preprocess it (removing extra characters and cleaning up the data), run:

`python data/get_dataset.py --sample_size 5000000 --max_text_len 150`
Tokenization is performed with the YouTokenToMe BPE tokenizer.
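A minimal sketch of training and applying the tokenizer (the file paths and vocabulary size are assumptions, not the repo's actual settings):

```python
import youtokentome as yttm

# Train a BPE model on the preprocessed corpus
# (path and vocab size are illustrative).
yttm.BPE.train(data="data/train.txt", vocab_size=32000, model="bpe.model")

# Load the trained model and encode text into subword ids.
bpe = yttm.BPE(model="bpe.model")
ids = bpe.encode(["how are you?"], output_type=yttm.OutputType.ID)
print(bpe.decode(ids))  # round-trip back to the original text
```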
The Transformer is implemented in PyTorch with a 6-layer encoder and decoder, 8 attention heads per layer, and Glorot-initialized parameters.
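A minimal sketch of this configuration using PyTorch's built-in `nn.Transformer` (the vocabulary size and model dimension are assumptions, and positional encodings and masking are omitted for brevity; the repo's own implementation may differ):

```python
import torch
import torch.nn as nn

class TranslationTransformer(nn.Module):
    """Sketch of the architecture described above; positional
    encodings and attention masks are omitted for brevity."""

    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=8,               # 8 attention heads per layer
            num_encoder_layers=6,  # 6-layer encoder
            num_decoder_layers=6,  # 6-layer decoder
        )
        self.generator = nn.Linear(d_model, vocab_size)
        # Glorot (Xavier) uniform initialization of all weight matrices.
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # src, tgt: (seq_len, batch) tensors of token ids.
        return self.generator(self.transformer(self.embed(src), self.embed(tgt)))
```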
References
- Vaswani et al., Attention Is All You Need (paper)
- Glorot and Bengio, Understanding the difficulty of training deep feedforward neural networks (paper)
For training, a learning rate of 0.00005 is used with warm-up over the first 30,000 iterations.
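A minimal sketch of such a schedule with a linear warm-up via `LambdaLR` (the dummy parameter is a placeholder for the Transformer's parameters, and whether the rate decays after warm-up is not specified, so it is held constant here):

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

WARMUP_STEPS = 30_000

# Stand-in parameter; in the repo this would be the model's parameters.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = Adam(params, lr=5e-5)

# Ramp the learning rate linearly from 0 to 5e-5 over the first
# 30,000 iterations, then hold it constant.
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / WARMUP_STEPS))

for step in range(5):
    optimizer.step()                # one training step would go here
    scheduler.step()
    print(scheduler.get_last_lr())  # learning rate after warm-up scaling
```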
The head-pruning method of Voita et al. (paper) is implemented in PyTorch.
Two attention-head pruning experiments were carried out: λ = 0.05 (experiment_1) and λ = 0.01 (experiment_2). With λ = 0.05, 91 heads were retained; with λ = 0.01, 89 heads were retained.
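In this method, each attention head's output is multiplied by a stochastic gate drawn from the hard concrete distribution (Louizos et al.), and λ scales the L0 penalty that pushes gates toward zero. A minimal sketch of such a gate (the parameter values follow the hard concrete paper's conventions; the class layout is an assumption, not the repo's code):

```python
import math
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    """One relaxed-L0 gate per attention head, in the spirit of
    Voita et al. / Louizos et al. Each head's output is multiplied
    by its gate value during training."""

    def __init__(self, n_heads: int, beta: float = 2 / 3,
                 gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_heads))  # gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self) -> torch.Tensor:
        if self.training:
            # Sample from the concrete distribution with noise u ~ U(0, 1).
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(
                (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta
            )
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta) and clamp to [0, 1]: the "hard" concrete.
        return torch.clamp(s * (self.zeta - self.gamma) + self.gamma, 0.0, 1.0)

    def l0_penalty(self) -> torch.Tensor:
        # Expected number of non-zero gates; added to the training loss
        # as lambda * l0_penalty().
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
```

Adding λ · `l0_penalty()` to the translation loss drives unneeded gates to zero, and heads whose gates reach zero can be removed entirely at inference time.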
References
- https://github.com/lena-voita/the-story-of-heads
- Michel et al., Are Sixteen Heads Really Better than One? (paper)
- Voita et al., Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (paper)
- Louizos et al., Learning Sparse Neural Networks through L0 Regularization (paper)