Explore the docs »
View Demo · Report Bug · Request Feature
Implemented takeaways from the original Transformer paper, "Attention Is All You Need", to build a language translator that trains on parallel data for a number of epochs and converts sentences from one language into another. The proposed automated English-to-local-language system is designed according to the transfer-based approach to machine translation. The first layer of the architecture is a natural language processing tool that performs morphological analysis: sentence tokenization, part-of-speech tagging, phrase formation and figure-of-speech tagging. The second layer comprises a grammar generator responsible for converting English language structures into target-language structures. At the third layer, the results produced by the grammar generator are mapped to matching terms in a bilingual dictionary. Transfer-based German-to-English and English-to-Hindi translators are trained following this procedure.
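The three layers above can be sketched as a toy pipeline. Everything here (the sentence, the POS tags, the reordering rule, the bilingual dictionary) is an illustrative stand-in for the real components, not the project's actual data or models:

```python
# Toy sketch of the three-layer transfer-based pipeline.
# All data below is illustrative, not from the real system.

def tokenize(sentence):
    # Layer 1 (morphological analysis): naive whitespace tokenization.
    return sentence.lower().split()

def reorder_svo_to_sov(tokens, pos_tags):
    # Layer 2 (grammar generator): move verbs to the end, mimicking an
    # English (SVO) -> Hindi (SOV) structural transfer.
    verbs = [t for t, p in zip(tokens, pos_tags) if p == "VERB"]
    rest = [t for t, p in zip(tokens, pos_tags) if p != "VERB"]
    return rest + verbs

def lookup(tokens, bilingual_dict):
    # Layer 3: map each token to its match in the bilingual dictionary.
    return [bilingual_dict.get(t, t) for t in tokens]

tokens = tokenize("I eat mangoes")
tags = ["PRON", "VERB", "NOUN"]  # assumed POS tags for this toy sentence
reordered = reorder_svo_to_sov(tokens, tags)
hindi = lookup(reordered, {"i": "main", "eat": "khata hoon", "mangoes": "aam"})
print(" ".join(hindi))  # -> "main aam khata hoon"
```

The real system replaces each stage with a learned component, but the layered flow is the same.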
Architecture: the image depicts a flowchart of the encoder-decoder architecture from Vaswani et al.
Image taken from Research Paper »
Finding libraries to tokenize the text and set up the vocabulary was fairly simple. We used Spacy for this purpose and trained the model on the Multi30k dataset for 150 epochs. One important thing to note: the values of some parameters, such as the head count and the number of encoder/decoder layers, were reduced compared to the original paper, given the limited computational power at our disposal. Pretrained weights for GerToEng can be found here
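To make the reduction concrete, here is a sketch comparing the base-model settings from "Attention Is All You Need" with the kind of scaled-down configuration used here. The reduced values are illustrative placeholders; the actual values live in this repo's hyperparameter files:

```python
# Base-model settings from the paper vs. an illustrative reduced config.
# The REDUCED values are placeholders, not this repo's exact settings.
PAPER_BASE = {
    "num_heads": 8,
    "num_encoder_layers": 6,
    "num_decoder_layers": 6,
    "d_model": 512,
    "d_ff": 2048,
}

REDUCED = {
    "num_heads": 4,            # fewer attention heads
    "num_encoder_layers": 3,   # fewer encoder/decoder iterations
    "num_decoder_layers": 3,
    "d_model": 512,
    "d_ff": 2048,
}

for key in PAPER_BASE:
    if REDUCED[key] != PAPER_BASE[key]:
        print(f"{key}: {PAPER_BASE[key]} -> {REDUCED[key]}")
```

Shrinking heads and layers cuts parameters and memory roughly proportionally, which is usually the first lever to pull on a single consumer GPU.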
Tokenizing and setting up the vocabulary was quite difficult given the complexities of Hindi grammar, but the task was done using the iNLTK library. The model was trained on an English-Hindi parallel corpus from Dataset plus 80k more lines from a 24-lakh-line parallel corpus from Dataset; a preprocessed and cleaned version can be found here. As with the German-English model, the values of some parameters, such as the head count and the number of encoder/decoder layers, were reduced compared to the original paper, given the limited computational power at our disposal. Pretrained weights for EngToHin can be found here
- TorchText
- Spacy
- INLTK
- Nvidia cuda toolkit
Step 1. Clone the repository.
Step 2. Download the dataset from Here and place it in the respective data folder. Remember that the two translation pipelines have different data folders.
-
Python 3.7
-
Install python libraries
pip install -r requirements.txt
-
Run the training file (engtohindi.py or gertoeng.py); it will build the tokenized vocab for you. (This step only needs to be done once.)
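The one-time vocab build boils down to counting tokens and assigning each an index. A minimal stand-in for what the training scripts do (the real scripts use Spacy/iNLTK tokenizers and torchtext vocab objects; the special tokens and example sentences here are assumptions):

```python
from collections import Counter

# Special tokens assumed for padding, sequence boundaries, and unknowns.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def build_vocab(sentences, min_freq=1):
    # Count whitespace tokens across the corpus, then map each token
    # (plus the specials) to a unique integer index.
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    itos = SPECIALS + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}

vocab = build_vocab(["ein Mann geht", "ein Hund läuft"])
print(len(vocab))  # 5 corpus tokens + 4 specials = 9
```

Since the mapping is deterministic given the corpus, it can be built once, saved, and reloaded on later runs, which is why the scripts only need this step the first time.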
-
Add checkpoints to the folder: download the checkpoints from Here and place them in the respective checkpoints folder. Remember that the two translation pipelines have different checkpoints folders.
-
Edit the sentence variable in the eval.py file to the sentence you want to translate.
-
Run the eval.py file
python GerToEng/eval.py
- Start from scratch
python GerToEng/GermanToEnglish.py
- To resume training: change the load parameter in the hyperparameter file to true; the model will automatically load the checkpoints
python GerToEng/GermanToEnglish.py
- Attention is all you need paper»
- Dataset of English-Hindi parallel corpus »
- Transformers base implementation »
- Contact Yatharth Kapadia @yatharthk2.nn@gmail.com
- Contact Abhinav Chandra @abhinavchandra0526@gmail.com
- Contact Siddharth Jain @jainsiddharth641@gmail.com