- This repository contains an implementation of a Transformer model for German-to-English translation, inspired by the seminal paper "Attention Is All You Need". The model is built using TensorFlow and Keras and follows the original architecture proposed in the paper.
- Implements a full Transformer model from scratch
- Trains on a German-English parallel corpus
- Uses self-attention mechanisms for efficient sequence modeling
- Supports positional encoding, multi-head attention, and layer normalization
```sh
pip install tensorflow numpy pandas
```
The model is trained on a German-English dataset of parallel sentence pairs.
The architecture follows the original paper: an encoder-decoder structure with attention mechanisms at its core.
Since the Transformer model does not use recurrence or convolution, positional encoding is added to the input embeddings to inject information about the position of words in a sequence.
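As an illustration, the sinusoidal encoding from the paper can be sketched in NumPy (a standalone sketch; the function name `positional_encoding` is illustrative and may not match the repository's code):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # Sinusoidal encoding from "Attention Is All You Need":
    #   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    #   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    positions = np.arange(max_len)[:, np.newaxis]    # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]         # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / np.float64(d_model))
    angles = positions * angle_rates                 # (max_len, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])        # even indices: sine
    angles[:, 1::2] = np.cos(angles[:, 1::2])        # odd indices: cosine
    return angles
```

Because each dimension oscillates at a different frequency, every position receives a unique, smoothly varying signature that the model can learn to exploit.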
The Transformer relies heavily on attention mechanisms to capture relationships between words. The attention implementation involves the following three components:
Causal masking ensures that the model does not attend to future tokens during training, maintaining the autoregressive nature of the decoder.
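A minimal NumPy sketch of such a mask (illustrative; the repository's code may build it differently):

```python
import numpy as np

def causal_mask(size):
    # Lower-triangular matrix: position i may attend to positions 0..i only.
    # 1 = allowed to attend, 0 = masked out (set to -inf before the softmax).
    return np.tril(np.ones((size, size), dtype=np.float32))
```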
This step computes attention scores from the dot-product similarity between query and key vectors, scaled by the square root of the key dimension; the softmax of these scores then weights the value vectors.
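The computation above can be sketched in NumPy for a single (unbatched) sequence (an illustrative sketch, not the repository's exact implementation):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (seq_q, d_k), k: (seq_k, d_k), v: (seq_k, d_v)
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)          # similarity, scaled by sqrt(d_k)
    if mask is not None:
        scores = np.where(mask == 0, -1e9, scores)  # block masked positions
    # softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights              # weighted sum of values
```

The sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.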
Multi-head attention allows the model to focus on different parts of the sequence simultaneously, improving the capture of complex dependencies.
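One way to sketch this in NumPy: project the input, split the model dimension into independent heads, attend per head, then concatenate and project back (the function and weight names here are illustrative, not the repository's API):

```python
import numpy as np

def multi_head_attention(x, num_heads, w_q, w_k, w_v, w_o):
    # x: (seq, d_model); each w_*: (d_model, d_model)
    seq, d_model = x.shape
    depth = d_model // num_heads

    def heads(w):
        # project, then split d_model into num_heads independent subspaces
        return (x @ w).reshape(seq, num_heads, depth).transpose(1, 0, 2)

    q, k, v = heads(w_q), heads(w_k), heads(w_v)     # (heads, seq, depth)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(depth)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # per-head softmax
    out = weights @ v                                # (heads, seq, depth)
    out = out.transpose(1, 0, 2).reshape(seq, d_model)  # concatenate heads
    return out @ w_o                                 # final output projection
```

Each head attends over its own low-dimensional subspace, so different heads can specialize in different relationships (e.g. syntactic vs. positional).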
The encoder consists of multiple layers, each containing:
- Multi-head self-attention
- Feed-forward neural networks
- Layer normalization and dropout
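The flow of one encoder layer can be sketched as follows (a simplified single-head version with dropout omitted for brevity; the actual model uses multi-head attention, and all names here are illustrative):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # normalize each position's feature vector to zero mean, unit variance
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def encoder_layer(x, w_qkv, w_o, w_ff1, w_ff2):
    # x: (seq, d_model); w_qkv: three (d_model, d_model) projections
    d = x.shape[-1]
    q, k, v = (x @ w for w in w_qkv)
    attn = softmax(q @ k.T / np.sqrt(d)) @ v        # self-attention
    x = layer_norm(x + attn @ w_o)                  # residual + layer norm
    ff = np.maximum(0.0, x @ w_ff1) @ w_ff2         # position-wise FFN (ReLU)
    return layer_norm(x + ff)                       # residual + layer norm
```

The residual connections and layer normalization around each sub-layer are what allow many such layers to be stacked stably.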
The decoder follows a similar structure to the encoder but includes additional masked self-attention layers and cross-attention layers to attend to the encoder's output.
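The cross-attention step can be sketched as follows: queries come from the decoder, while keys and values come from the encoder output, so each target position can attend over the whole source sentence (an illustrative sketch with hypothetical names):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def cross_attention(dec_x, enc_out, w_q, w_k, w_v):
    # dec_x: (target_len, d_model), enc_out: (source_len, d_model)
    q = dec_x @ w_q       # queries from the decoder
    k = enc_out @ w_k     # keys from the encoder output
    v = enc_out @ w_v     # values from the encoder output
    d_k = k.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k)) @ v  # (target_len, d_model)
```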
To train the Transformer model, run:

```sh
python transformer/transformer_de_to_en.py
```
After training, the model produces accurate translations for short sentences. Sample translations:
| German | English (Predicted) |
|---|---|
| "Guten Morgen!" | "Good morning!" |
| "Wie geht es dir?" | "How are you?" |
| "Ich liebe maschinelles Lernen." | "I love machine learning." |
- Vaswani et al., "Attention Is All You Need", NeurIPS 2017
- TensorFlow documentation on Transformer models
This project is licensed under the MIT License.