
Code style: black

transformer_from_scratch

This is a PyTorch implementation of the Transformer model from the paper Attention Is All You Need. I wrote it to better understand work I had already done following Andrej Karpathy's nanoGPT tutorial, and it has certainly been helped along by other open-source repositories.

It implements the components of the Transformer architecture in the post-norm style used in the original paper, where layer normalisation is applied after each residual addition. A minimal sketch of this pattern is shown below.
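
The snippet below is illustrative only: the class and argument names are assumptions and it is not the code in residual_block.py. It shows the post-norm pattern, where the sub-layer output is added to its input and the sum is then layer-normalised.

import torch
import torch.nn as nn

class PostNormResidual(nn.Module):
    """Post-norm residual wrapper: LayerNorm(x + Dropout(sublayer(x))).

    Illustrative sketch only; names and arguments are assumptions,
    not the classes defined in this repository.
    """

    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        # Apply the sub-layer (attention or feed-forward), add the residual,
        # then normalise - the order used in the original paper (post-norm).
        return self.norm(x + self.dropout(sublayer(x)))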

The key components are:

  • Positional Encoding: sine and cosine functions of different frequencies are added to the input embeddings to give the model a sense of each token's position in the sequence (a sketch follows this list).

  • Scaled Dot Product Attention: the attention mechanism used in the Transformer. Attention scores are the dot products between the query and key vectors, scaled by the square root of the key dimension and passed through a softmax; the output is the resulting weighted sum of the value vectors (see the sketch after this list).

  • Multi-Head Attention: several scaled dot product attention heads run in parallel; their outputs are concatenated and then projected to the output dimension.

  • Feed Forward Network: a two-layer fully connected network with a ReLU activation between the layers.

  • Residual Connections: skip connections that allow gradients to flow through the network. Each sub-layer's input is added to its output, and the sum is then normalised by layer normalisation.

  • Layer Normalisation: normalises the output of each sub-layer across the feature dimension.

  • Masking: the attention weights are masked to prevent the model from attending to future tokens in the sequence.
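
Below is a minimal sketch of the sinusoidal positional encoding described above. It is illustrative and not taken from positional_encoding.py in this repository.

import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    # Illustrative sketch only (assumes d_model is even).
    # Even indices use sine, odd indices use cosine, at geometrically
    # spaced frequencies, as in "Attention Is All You Need".
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )  # (d_model / 2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # added to the token embeddings before the first layer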
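
And a minimal sketch of scaled dot-product attention with an optional mask, again illustrative rather than the code in multi_head_attention.py:

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); mask broadcasts to the score shape.
    # Scores are dot products of queries and keys, scaled by sqrt(d_k).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions (e.g. future tokens) are set to -inf so that
        # they receive zero weight after the softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention weights over keys
    return weights @ v, weights              # weighted sum of the value vectors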

Installation

git clone https://github.com/Uokoroafor/transformer_from_scratch
cd transformer_from_scratch
pip install -r requirements.txt

Project Structure

├── README.md
├── data
│   ├── __init__.py
│   └── europarl_fr_en
├── examples
│   ├── __init__.py
│   └── train_fr_en.py
├── models
│   ├── __init__.py
│   ├── decoder.py
│   ├── encoder.py
│   ├── multi_head_attention.py
│   ├── positional_encoding.py
│   ├── residual_block.py
│   └── transformer.py
├── embeddings
│   ├── __init__.py
│   ├── multi_head_attention.py
│   └── positional_encoding.py
├── requirements.txt
└── utils
    ├── __init__.py
    ├── file_utils.py
    ├── train_utils.py
    ├── data_utils.py
    ├── logging_utils.py
    └── tokeniser.py

Usage

I have now included a number of utility files in the utils folder to help with handling the data and training the model. The main file for training on the Europarl dataset is train_fr_en.py in the examples folder.

This file can be run with the following command:

python examples/train_fr_en.py

Note that it trains a model to translate from English to French, but it is fairly easy to change this to any other language pair.

Results

TBC - training will take a while to complete, so this section will be updated once there is capacity to run it.

References

License

License: MIT
