This is a PyTorch implementation of the Transformer model from the paper Attention Is All You Need. I built it to deepen my understanding of the work I had already done following Andrej Karpathy's nanoGPT tutorial, and it has certainly been helped along by other open-source repositories.
It implements the components of the Transformer architecture in the post-norm style used in the original paper.
The key components (each sketched briefly below the list) are:
- Positional Encoding: Sine and cosine functions that are added to the input embeddings to give the model a sense of position in the sequence.
- Scaled Dot Product Attention: The attention mechanism used in the Transformer. It is a dot product between the query and key vectors, scaled by the square root of the dimension of the key vectors. The output is a weighted sum of the value vectors.
- Multi-Head Attention: Multiple scaled dot product attention heads run in parallel. The outputs of the heads are concatenated and then projected to the output dimension.
- Feed Forward Network: A two-layer fully connected network with a ReLU activation function between the layers.
- Residual Connections: Connections that add each sub-layer's input to its output, which helps gradients flow through the network. The sum is then normalised by layer normalisation.
- Layer Normalisation: A normalisation of the output of each sub-layer, applied across the feature dimension.
- Masking: A masking of the attention weights to prevent the model from attending to future tokens in the sequence.
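As a reference point, here is a minimal sketch of the sinusoidal positional encoding described above. It illustrates the same idea as positional_encoding.py but is a standalone snippet rather than an excerpt from the repository, and it assumes an even model dimension:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sine/cosine positional encodings as in the original paper (assumes even d_model)."""
    position = torch.arange(seq_len).unsqueeze(1)                                      # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # sine on even feature indices
    pe[:, 1::2] = torch.cos(position * div_term)   # cosine on odd feature indices
    return pe                                      # added to the token embeddings before the encoder/decoder
```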
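Scaled dot-product attention and the causal (look-ahead) mask can also be sketched in a few lines; again this is illustrative rather than the repository's exact code:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k); mask must broadcast to the score shape."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (batch, heads, seq_len, seq_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # hide disallowed positions
    return F.softmax(scores, dim=-1) @ v                       # weighted sum of the value vectors

# Causal mask used in the decoder: position i may only attend to positions <= i
causal_mask = torch.tril(torch.ones(8, 8)).bool()
```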
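Multi-head attention then splits the model dimension into heads, runs the same attention in parallel, and projects the concatenated result. A self-contained sketch follows; the repository's own version lives in multi_head_attention.py and may differ in detail:

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.d_k = num_heads, d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)   # output projection after concatenation

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)

        def split(x):  # (batch, seq, d_model) -> (batch, heads, seq, d_k)
            return x.view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)

        q, k, v = split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v                # attention computed per head
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(out)
```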
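Finally, the feed-forward network and the post-norm residual connection described above, again as an illustrative sketch (the repository keeps its own version in residual_block.py):

```python
import torch.nn as nn

class FeedForward(nn.Module):
    """Two linear layers with a ReLU activation in between."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.net(x)

class PostNormResidual(nn.Module):
    """Post-norm residual connection: LayerNorm(x + sublayer(x))."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)   # normalises across the feature dimension

    def forward(self, x, sublayer):
        return self.norm(x + sublayer(x))   # e.g. sublayer = lambda t: feed_forward(t)
```

An encoder layer is then a post-norm residual around multi-head self-attention followed by one around the feed-forward network; the decoder additionally uses masked self-attention and cross-attention over the encoder output. To try the project itself, clone the repository and install the requirements: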
git clone https://github.com/Uokoroafor/transformer_from_scratch
cd transformer_from_scratch
pip install -r requirements.txt
├── README.md
├── data
│   ├── __init__.py
│   └── europarl_fr_en
├── examples
│   ├── __init__.py
│   └── train_fr_en.py
├── models
│   ├── __init__.py
│   ├── decoder.py
│   ├── encoder.py
│   ├── multi_head_attention.py
│   ├── positional_encoding.py
│   ├── residual_block.py
│   └── transformer.py
├── embeddings
│   ├── __init__.py
│   ├── multi_head_attention.py
│   └── positional_encoding.py
├── requirements.txt
└── utils
    ├── __init__.py
    ├── file_utils.py
    ├── train_utils.py
    ├── data_utils.py
    ├── logging_utils.py
    └── tokeniser.py
I have now included a number of utility files in the utils folder to help with handling the data and training the model. The main script for training on the Europarl dataset is train_fr_en.py in the examples folder.
This file can be run with the following command:
python examples/train_fr_en.py
Note that the script trains a model to translate from English to French, but it is fairly easy to change this to any other language pair.
TBC - the run will take a while to complete, so this section will be updated when there is capacity to run it.