This is a small implementation of the architecture from the "Attention Is All You Need" paper. Since the original paper was aimed at text translation, I only used the decoder part to create a GPT-style model.
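For context, the core of a GPT-style decoder is a stack of blocks combining masked (causal) self-attention with a small MLP, both wrapped in residual connections. Here is a minimal PyTorch sketch of one such block; the class name and dimensions are illustrative, not the exact ones from my notebook:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, n_embd=64, n_head=4, block_size=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        # Causal mask: position i may only attend to positions <= i
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool),
                          diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T])
        x = x + a                      # residual around attention
        x = x + self.mlp(self.ln2(x))  # residual around the MLP
        return x
```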
This is a sample of what my model generates (you can read a longer version here):
IO: Nay! OXFORD: Art my maid is gloves affection, I am yift, My lord, first, and good valour together; And all the wacks we holl. What news the prince watch is that way of her, And yet but that life
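Samples like the one above are produced autoregressively: the model predicts a distribution over the next character, one character is sampled, appended to the context, and the process repeats. A rough sketch of that loop, assuming a `model` that maps token indices to per-position logits (an assumption, not the notebook's exact API):

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    # idx is a (B, T) tensor of token indices; model(idx) is assumed
    # to return (B, T, vocab_size) logits.
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the context window
        logits = model(idx_cond)[:, -1, :]       # logits for the last position
        probs = torch.softmax(logits, dim=-1)    # next-token distribution
        next_idx = torch.multinomial(probs, 1)   # sample one token
        idx = torch.cat([idx, next_idx], dim=1)  # append and continue
    return idx
```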
The entire project is heavily inspired by micrograd.
Everything was made for educational purposes.
I focused mainly on the transformer (the interesting part), but the project also includes two side models that helped me understand the topic better. There are three separate models, each with its own Jupyter notebook:
- A bigram model (see the sketch after this list)
- A very small text model
- The actual "Attention Is All You Need" paper implementation
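As referenced above, the bigram model is the simplest of the three: it predicts the next character from the current one alone. A minimal count-based sketch (the word list is whatever training data you load; mine used names):

```python
import random
from collections import defaultdict

def train_bigram(words):
    # Count how often each character follows another; "." marks word boundaries.
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts

def sample_word(counts):
    out, ch = [], "."
    while True:
        next_chars = list(counts[ch].keys())
        weights = list(counts[ch].values())
        ch = random.choices(next_chars, weights=weights)[0]
        if ch == ".":          # end-of-word marker reached
            return "".join(out)
        out.append(ch)
```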
A few notes:
- The first two were made to introduce myself to text generation, so they are implemented at a very low level.
- The first two models predict (create) new name-sounding words.
- The big model predicts (creates) Shakespeare-esque text.
These are the results I obtained from each model. I'm sure the hyperparameters could be fine-tuned better for each model, but that wasn't the point of this project.
Bigram:
- PSOPENEY
- BLEONDARSKICAT
- AKEITUR
- BATTEVERA
- QUHEWAN
- ROSAWAWSESKSHARINTTLSAIEKIT
Small model:
- WALFAILMAN
- TATTIN
- DUS
- SETTE
- BURG
- MAGNIS
Transformer:
IO: Nay! OXFORD: Art my maid is gloves affection, I am yift, My lord, first, and good valour together; And all the wacks we holl. What news the prince watch is that way of her, And yet but that life