diff --git a/docs/source/_ml_basics/transformer.rst b/docs/source/_ml_basics/transformer.rst
index da7eb40..e4baeda 100644
--- a/docs/source/_ml_basics/transformer.rst
+++ b/docs/source/_ml_basics/transformer.rst
@@ -88,4 +88,13 @@ The outputs of all attention heads are concatenated and then linearly transforme
 
 References
 ------------
-Vaswani. `Attention is all you need. `_ Advances in neural information processing systems. 2017.
+- **Original Attention paper:** Vaswani et al. `Attention is all you need. `_ Advances in neural information processing systems. 2017.
+- **Blog posts on the Attention Mechanism:**
+  - `Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) `_
+  - `Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch `_
+- **Blog posts on Transformers:**
+  - `The Illustrated Transformer `_
+  - `Transformer Explainer `_
+- **Tutorials on Building a Transformer with PyTorch:**
+  - `Building a Transformer with PyTorch `_
+  - `The Annotated Transformer `_
\ No newline at end of file
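For context on the hunk above (the section it patches ends with the statement that the outputs of all attention heads are concatenated and then linearly transformed), here is a minimal PyTorch sketch of that concatenate-then-project step; the shapes, variable names, and sizes are illustrative assumptions and are not taken from the patched document::

    import torch
    import torch.nn as nn

    # Illustrative sizes (assumptions): d_model = num_heads * d_head.
    batch, seq_len, num_heads, d_head = 2, 10, 8, 64
    d_model = num_heads * d_head

    # Pretend each head has already produced its attention output: (batch, seq_len, d_head).
    head_outputs = [torch.randn(batch, seq_len, d_head) for _ in range(num_heads)]

    # Concatenate the heads along the feature dimension, then apply the output projection.
    concat = torch.cat(head_outputs, dim=-1)       # (batch, seq_len, d_model)
    w_o = nn.Linear(d_model, d_model, bias=False)  # the final linear transformation
    out = w_o(concat)                              # (batch, seq_len, d_model)
    print(out.shape)                               # torch.Size([2, 10, 512])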