Hello! I made this video to help fellow data scientists and developers better understand the Transformer architecture that underlies the GPT models and most modern large language models! I was struggling until I finally just created a spreadsheet with a toy example matrix and worked out each matrix transformation one step at a time. It was tough going, but once I was done the concepts finally "clicked" for me. If you are on the same journey of understanding, I hope this helps you too!
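If you prefer code to spreadsheets, here's a minimal NumPy sketch of the kind of toy single-head, causal self-attention calculation the video steps through. The matrix sizes and random values here are illustrative assumptions, not the actual numbers from my spreadsheet:

```python
import numpy as np

np.random.seed(0)

# Toy example: 4 tokens, each embedded as a 3-dimensional vector.
X = np.random.randn(4, 3)

# Projection matrices (random here, just to trace the shapes;
# in a real model these are learned).
W_q = np.random.randn(3, 3)
W_k = np.random.randn(3, 3)
W_v = np.random.randn(3, 3)

Q = X @ W_q  # queries, shape (4, 3)
K = X @ W_k  # keys,    shape (4, 3)
V = X @ W_v  # values,  shape (4, 3)

# Scaled dot-product attention scores, shape (4, 4).
scores = Q @ K.T / np.sqrt(K.shape[-1])

# Causal mask: each token may only attend to itself and earlier tokens.
mask = np.triu(np.ones_like(scores), k=1).astype(bool)
scores[mask] = -np.inf

# Softmax over each row gives attention weights; mixing the values
# produces one context-aware vector per token.
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ V  # shape (4, 3)
print(out)
```

Each line of this corresponds to one of the matrix transformations worked out cell-by-cell in the spreadsheet, so it can serve as a companion while you watch.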
Resources:
Andrej Karpathy’s YouTube video:
- Let’s build GPT: from scratch, in code, spelled out.
Andrej Karpathy’s Colab notebook:
- Building a GPT
The famous paper with the Transformer architecture:
- Attention Is All You Need
GPT papers:
- Language Models are Few-Shot Learners (GPT-3)
- Language Models are Unsupervised Multitask Learners (GPT-2)
YouTube video explaining self-attention:
- Intuition Behind Self Attention in Transformer Networks
Helpful blog post with matrix diagrams:
- Step-by-Step Illustrated Explanations for Transformer
Stanford lecture on word vectors:
- Stanford CS224N: NLP with Deep Learning | Winter 2021 | Lecture 1 - Intro & Word Vectors