Implementing a GPT (Generative Pre-trained Transformer) model from scratch, guided by Andrej Karpathy's tutorial.
- About 10M parameters
- Trained on Shakespeare's works
- Implements self-attention and multi-head attention
- Decoder-only model
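
The core idea behind the self-attention and decoder-only bullets above is causal attention: each token can attend only to itself and earlier tokens. A minimal single-head sketch in NumPy (illustrative only, not the repo's actual PyTorch implementation; the weight matrices and dimensions here are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, C) sequence.

    Each position attends only to itself and earlier positions,
    which is what makes the model decoder-only.
    """
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # project to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (T, T) scaled dot products
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)  # block attention to future tokens
    return softmax(scores) @ v                # weighted sum of value vectors

# Tiny example: 4 tokens with 8-dim embeddings, one 8-dim head
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention runs several such heads in parallel (each with its own projections) and concatenates their outputs before a final linear projection.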