I created this repository to deepen my understanding of fundamental deep learning models. I focus on computer vision models, including transformers and CNNs, because I use them extensively in my work. Inspired by Andrej Karpathy and AI-Summer, I try to write cleaner code with tools such as einops and einsum. This is a long-term project aimed at improving my implementation skills and growing as a machine learning engineer. I'm not sure who else may benefit from this repository, but I am committed to improving and updating it consistently.
- Transformer Encoder (Attention Is All You Need)
- Transformer Decoder (Attention Is All You Need)
- ViT
- Swin Transformer v1
- Swin Transformer v2
- BEiT
- YOLOv3
- UPerNet
- TBD