Vision Transformers: What are they?

If you are familiar with the paper "Attention is All You Need", you will already know the Transformer architecture. Its attention mechanism lets the model weigh every word in a sentence against the others, so it can capture context and focus on the most relevant words. You can read the paper here: Attention is All You Need
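To make the attention idea concrete, here is a minimal sketch of the scaled dot-product attention described in the paper. It assumes PyTorch; the function name and tensor shapes are illustrative, not taken from this repository.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, num_tokens, dim)
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in "Attention is All You Need"
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # pairwise similarity between tokens
    weights = F.softmax(scores, dim=-1)             # each token's focus over all other tokens
    return weights @ v                              # context-weighted mixture of the values
```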

The key difference between the original Transformer and the Vision Transformer (ViT) is the input: instead of consuming word tokens, ViT takes in patch embeddings of images. An image is split into fixed-size patches, each patch is flattened and linearly projected into an embedding, and the resulting sequence is processed by a standard Transformer encoder. The ViT paper is linked in the README file at the root of this repository. Refer to this link for more clarification: Vision Transformers (ViT) Explained
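Below is a minimal sketch of how patch embeddings can be produced, assuming PyTorch and the standard ViT-Base settings (224x224 images, 16x16 patches, 768-dimensional embeddings). The class name and defaults are illustrative and not taken from this repository's code.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and linearly project each one."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv whose kernel size and stride both equal the patch size is
        # equivalent to flattening each patch and applying a shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: (batch, channels, height, width)
        x = self.proj(x)                   # (batch, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)   # (batch, num_patches, embed_dim)
        return x

# Example: a 224x224 RGB image becomes a sequence of 196 patch embeddings of dimension 768,
# which can then be fed to a standard Transformer encoder.
tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```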