If you are familiar with the paper "Attention is All You Need", you will already know the Transformer architecture. Its attention mechanism lets a model weigh the context and relative importance of words in a sentence. You can read the paper here: Attention is All You Need
The key difference between Transformers and Vision Transformers (ViT) is the input: instead of consuming word tokens, ViT takes in patch embeddings of images. Everything else works the same way. You can find the paper in the README file at the root of this repository. Refer to this link for more clarification: Vision Transformers (ViT) Explained
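To make the analogy concrete, here is a minimal sketch of how an image is turned into the patch embeddings that stand in for word tokens. It is written in PyTorch, and the sizes (224x224 images, 16x16 patches, 768-dimensional embeddings) are assumed defaults from the ViT paper, not necessarily what this repository uses:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each patch to an embedding vector."""

    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel_size == stride == patch_size is equivalent to
        # flattening each patch and applying a shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                  # (B, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
        return x

# Example: a batch of two 224x224 RGB images becomes 196 patch embeddings each.
patches = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(patches.shape)  # torch.Size([2, 196, 768])
```

From here on, the sequence of patch embeddings is fed to the Transformer encoder exactly as a sequence of word embeddings would be.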