This repository contains the implementation of the Vision-and-Language Transformer (ViLT) model fine-tuned for Visual Question Answering (VQA) tasks. The project is structured to be easy to set up and use, providing a streamlined approach for experimenting with different configurations and datasets.
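As a sketch of what using such a model looks like, the snippet below loads a ViLT VQA checkpoint through the Hugging Face `transformers` library and answers a question about an image. The checkpoint name (`dandelin/vilt-b32-finetuned-vqa`) and the helper function are illustrative assumptions, not necessarily the exact setup used in this repository.

```python
# Hypothetical usage sketch: answering a visual question with a ViLT
# checkpoint via Hugging Face transformers. The checkpoint name below is
# an assumption and may differ from the weights trained in this repo.
import torch
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering


def load_vqa_model(checkpoint: str = "dandelin/vilt-b32-finetuned-vqa"):
    # Processor handles both image preprocessing and question tokenization.
    processor = ViltProcessor.from_pretrained(checkpoint)
    model = ViltForQuestionAnswering.from_pretrained(checkpoint)
    model.eval()
    return processor, model


def answer_question(processor, model, image: Image.Image, question: str) -> str:
    # Encode the (image, question) pair into a single multimodal input.
    inputs = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # The VQA head is a classifier over a fixed answer vocabulary.
    return model.config.id2label[logits.argmax(-1).item()]
```

A typical call would be `answer_question(processor, model, Image.open("photo.jpg"), "What color is the car?")`, which returns the highest-scoring answer from the model's fixed answer vocabulary.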