LAVIS - A One-stop Library for Language-Vision Intelligence
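As a quick orientation to the headline library, here is a minimal sketch of the typical LAVIS entry point, `load_model_and_preprocess`, used to load a pretrained BLIP captioning model and caption a local image. The model name, model type, and image path below are illustrative placeholders and should be checked against the LAVIS documentation.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

# Pick a device; LAVIS models are standard PyTorch modules.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder path to a local image (assumption, not part of the library).
raw_image = Image.open("example.jpg").convert("RGB")

# Load a pretrained BLIP captioning model together with its preprocessors.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess the image and generate a caption.
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
captions = model.generate({"image": image})
print(captions)
```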
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
Code for ALBEF: a new vision-language pre-training method
Real-time and accurate open-vocabulary end-to-end object detection
Multimodal-GPT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Recent Advances in Vision and Language Pre-Trained Models (VL-PTMs)
Oscar and VinVL
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Overview of Japanese LLMs (日本語LLMまとめ)
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
My Reading Lists of Deep Learning and Natural Language Processing
Codebase for Aria - an Open Multimodal Native MoE
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want