Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Recognize Any Regions
Self-Supervised Visual-Tactile Representation Learning via Multimodal Contrastive Training
Freeze the Backbone: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training. MSc AI thesis at Imperial College London.
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.