Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
vision-language-pretraining
audio-language-pretraining
audiovisual-language-pretraining
multimodal-representation-learning
Updated May 28, 2024 - Python