[NeurIPS 2024] Code, Dataset, Samples for the VATT paper “Tell What You Hear From What You See - Video to Audio Generation Through Text”
generative-models neurips video-to-audio audio-generation generative-ai multi-modal-llms neurips-2024 neurips-2024-presentation sight-to-sound visual-to-audio
Updated Jul 24, 2025 - Python