X-UP-Lab/3D-Audio-Visual-Segmentation

3D Audio-Visual Segmentation

Artem Sokolov, Swapnil Bhosale, Xiatian Zhu

NeurIPS 2024 Workshop on Audio Imagination

Project page | arXiv | Dataset

[Figure: teaser]

This repository is the official implementation of "3D Audio-Visual Segmentation". The paper introduces a new research problem, 3D Audio-Visual Segmentation, which extends existing Audio-Visual Segmentation (AVS) to a 3D output space. To support this research, we create the first simulation-based benchmark, 3DAVS-S34-O7, which provides photorealistic 3D scene environments with grounded spatial audio in single-instance and multi-instance settings, across 34 scenes and 7 object categories. We then propose a new approach, EchoSegnet, which combines the ready-to-use knowledge of pretrained 2D audio-visual foundation models with a 3D visual scene representation through spatial audio-aware mask alignment and refinement.

Updates

  • The 3DAVS-S34-O7 dataset and EchoSegnet code are now available! 🎉
  • Data & Code coming soon!

3DAVS-S34-O7 dataset

The dataset can be downloaded from this link. Please refer to the description for details on the included data and usage.

The dataset is an Adapted Material derived from and compiled using several external resources. You must comply with the non-commercial terms of the original licenses:

  • Habitat-Matterport3D (HM3D): Restricted to Non-Commercial Research Use.
  • BBC Sound Effects: Restricted to Non-Commercial/Educational Use.
  • ESC-50 Dataset: Licensed under Creative Commons Attribution-NonCommercial 4.0 (CC-BY-NC 4.0).

By using the 3DAVS-S34-O7 dataset, you agree to the full terms of the CC-BY-NC 4.0 Public License.

Method: EchoSegnet

[Figure: teaser]

Installation

See the Installation Guide for environment setup and dependency installation to run the evaluation.

Evaluation

Refer to the Evaluation Guide for instructions on evaluating EchoSegnet on the proposed 3DAVS-S34-O7 dataset.
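
For readers new to segmentation evaluation, the sketch below shows intersection-over-union (IoU), a metric commonly used to score predicted masks against ground truth. It is an illustration only: this README does not state which metrics the Evaluation Guide reports, and the function names `mask_iou` and `mean_iou`, the flattened-list mask format, and the empty-mask convention are all assumptions rather than part of the EchoSegnet code.

```python
from typing import Sequence


def mask_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """IoU between two flattened binary masks of equal length (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Pixels marked as foreground in at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds: Sequence[Sequence[int]], targets: Sequence[Sequence[int]]) -> float:
    """Average IoU over paired masks, e.g. one pair per rendered view (assumption)."""
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    if not scores:
        raise ValueError("no mask pairs to evaluate")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2x2 masks, flattened row by row.
    print(mask_iou([1, 1, 0, 0], [1, 0, 1, 0]))
    print(mean_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

The Evaluation Guide remains the authoritative reference for the scripts, inputs, and metrics used with 3DAVS-S34-O7.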

Citation

If you find our project useful, please use the following BibTeX entry:

@inproceedings{sokolov20243daudiovisualsegmentation,
    title     = {3D Audio-Visual Segmentation},
    author    = {Sokolov, Artem and Bhosale, Swapnil and Zhu, Xiatian},
    booktitle = {Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
    year      = {2024}
}

Contact

For feedback or questions, please contact Artem Sokolov.
