A Python implementation of “Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization”, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021.
- Contributions
- A DP-RTF learning framework is designed that embeds the sensor signals into a low-dimensional localization feature space, disentangling the localization cues from other factors such as source signals, noise, and reverberation.
- A novel DP-RTF learning network
- Leveraging monaural speech enhancement to improve the robustness of DP-RTF estimation
- Generalization to unseen binaural configurations
- The DP-RTF learning-based localization method makes full use of the spatial and spectral cues, and is shown to outperform several other methods on both simulated and real-world data in noisy and reverberant environments.
- Head-related impulse responses (HRIRs): from the CIPIC database
- Binaural room impulse responses (BRIRs): generated with the Roomsim toolbox
- TIMIT dataset
- Diffuse noise: generated with the arbitrary noise field generator, using noise signals from the NOISEX-92 database
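To illustrate what a DP-RTF encodes, the following minimal sketch computes the ratio of two direct-path transfer functions from a pair of toy impulse responses and reads off the inter-channel level and phase differences. It is an illustration under stated assumptions, not the repository's code: the function names, the choice of the left channel as reference, the FFT size, and the toy impulse responses are all hypothetical. In practice the direct-path responses would come from HRIRs such as those in the CIPIC database.

```python
import cmath
import math


def dft(x, n_fft):
    """Naive DFT of x (implicitly zero-padded to n_fft), non-negative frequency bins only."""
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(len(x)))
        for k in range(n_fft // 2 + 1)
    ]


def dp_rtf(h_left, h_right, n_fft=64, eps=1e-12):
    """Per-bin ratio of the right to the left direct-path transfer function.

    Assumption: the left channel is the reference channel.
    """
    H_l = dft(h_left, n_fft)
    H_r = dft(h_right, n_fft)
    # Guard against division by a (near-)zero reference bin.
    return [hr / (hl if abs(hl) > eps else eps) for hl, hr in zip(H_l, H_r)]


def level_and_phase(rtf):
    """Split a complex DP-RTF into level difference (dB) and phase difference (rad)."""
    ild_db = [20 * math.log10(abs(r) + 1e-12) for r in rtf]
    ipd = [cmath.phase(r) for r in rtf]
    return ild_db, ipd


# Toy direct-path responses (hypothetical): the right channel arrives
# 3 samples later than the left and at half the amplitude.
h_left = [0.0] * 16
h_left[2] = 1.0
h_right = [0.0] * 16
h_right[5] = 0.5

rtf = dp_rtf(h_left, h_right)
ild_db, ipd = level_and_phase(rtf)
```

In this toy case the level difference is the same in every bin (the gain ratio in dB), and the phase difference grows linearly with frequency according to the 3-sample delay, which is the kind of spatial cue a DP-RTF carries.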
- Preparation
- Add a soft link to the "common" folder inside the "DPRTF" folder
ln -s [original path] [target path]
- Generate the lists of source signals and BRIRs, direct-path relative transfer functions (DP-RTFs), room acoustic settings, and sensor signals for the training, validation, and test stages.
python -m common.getData --stage [*] --data [*]
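Where `ln -s` is inconvenient, the same link can be created from Python. This is a minimal sketch, assuming "common" is a directory that should appear inside "DPRTF"; the helper name `link_common` and the temporary directory layout are hypothetical and only serve the demonstration.

```python
import os
import tempfile


def link_common(original_path, target_path):
    """Create a symbolic link at target_path pointing to original_path (like `ln -s`)."""
    if os.path.islink(target_path):
        return target_path  # link already exists; nothing to do
    os.symlink(os.path.abspath(original_path), target_path, target_is_directory=True)
    return target_path


# Demonstration in a throwaway directory (hypothetical layout, not the real repository).
root = tempfile.mkdtemp()
original = os.path.join(root, "common")
os.makedirs(original)
os.makedirs(os.path.join(root, "DPRTF"))
link = link_common(original, os.path.join(root, "DPRTF", "common"))
```

For the actual repository, replace the two arguments with the same `[original path]` and `[target path]` used in the `ln -s` command.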
- Training
python run.py --gpu-id [*]
- Test
python run.py --gpu-id [*] --test
- Pretrained models
- exp/00000000/model_12.pth: trained with fixed data
- exp/00000001/model_52.pth: trained with random data (generated on-the-fly)
If you find our work useful in your research, please consider citing:
@article{yang2021dprtf,
author = "Bing Yang and Hong Liu and Xiaofei Li",
title = "Learning deep direct-path relative transfer function for binaural sound source localization",
journal = "{IEEE/ACM} Transactions on Audio, Speech, and Language Processing (TASLP)",
volume = "29",
pages = "3491-3503",
year = "2021"}
@InProceedings{yang2021dprtf1,
author = "Bing Yang and Xiaofei Li and Hong Liu",
title = "Supervised direct-path relative transfer function learning for binaural sound source localization",
booktitle = "Proceedings of {IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
year = "2021",
pages = "825-829"}
MIT