DP-RTF-Learning

A Python implementation of “Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization”, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021.

Contributions
- A DP-RTF learning framework is designed that embeds the sensor signals into a low-dimensional localization feature space, disentangling the localization cues from other factors such as the source signal, noise, and reverberation. It comprises:
  - A novel DP-RTF learning network
  - Monaural speech enhancement to improve the robustness of DP-RTF estimation
  - Generalization to unseen binaural configurations
- The DP-RTF learning based localization method makes full use of spatial and spectral cues and is shown to outperform several other methods on both simulated and real-world data in noisy and reverberant environments.
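One way to turn an estimated DP-RTF into a direction, assumed here for illustration, is to compare it with template DP-RTFs precomputed for each candidate direction and keep the best match. The sketch below implements that nearest-template step with a normalized complex correlation. The function name, the free-field delay model used to build the toy templates, and the 0.18 m ear spacing are hypothetical and are not taken from this repository.

```python
import numpy as np


def match_template_direction(dprtf_est, templates, azimuths):
    """Pick the candidate direction whose template DP-RTF best matches an estimate.

    dprtf_est: complex array (F,), estimated DP-RTF over F frequency bins.
    templates: complex array (D, F), one template DP-RTF per candidate direction.
    azimuths:  array (D,), candidate directions in degrees.
    Returns (best_azimuth, scores), where scores are normalized correlations.
    """
    # Real part of the complex inner product, so phase (time-delay) differences count
    inner = np.real(templates.conj() @ dprtf_est)
    norms = np.linalg.norm(templates, axis=1) * np.linalg.norm(dprtf_est) + 1e-12
    scores = inner / norms
    return azimuths[int(np.argmax(scores))], scores


if __name__ == "__main__":
    # Toy templates from a free-field interaural delay model (hypothetical, not CIPIC HRIRs)
    freqs = np.linspace(0.0, 8000.0, 257)
    azimuths = np.arange(-90, 91, 5)
    tau = 0.18 * np.sin(np.deg2rad(azimuths)) / 343.0  # assumed 0.18 m ear spacing
    templates = np.exp(-2j * np.pi * np.outer(tau, freqs))
    rng = np.random.default_rng(0)
    noisy = templates[24] + 0.1 * (rng.standard_normal(257) + 1j * rng.standard_normal(257))
    best, _ = match_template_direction(noisy, templates, azimuths)
    print("estimated azimuth:", best)
```

Using the real part rather than the magnitude of the inner product keeps the score sensitive to phase, which is where the interaural time difference lives.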

Datasets

- Head-related impulse responses (HRIRs): the CIPIC database
- Binaural room impulse responses (BRIRs): generated with the Roomsim toolbox
- Source signals: the TIMIT speech corpus
- Diffuse noise: generated with the arbitrary noise field generator, using noise signals from the NOISEX-92 database
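These ingredients are combined into binaural sensor signals. The sketch below shows one common recipe, assuming the mono source is convolved with a two-channel BRIR and the noise is scaled to a target SNR averaged over both ears. The function name and this SNR convention are assumptions; the repository's own generator (`common.getData`) may define them differently. Also note that diffuse noise from the noise field generator has a specific interaural coherence, which arbitrary two-channel test noise (for example independent white noise) does not reproduce.

```python
import numpy as np


def simulate_binaural(source, brir, noise, snr_db):
    """Convolve a mono source with a two-channel BRIR and add two-channel noise.

    source: (T,) mono source signal, e.g. a speech utterance.
    brir:   (L, 2) binaural room impulse response (left, right).
    noise:  (N, 2) two-channel noise with N >= T + L - 1.
    snr_db: target ratio of speech power to noise power, averaged over both channels.
    Returns (mixture, reverberant_speech, scaled_noise), each of shape (T + L - 1, 2).
    """
    n_out = len(source) + brir.shape[0] - 1
    if noise.shape[0] < n_out:
        raise ValueError("noise is shorter than the reverberant speech")
    # Reverberant speech at each ear: one full linear convolution per channel
    speech = np.stack([np.convolve(source, brir[:, ch]) for ch in range(2)], axis=1)
    noise = noise[:n_out]
    # Scale the noise so that mean(speech**2) / mean(noise**2) equals 10**(snr_db / 10)
    gain = np.sqrt(np.mean(speech ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return speech + scaled_noise, speech, scaled_noise
```

Returning the clean reverberant speech and the scaled noise alongside the mixture makes it easy to verify the achieved SNR or to use the clean part as an enhancement target.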

Quick start

Preparation
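- Create a symbolic (soft) link to the "common" folder inside the "DPRTF" folder, where [original path] is the existing "common" folder and [link path] is the new link location inside "DPRTF"
```
ln -s [original path] [link path]
```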
- Generate the lists of source signals and BRIRs, direct-path relative transfer functions (DP-RTFs), room acoustic settings, and sensor signals for the training, validation, and test stages.
```
python -m common.getData --stage [*] --data [*] 
```
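To illustrate what a DP-RTF is, the sketch below approximates one from a BRIR by windowing its direct-path part and taking the ratio of the two ears' spectra, with the left ear as reference. The function name, the window length, and the peak-based onset detection are assumptions for illustration; the repository may derive its DP-RTF targets differently (for example from the anechoic HRIRs or in the STFT domain).

```python
import numpy as np


def dp_rtf_from_brir(brir, n_fft=512, win_len=64, pre_samples=8, ref_ch=0, eps=1e-12):
    """Approximate the DP-RTF of a two-channel BRIR of shape (L, 2).

    The direct path is isolated by a rectangular window of win_len samples that
    starts pre_samples before the earliest per-channel peak. The DP-RTF is the
    ratio of the non-reference to the reference direct-path transfer function.
    Returns a complex array of shape (n_fft // 2 + 1,).
    """
    # Earliest of the two channels' strongest peaks marks the direct-path arrival
    # (assumes no reflection is stronger than the direct path)
    peak = int(np.min(np.argmax(np.abs(brir), axis=0)))
    start = max(peak - pre_samples, 0)
    direct = brir[start:start + win_len]
    # Direct-path transfer functions of both ears, zero-padded to n_fft
    H = np.fft.rfft(direct, n=n_fft, axis=0)
    return H[:, 1 - ref_ch] / (H[:, ref_ch] + eps)
```

For a toy BRIR with a unit impulse at sample 100 in the left channel, an impulse of 0.5 at sample 103 in the right channel, and no other energy nearby, the function returns 0.5·exp(-j2πk·3/512) at bin k: a level ratio of 0.5 and a linear phase that encodes the 3-sample interaural delay.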
Training
```
python run.py --gpu-id [*]
```
Test
```
python run.py --gpu-id [*] --test
```
Pretrained models
- exp/00000000/model_12.pth: trained with fixed data
- exp/00000001/model_52.pth: trained with random data (generated on-the-fly)
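The two checkpoints differ in how the training data is supplied. As an illustration of that difference only, the hypothetical sampler below either replays one fixed set of (source, BRIR, SNR) combinations every epoch or draws fresh ones each epoch. The class name, the SNR range, and the sampling scheme are assumptions and do not describe the repository's actual data loader.

```python
import numpy as np


class MixtureSampler:
    """Hypothetical sampler of (source index, BRIR index, SNR in dB) triples.

    fixed=True:  every epoch replays the same triples, like a pre-generated dataset.
    fixed=False: every epoch draws new triples, like on-the-fly data generation.
    """

    def __init__(self, n_sources, n_brirs, n_items=100, snr_range=(-5.0, 15.0),
                 fixed=True, seed=0):
        self.n_sources = n_sources
        self.n_brirs = n_brirs
        self.n_items = n_items
        self.snr_range = snr_range  # assumed range, for illustration only
        self.fixed = fixed
        self.seed = seed
        self._rng = np.random.default_rng(seed)  # persistent state for on-the-fly mode

    def epoch(self):
        # Fixed mode restarts from the same seed; on-the-fly mode keeps advancing one RNG
        rng = np.random.default_rng(self.seed) if self.fixed else self._rng
        return [
            (int(rng.integers(self.n_sources)),
             int(rng.integers(self.n_brirs)),
             float(rng.uniform(*self.snr_range)))
            for _ in range(self.n_items)
        ]
```

On-the-fly sampling exposes the network to more distinct mixtures over the course of training, at the cost of generating them during training, whereas fixed data makes runs easier to reproduce.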

Citation

If you find our work useful in your research, please consider citing:

@article{yang2021dprtf,
    author = {Bing Yang and Hong Liu and Xiaofei Li},
    title = {Learning deep direct-path relative transfer function for binaural sound source localization},
    journal = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing (TASLP)},
    volume = {29},
    pages = {3491--3503},
    year = {2021}
}

@inproceedings{yang2021dprtf1,
    author = {Bing Yang and Xiaofei Li and Hong Liu},
    title = {Supervised direct-path relative transfer function learning for binaural sound source localization},
    booktitle = {Proceedings of {IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    pages = {825--829},
    year = {2021}
}

Licence

MIT
