Code for the paper Hush-Hush Speak: Speech Reconstruction Using Silent Videos, available at: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3269.pdf
The paper details can also be accessed from: https://www.isca-speech.org/archive/Interspeech_2019/abstracts/3269.html
Authors: Shashwat Uttam*, Yaman Kumar*, Dhruva Sahrawat*, Mansi Aggarwal, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent (* denotes equal contribution)
The supplementary (auxiliary) folder, containing the reconstructed audio for both English and Hindi speech along with the human annotations, can be found at https://drive.google.com/open?id=1ZWS4L3SaZyb7SNwTaMpY96uJRYfFcVEG.
Alternate link: https://drive.google.com/open?id=1bvY9_1OT4xzDELnRnPJpWUFRdzQj0lu3
- First, download the OuluVS2 dataset.
- Run https://github.com/midas-research/hush-hush-speak/blob/master/prepare_files.py to prepare the dataset files.
- Create an audio_names.txt listing all the audio files you want to train the audio autoencoder on (a sketch for generating it follows this list), then run https://github.com/midas-research/hush-hush-speak/blob/master/autoenc_train.py to train the audio autoencoder.
- For each view combination, prepare a list.txt containing the speaker ID (sx, where x is in [1, 53]) and the video number (uy, where y is in [1, 70]; utterances 1-30 are the OuluVS2 digit sequences, 31-60 the phrases, and 61-70 the sentences). Together these entries represent the whole dataset. In https://github.com/midas-research/hush-hush-speak/blob/master/preprocess_and_integrate_data.py, set train_lines and test_lines to the indices of the training and test entries, set N (the total number of files to split the training data into, e.g. N = 60), and set views (e.g. views = ['1']) to the required view combination; a configuration sketch follows this list. Then run preprocess_and_integrate_data.py to produce the processed data. You have to run it again for each view combination, updating views each time.
- Now run https://github.com/midas-research/hush-hush-speak/blob/master/train_video_enc.py to train the model and obtain the predicted values for the test set.
- Use the melspectrogram and inv_melspectrogram functions in https://github.com/midas-research/hush-hush-speak/blob/master/audio_proc.py for further evaluation (a usage sketch follows this list).
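
For the audio autoencoder step, here is a minimal sketch for generating audio_names.txt, assuming one entry per line and that the extracted OuluVS2 audio sits in a local directory. The directory path and whether autoenc_train.py expects bare file names or full paths are assumptions, so check that script for the exact format it reads:

```python
import os

# Assumed location of the extracted OuluVS2 audio; adjust to your setup.
AUDIO_DIR = "data/oulu_audio"

# Collect every .wav file and write one name per line to audio_names.txt.
wav_files = sorted(f for f in os.listdir(AUDIO_DIR) if f.endswith(".wav"))

with open("audio_names.txt", "w") as fout:
    for name in wav_files:
        fout.write(name + "\n")
```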
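For the preprocessing step, the assignments below mirror the variables named above (train_lines, test_lines, N, views) inside preprocess_and_integrate_data.py. The concrete index ranges are placeholders for illustration and depend on how you split the entries of your list.txt:

```python
# Illustrative settings inside preprocess_and_integrate_data.py.

# Indices into list.txt that belong to the training and test splits
# (placeholder values; use your own split).
train_lines = list(range(0, 2500))
test_lines = list(range(2500, 3000))

# Total number of files to split the training data into.
N = 60

# View combination to process; rerun the script once per combination,
# e.g. ['1'], then ['1', '5'], and so on.
views = ['1']
```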
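For the evaluation step, here is a usage sketch assuming melspectrogram maps a waveform to a mel spectrogram and inv_melspectrogram maps a (predicted) mel spectrogram back to a waveform. The exact signatures, the float normalisation, and the use of scipy for reading/writing WAV files are assumptions rather than part of audio_proc.py:

```python
import numpy as np
from scipy.io import wavfile

import audio_proc  # from the repository root

# Load a ground-truth waveform and normalise it to float32 in [-1, 1]
# (match whatever preprocessing the repository scripts actually use).
rate, wav = wavfile.read("sample.wav")
wav = wav.astype(np.float32) / np.iinfo(np.int16).max

# Mel spectrogram of the ground truth, e.g. for comparison against the
# spectrogram frames predicted by the video encoder.
mel_true = audio_proc.melspectrogram(wav)

# Reconstruct an audible waveform from a (predicted) mel spectrogram.
wav_rec = audio_proc.inv_melspectrogram(mel_true)
wavfile.write("reconstructed.wav", rate,
              (wav_rec * np.iinfo(np.int16).max).astype(np.int16))
```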
Note that the code was validated only with the following package versions:
- TensorFlow 1.11
- Keras 2.2.4
- CUDA 9.0.176
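
A quick sanity check that the installed TensorFlow and Keras match these versions (the CUDA version is easiest to confirm separately, e.g. with nvcc --version):

```python
import tensorflow as tf
import keras

# Print the installed versions; expected: TensorFlow 1.11.x and Keras 2.2.4.
print("TensorFlow:", tf.__version__)
print("Keras:", keras.__version__)
```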
If you find our code or paper useful for your research, please cite our paper as follows:
@inproceedings{Uttam2019,
author={Shashwat Uttam and Yaman Kumar and Dhruva Sahrawat and Mansi Aggarwal and Rajiv Ratn Shah and Debanjan Mahata and Amanda Stent},
title={{Hush-Hush Speak: Speech Reconstruction Using Silent Videos}},
year=2019,
booktitle={Proc. Interspeech 2019},
pages={136--140},
doi={10.21437/Interspeech.2019-3269},
url={http://dx.doi.org/10.21437/Interspeech.2019-3269}
}
You can contact us (the authors) at shashwatu.co@nsit.net.in, ykumar@adobe.com, and dhruva15026@iiitd.ac.in (please CC all of us for a quick reply).