
Performance bottleneck (not from model) #29

Open
ghost opened this issue Dec 16, 2020 · 2 comments


ghost commented Dec 16, 2020

When I first ran 'prep_data_nus.py', I noticed the long preprocessing time needed to generate the hdf5 files: approximately 3 hours on my computer for the 96 files. I traced the bottleneck to sp_to_mgc (an SPTK dependency).

To produce a 2m54s song (the Elton John one from the NUS database), my computer needs more than 13 minutes, i.e. 10 minutes longer than the song itself. At first I assumed this was because I run the model on my CPU (no GPU), but I did some measurements and found that the problem is clearly not the model or the 'AI' part.

The inference call:

import models
import config

# Input features extracted from the NUS dataset
file_name = 'nus_JLEE_sing_15.hdf5'

# Target singer for the conversion
singer_name = 'MPOL'
singer_index = config.singers.index(singer_name)

model = models.WGANSing()
model.test_file_hdf5_no_question(file_name, singer_index)

test_file_hdf5_no_question is the same as test_file_hdf5 minus the interactive prompts, plus per-function timing measurements, and it only generates the synthesized audio (not the ground truth).
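For reference, timings like the ones below can be collected with a simple wall-clock wrapper. A minimal sketch (`timed` and `demo_step` are illustrative names, not functions from the repository):

```python
import time
from functools import wraps

def timed(fn):
    """Wrap a function so each call prints its wall-clock duration in seconds."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"- {fn.__name__:<15}: {time.perf_counter() - start:.13f}")
        return result
    return wrapper

@timed
def demo_step():
    time.sleep(0.05)  # stand-in for real work such as feats_to_audio
    return "done"

demo_step()
```

Wrapping each sub-call of feats_to_audio this way is enough to produce the per-function breakdown reported here.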

The timing results (in seconds):

- load_model [*]   :   2.7976150512695
- read_hdf5_file   :   0.0341496467590
- process_file [*] :   3.0663671493530
- feats_to_audio   : 770.0193181037903

[*] Tensorflow calls

Clearly, the AI part is very fast, even on CPU. The problem comes from the audio regeneration.

Breakdown of the feats_to_audio calls (again in seconds):

- f0_to_hertz   :   0.0130412578582
- mfsc_to_mgc   :   0.7175555229187
- mgc_to_sp     : 737.2016060352325
- pw.synthesize :  25.4196729660034
- sf.write      :   0.7051553726196

The PyWorld synthesize call is acceptable at 25 seconds (about 14% of the total audio duration), but the SPTK call is not.

Sadly, to my knowledge, SPTK is the only fast implementation (C code) of the Mel-Generalized Cepstrum conversion. And this is not a question of GPU, because it is pure CPU code. What is going on with this algorithm?!

I know my computer is an old-school one: a Dell T7400 workstation with a 4-core Intel Xeon @ 2.33GHz and 16GB of RAM. But it works very well for many things apart from pure deep learning.

I don't know how much can be improved in WGANSing itself, because the MGC representation is at the heart of the project, but I will investigate ways to optimize this step. I'm sure the computation time can be reduced with some tricks.
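One trick that should help on the preprocessing side: since the 96 hdf5 files are independent of each other, their generation can be parallelized across CPU cores with Python's multiprocessing. A minimal sketch (`process_one` is a hypothetical stand-in for the per-file work in prep_data_nus.py, not a function from the repository):

```python
from multiprocessing import Pool

def process_one(wav_path):
    # Hypothetical stand-in for the per-file work in prep_data_nus.py
    # (feature extraction, sp_to_mgc, hdf5 writing); here just a placeholder.
    return wav_path.replace('.wav', '.hdf5')

if __name__ == "__main__":
    # The 96 NUS wave files in practice; one entry here for illustration.
    wav_files = ['nus_JLEE_sing_15.wav']
    # One worker per core on a 4-core Xeon; each file is processed independently.
    with Pool(processes=4) as pool:
        results = pool.map(process_one, wav_files)
    print(results)
```

On a 4-core machine this could cut the 3-hour preprocessing roughly by the number of cores, even without touching the SPTK code itself.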

In any case, well done with WGANSing; I love this kind of project!

@Kerry0123

Hi, to generate a singing voice the code expects an .hdf5 file from the dataset, and generating that .hdf5 requires a wave file. Can it not use wave files directly?

@kehuantiantang

I have the same experience: sp_to_mfsc in data preparation and mfsc_to_mgc at inference are time-consuming, and some hyper-parameters, like the 0.45 in sp_to_mfsc, also confuse me. Maybe MelGAN would work better than mfsc_to_mgc for converting the spectrum back into a signal.
