
How many steps would be enough if i train this model from start? #14

Open
lixuyuan102 opened this issue Jul 18, 2024 · 8 comments

@lixuyuan102

lixuyuan102 commented Jul 18, 2024

Hi! Nice work!
Could you share how many steps are sufficient to train a new model? I'm trying to train a 16k FAcodec. The results reconstructed with the 130,000-step checkpoint still sound different from the real speech, especially the speaker timbre.

@lixuyuan102
Author

Here is the loss curve:
[loss curve screenshot]

@lixuyuan102 lixuyuan102 changed the title How many steps would be enough if i trian this model from start? How many steps would be enough if i train this model from start? Jul 18, 2024
@Plachtaa
Owner

Plachtaa commented Jul 18, 2024

The released model was trained for 670k steps; normally 400k is sufficient for a codec, according to descript-audio-codec's practice.

@lixuyuan102
Author

> The model released was trained for 670k steps, normally 400k would be sufficient for codec, according to descript-audio-codec's practice

Thanks!

@lixuyuan102
Author

I have trained the model on VoxCeleb2 for 400k steps. However, the reconstructed speech does not sound as good as the demo page, and the reconstruction of noisy speech sounds even worse.
Here are the samples:
O1: https://github.com/lixuyuan102/FAcodec/blob/master/ZCwVV3niXxo_00179.m4a
R1: https://github.com/lixuyuan102/FAcodec/blob/master/ZCwVV3niXxo_00179.wav
O2: https://github.com/lixuyuan102/FAcodec/blob/master/Zsus9yFgaJM_00132.m4a
R2: https://github.com/lixuyuan102/FAcodec/blob/master/Zsus9yFgaJM_00132.wav

Is there a problem with the data scale or something else?

@Plachtaa
Owner

I have checked the samples you shared.
One thing I notice is that your samples sound quite noisy. I don't know whether they are from your training set, but I suggest not including anything other than clean vocal data, as FAcodec is designed for speech rather than as a universal audio codec. If your training speech data has not gone through a vocal separation process, that may indeed hurt model performance.
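For anyone following along, a minimal preprocessing sketch along these lines could batch-run a source-separation tool over the corpus before training. Demucs is used here purely as one example separator (it is not part of the FAcodec repo), and the directory layout is an assumption:

```python
# Hedged sketch: extract only the vocal stem from each training clip with
# Demucs before training FAcodec. Demucs and the paths are assumptions,
# not part of the FAcodec codebase.
import subprocess
from pathlib import Path


def separation_cmd(wav_path: str, out_dir: str = "separated") -> list[str]:
    """Build a Demucs command that separates vocals vs. accompaniment."""
    return [
        "demucs",
        "--two-stems", "vocals",  # keep only a vocals/no-vocals split
        "-o", out_dir,
        wav_path,
    ]


def separate_all(data_dir: str) -> None:
    """Run separation over every .wav in data_dir (requires demucs installed)."""
    for wav in sorted(Path(data_dir).glob("*.wav")):
        subprocess.run(separation_cmd(str(wav)), check=True)
```

A denoiser could be applied to the resulting vocal stems afterwards; the key point from the comment above is that only clean vocals should reach the codec's training set.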

@lixuyuan102
Author

> I have checked the samples you shared. One thing I am noticing is your samples sound quite noisy. I don't know whether they are from your train set or not, but I don't suggest to include anything else except clean vocal data, as FAcodec is designed for speech instead of a universal audio codec. If your speech data for training has not gone through a vocal separation process, it may indeed affect model performance.

Thanks, I'll try processing my training data with denoising and separation models.

@lixuyuan102
Author

lixuyuan102 commented Aug 1, 2024

I found that although the training data has been denoised, tuning the pretrained FAcodec on it still results in unstable pronunciation. Moreover, the instability seems more pronounced when the original audio sounds poorer. Can I tune only the timbre module and freeze the other parts, to adapt the model to new speakers?
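If it helps, freezing everything except one submodule is straightforward in PyTorch. This is a generic sketch, not FAcodec's actual API: the parameter-name prefix `"timbre"` is an assumption about how the timbre extractor is named, and should be replaced with the real module name from the checkpoint:

```python
# Hedged sketch: fine-tune only timbre-related parameters of a pretrained
# model. The prefix "timbre" is an assumption; inspect model.named_parameters()
# to find the actual name used in FAcodec.
import torch
from torch import nn


def freeze_except(model: nn.Module, trainable_prefix: str) -> list[str]:
    """Disable gradients for all parameters except those whose name starts
    with trainable_prefix; return the names left trainable."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

The optimizer would then be built only from the surviving parameters, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`.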

@lixuyuan102
Author

BTW, I noticed that only the content, prosody, and timbre latent features are used when training the FAcodec Redecoder. May I ask why z_r is not employed?
