Description
In the function [average_mcd](https://github.com/modelscope/ClearerVoice-Studio/blob/main/speechscore/scores/mcd.py), the condition `if MCD_mode == "plain":` is checked twice in a row, causing logical redundancy and potential dead code paths.
Here is the problematic section:

    if MCD_mode == "plain":
        # pad 0
        if len(loaded_ref_wav) < len(loaded_syn_wav):
            loaded_ref_wav = np.pad(loaded_ref_wav, (0, len(loaded_syn_wav) - len(loaded_ref_wav)))
        else:
            loaded_syn_wav = np.pad(loaded_syn_wav, (0, len(loaded_ref_wav) - len(loaded_syn_wav)))
        # extract MCEP features
        ref_mcep_vec = self.wav2mcep_numpy(loaded_ref_wav, score_rate)
        syn_mcep_vec = self.wav2mcep_numpy(loaded_syn_wav, score_rate)
        if MCD_mode == "plain":  # ← redundant check
            path = []
            for i in range(len(ref_mcep_vec)):
                path.append((i, i))
        elif MCD_mode == "dtw":
            _, path = fastdtw(ref_mcep_vec[:, 1:], syn_mcep_vec[:, 1:], dist=euclidean)
        elif MCD_mode == "dtw_sl":
            cof = len(ref_mcep_vec)/len(syn_mcep_vec) if len(ref_mcep_vec)>len(syn_mcep_vec) else len(syn_mcep_vec)/len(ref_mcep_vec)
            _, path = fastdtw(ref_mcep_vec[:, 1:], syn_mcep_vec[:, 1:], dist=euclidean)

Expected Behavior
The outer `if MCD_mode == "plain":` should likely be removed or changed to allow all three modes (`plain`, `dtw`, `dtw_sl`) to follow a similar processing pipeline (padding + feature extraction).
Currently, the inner conditions for `"dtw"` and `"dtw_sl"` will never be executed, because they are nested inside a block that only runs if `MCD_mode == "plain"`.
Actual Behavior
- When `MCD_mode` is `"dtw"` or `"dtw_sl"`, the code inside the inner `elif` blocks is never reached.
- As a result, DTW-based MCD computation does not execute as intended.
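
The unreachable-branch behavior described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: `nested_select` and its string return values are hypothetical, and it assumes the mode chain is indented under the outer `"plain"` check as this report describes. `None` stands in for "no path was assigned".

```python
def nested_select(mode):
    """Mimic the reported structure: the mode chain sits inside the outer "plain" check."""
    path = None  # placeholder meaning "no branch assigned a path"
    if mode == "plain":
        if mode == "plain":
            path = "identity"
        elif mode == "dtw":  # unreachable: mode is already known to be "plain"
            path = "dtw"
        elif mode == "dtw_sl":  # unreachable for the same reason
            path = "dtw_sl"
    return path


# Only "plain" ever gets a path; the other two modes fall through untouched.
for mode in ("plain", "dtw", "dtw_sl"):
    print(mode, "->", nested_select(mode))
```

Under this assumed nesting, only `"plain"` ever selects a path, which matches the two points listed above.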
Proposed Fix
Move the MCEP extraction part outside of the "plain" condition and structure it like this:

    # pad shorter wav
    if len(loaded_ref_wav) < len(loaded_syn_wav):
        loaded_ref_wav = np.pad(loaded_ref_wav, (0, len(loaded_syn_wav) - len(loaded_ref_wav)))
    else:
        loaded_syn_wav = np.pad(loaded_syn_wav, (0, len(loaded_ref_wav) - len(loaded_syn_wav)))
    # extract features
    ref_mcep_vec = self.wav2mcep_numpy(loaded_ref_wav, score_rate)
    syn_mcep_vec = self.wav2mcep_numpy(loaded_syn_wav, score_rate)
    # choose path strategy
    if MCD_mode == "plain":
        path = [(i, i) for i in range(len(ref_mcep_vec))]
    elif MCD_mode == "dtw":
        _, path = fastdtw(ref_mcep_vec[:, 1:], syn_mcep_vec[:, 1:], dist=euclidean)
    elif MCD_mode == "dtw_sl":
        cof = len(ref_mcep_vec) / len(syn_mcep_vec) if len(ref_mcep_vec) > len(syn_mcep_vec) else len(syn_mcep_vec) / len(ref_mcep_vec)
        _, path = fastdtw(ref_mcep_vec[:, 1:], syn_mcep_vec[:, 1:], dist=euclidean)

Environment
- Repository: [modelscope/ClearerVoice-Studio](https://github.com/modelscope/ClearerVoice-Studio)
- File: `speechscore/scores/mcd.py`
- Function: `average_mcd`
- Python version: 3.11.13
- OS: Ubuntu 22.04
Additional Context
This bug likely causes incorrect MCD computation for the "dtw" and "dtw_sl" modes, leading to invalid or NaN results during evaluation.
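
A small regression check could guard the flattened structure from the proposed fix. The sketch below rests on several assumptions. `select_path` and `fake_align` are hypothetical names, not part of the repository. The aligner is injected as any callable returning `(distance, path)`, the same shape `fastdtw` returns, so the example runs without that package. Raising `ValueError` for an unknown mode is an illustrative choice, and the `cof` scaling used by `"dtw_sl"` is left out.

```python
def select_path(mode, ref_frames, syn_frames, align):
    """Pick an alignment path once features exist for every mode.

    `align` is any callable returning (distance, path); it is injected
    so this sketch does not depend on fastdtw being installed.
    """
    if mode == "plain":
        # Assumes both inputs were padded to the same length beforehand.
        return [(i, i) for i in range(len(ref_frames))]
    if mode in ("dtw", "dtw_sl"):
        _, path = align(ref_frames, syn_frames)
        return path
    raise ValueError(f"unknown MCD mode: {mode!r}")


def fake_align(ref_frames, syn_frames):
    """Hypothetical stand-in for an aligner: pair frames by index, clamping the shorter side."""
    n = max(len(ref_frames), len(syn_frames))
    path = [(min(i, len(ref_frames) - 1), min(i, len(syn_frames) - 1)) for i in range(n)]
    return 0.0, path


ref = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
syn = [[0.0, 1.0], [0.0, 2.5], [0.0, 3.5]]

# Every supported mode now yields a path instead of being skipped.
for mode in ("plain", "dtw", "dtw_sl"):
    print(mode, select_path(mode, ref, syn, fake_align))
```

Injecting the aligner keeps the check fast and deterministic. A test against the real function would call `average_mcd` with each mode instead.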