We apologize for any disorganization in the file structure. Below is a detailed description of our project's directory, which should assist in evaluating our reproduction efforts.
Platform: Linux
- /
  - README.md - Provides a helpful description of our reproduction process.
  - CVAE.py - Implements the CVAE as described in the paper, including the model's definition and training process. This script requires the preprocessed dataset to run.
  - data.py - Data loader script for CVAE.py.
  - cvae_weights.pth - Contains the trained weights for our CVAE model. The dataset for training CVAE is large, so only the model weights are included.
  - optimizer_with_STOI.py - Executes the project on the test set, outputs metric results to results.txt, and saves the convolved audio files in their respective directories.
  - optimizer_with_STOI_self.py - Similar to optimizer_with_STOI.py, but for self-testing with our samples. Outputs results to results_self.txt.
  - WAD.py - Compares each processed audio file in processed-text-clean with its original version in text-clean, saving results in WAD_results.txt.
  - WAD_self.py - Compares each processed self-sample with its original version in selfSampling, saving results in WAD_results_self.txt.
  - /logAnalyse/ - Processes log data, generates figures using matplotlib.pyplot, and outputs statistical data.
    - extractData.py - Extracts useful data from log files (e.g., results.txt and WAD_results.txt) for statistical analysis.
    - *.json - Results of extractData.py.
    - TestSetAnalysis.ipynb - Analyzes data and plots figures to visualize the results.
    - SelfSampleAnalysis.ipynb - Analyzes data of self-samples and plots figures to visualize the results.
  - /text-clean/ - Contains original audio from the test set.
  - /afterRIR-text-clean/ - Contains audio files from the test set convolved with the RIR signal, without optimization.
  - /processed-text-clean/ - Contains audio files from the test set processed by VoiceCloak (convolved with optimized delta).
  - /selfSampling/ - Contains original self-recorded samples.
  - /afterRIRSelfSampling/ - Contains self-recorded samples convolved with the RIR signal, without optimization.
  - /processedSelfSampling/ - Contains self-recorded samples processed by VoiceCloak (convolved with optimized delta).
  - *.txt - Contains other necessary log files.
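Several of the directories above hold audio that has been convolved with an RIR or with the optimized delta. As a rough illustration of that operation only, the sketch below convolves a mono signal with an impulse response using NumPy. The function name `apply_impulse_response`, the trimming to the input length, and the absence of any normalization are assumptions made for this example; the actual handling in optimizer_with_STOI.py may differ.

```python
import numpy as np


def apply_impulse_response(audio, impulse_response):
    """Convolve a mono signal with an impulse response.

    The output is trimmed to the input length (an assumption for this sketch).
    """
    convolved = np.convolve(audio, impulse_response, mode="full")
    return convolved[: len(audio)]


# Toy data: a unit impulse convolved with a short, decaying response.
audio = np.array([1.0, 0.0, 0.0, 0.0])
rir = np.array([1.0, 0.5, 0.25])
out = apply_impulse_response(audio, rir)
print(out)
```

With a unit impulse as input, the output simply reproduces the impulse response, which makes the toy case easy to check by hand.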
- Preprocessed the audio data to extract embeddings using mainstream ASR tools.
- Appended a one-hot label to the embeddings.
- Defined the model (layers, hyperparameters).
- Conducted training (initial settings, optimal hyperparameters).
- Selected an effective RIR as a template for delta.
- Optimized delta using the Gradient Descent Method.
- Data Source: (1) Test Set; (2) Self-samples.
- Metrics (completed): (1) DSR; (2) STOI; (3) WAD.
- Generated figures to display statistical information.
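The label-appending step listed above can be sketched as follows. This is a minimal illustration with NumPy, not the code in CVAE.py or data.py: the function name `append_one_hot`, the embedding size, and the number of classes are all hypothetical values chosen for the example.

```python
import numpy as np


def append_one_hot(embedding, label, num_classes):
    """Return a copy of `embedding` with a one-hot encoding of `label` appended."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range [0, num_classes)")
    one_hot = np.zeros(num_classes, dtype=embedding.dtype)
    one_hot[label] = 1.0
    return np.concatenate([embedding, one_hot])


# Placeholder 4-dimensional embedding and 3 classes (both assumed for illustration).
embedding = np.arange(4, dtype=np.float32)
conditioned = append_one_hot(embedding, label=2, num_classes=3)
print(conditioned.shape)
```

The conditioned vector has the embedding in its leading entries and the one-hot label in its trailing entries, which is the usual way to feed a condition to a CVAE.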
Step 1: Set up the required environment, including packages and the data necessary for execution.
Step 2: Execute optimizer_with_STOI.py to generate all audio files and calculate the STOI. Results are saved in results.txt.
Step 3: Run WAD.py to compare each processed audio file with its original version. Results are saved in WAD_results.txt.
Step 4: Use the scripts in ./logAnalyse to obtain additional statistical information (including DSR, STOI, and WAD). Before running them, please copy the log files into that folder or modify the paths in the scripts as needed.
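Steps 2 and 3 above could be chained with a small helper like the one below. This is only a sketch: the script names come from the steps above, while the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes both scripts run without command-line arguments from the project root.

```python
import subprocess
import sys

# Script names taken from Steps 2 and 3; the order mirrors those steps.
PIPELINE = ["optimizer_with_STOI.py", "WAD.py"]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Build one command (as an argument list) per script."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Show what would be executed without actually running anything.
for cmd in build_commands():
    print(" ".join(cmd))
```

Calling `run_pipeline(build_commands())` from the project root would then execute both steps in order. Step 4 is left out because the analysis lives partly in notebooks.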
NOTE:
- We employ os.getcwd() combined with specific folder and file names. Ensure that your terminal's working directory is set to our project's root directory (/).
- Files with the _self suffix pertain to our self-samples for extended testing.
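Because paths are built from os.getcwd(), a quick check that the working directory is the project root can save a confusing failure later. The sketch below is an assumption-laden example, not part of our scripts: the helper names `missing_entries` and `results_path` are hypothetical, and the list of expected files is a small sample taken from the directory description.

```python
import os

# A few files expected in the project root (sampled from the directory description).
EXPECTED = ["CVAE.py", "optimizer_with_STOI.py", "WAD.py"]


def missing_entries(root, expected=EXPECTED):
    """Return the expected names that do not exist under `root`."""
    return [name for name in expected
            if not os.path.exists(os.path.join(root, name))]


def results_path(root, filename="results.txt"):
    """Build a path the same way the note describes: working directory plus file name."""
    return os.path.join(root, filename)


root = os.getcwd()
missing = missing_entries(root)
if missing:
    print("Working directory may not be the project root; missing:", ", ".join(missing))
else:
    print("Results will be written to", results_path(root))
```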