This repository contains the codebase for reproducing the paper *A perceptual similarity space for speech based on self-supervised speech representations* by Bronya R. Chernyak, Ann R. Bradlow, Joseph Keshet, and Matthew Goldrick.
To reproduce the tables, the analysis scripts are under `analysis_scripts`. Note that each subdirectory has its own README file. The distance files are located under `distance_files`.
- Follow the installation instructions of fairseq: https://github.com/facebookresearch/fairseq/tree/main#requirements-and-installation
- Go back to the percept_sim repo
- Install the requirements with:
```
pip install -r requirements.txt
```
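Put together, the whole installation might look like the following sketch, assuming fairseq is cloned inside this repo (which is what the feature-extraction section later assumes); fairseq's own instructions remain the authoritative reference:
```
# Sketch: install fairseq inside percept_sim, then this repo's requirements.
cd percept_sim
git clone https://github.com/facebookresearch/fairseq
cd fairseq
pip install --editable ./
cd ..
pip install -r requirements.txt
```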
To render the notebook plots, use nbviewer. For instance, for the Korean dataset notebook, use:
https://nbviewer.org/github/bronichern/percept_sim/blob/main/kr_trajectories.ipynb
To change the plots, you need HuBERT's features. For the Korean notebook, you can download the files here. To generate other features, see the instructions in section 3.
Instead of the notebook, you can also run a script that opens the plots directly in a dedicated view:
- Assuming the files above were downloaded to `/home/some_path` (replace this with the actual location where you downloaded the files), `/home/some_path` now has 2 files: `kr_en_full_0_1.npy` and `kr_en_full_0_1.len`.
- In this repo, there is a yaml config file: `demo/kr_trajectories.yaml`. Edit the value of `feat_path` in the yaml to be the path `/home/some_path`. Then, edit the value of `tsv_path` to point to the `tsvs` directory in this repo (a sketch of the edited config appears after this list).
- Run the following, changing `path_to_config` to point to the location of the yaml file above:
```
python demo/kr_trajectories.py --path_to_config path_to_config
```
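For reference, a minimal sketch of what the edited config might look like (the values are the placeholders from above; the actual file in the repo may contain additional keys):
```
# demo/kr_trajectories.yaml -- only the two keys discussed above are shown.
feat_path: /home/some_path            # directory holding the .npy/.len files
tsv_path: /path/to/percept_sim/tsvs   # the tsvs directory of this repo
```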
The scripts in the repository rely on the dataset file structure. You can see it here or refer to the SpeechBox metadata.
The following script can resample the data to 16 kHz, split it by sentence (if TextGrids are available), and create a tsv file in `tsvs/`.
```
python data_preprocess/prepare_data.py --path data_path --resample_waves --split_by_sentence --tsv_name name_of_tsv_file
```
Remove `--resample_waves` if you don't want to resample the files, and remove `--split_by_sentence` if the files are relatively short (~1 second up to ~18 seconds) and there is no need to split by sentence/word.
- `name_of_tsv_file` - the tsv name, without '.tsv' at the end.
- `data_path` - directory with the wave files.
- If splitting is relevant, TextGrids should be within the same directory and have the same name as the corresponding wave file, as in the sketch below.
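For example, a `data_path` layout that supports sentence splitting might look like this (the file names are hypothetical):
```
data_path/
├── speaker01_story.wav
├── speaker01_story.TextGrid   <- same base name as the wave file
├── speaker02_story.wav
└── speaker02_story.TextGrid
```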
For example, for the Korean dataset:
```
python data_preprocess/prepare_data.py --path kr_path --resample_waves --tsv_name kr
```
And for the ALLSTAR dataset:
```
python data_preprocess/prepare_data.py --path allstar_path --resample_waves --split_by_sentence --prepare_allstar
```
Note that for ALLSTAR the script will create a tsv for each reading task in the dataset.
Throughout the next sections, there is a distinction between the ALLSTAR dataset, which has different reading tasks, and the Korean dataset, which doesn't.
Within the directory you choose to store HuBERT's feature file, create a directory named `kr_layer<layer-number>` (e.g., `kr_layer1`), where `layer-number` is the HuBERT layer we want to use as a feature extractor.
Within the directory you choose to store HuBERT's feature file, create a directory named `<dataset_name>_<reading_task>_layer<layer-number>` (e.g., `cmn_lpp_layer1`), where:
- `dataset_name` - 'cmn' or 'spn' for Mandarin or Spanish, respectively.
- `reading_task` - "HT1"/"HT2"/"LPP"/"DHR".
- `layer-number` - the HuBERT layer we want to use as a feature extractor.
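For instance, assuming a (hypothetical) feature root named `hubert_feature_path`:
```
# Korean dataset, layer 12
mkdir -p hubert_feature_path/kr_layer12
# ALLSTAR, L1 Mandarin speakers, LPP reading task, layer 12
mkdir -p hubert_feature_path/cmn_lpp_layer12
```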
```
python run.py --reading_task task_name --data data_subset_name --layer hubert_layer --portion sentences_portion --verbose --feat_path hubert_feature_path --tsv_name tsv_name --project --output_path output_csv_dir
```
- `--data`: required; determines which dataset is processed. `--data cmn` or `--data spn` select the ALLSTAR dataset, with L1 English speakers and L2 English speakers whose L1 is Mandarin or Spanish, respectively; `--data kr` selects the Korean dataset, with L1 English speakers and L2 English speakers whose L1 is Korean. The different datasets are processed differently when creating their DataFrame.
- `--portion`: only relevant for the ALLSTAR dataset (i.e., with `--data cmn` or `--data spn`). Since there was a lot of ALLSTAR data to process, the script processes subsets of it: because the waves were divided by sentence, `--portion` accepts an integer between 1 and 9 and processes only the sentences whose id begins with that digit. If you wish to run on all the data instead of specifying `--portion`, use `--run_all_data`.
- `--reading_task`: relevant for the ALLSTAR dataset, where `task_name` can be one of "HT1", "HT2", "LPP", "DHR".
- `--project`: projects the features using t-SNE. Remove this argument if you want to measure distance using all of HuBERT's features. If you want to use 2 dimensions for t-SNE instead of 3 (the default), specify `--tsne_dim 2`. If you want to use UMAP or K-PCA instead, add the flag `--umap` or `--pca`, respectively.
At the end of processing, a CSV file will be created in the directory specified by the argument `--output_path output_csv_dir`.
The script assumes that the required HuBERT feature file is saved in `hubert_feature_path/<data_subset_name>_<task_name>_layer<hubert_layer>/` - see the next section for instructions on how to create a HuBERT feature file.
For instance, to run the ALLSTAR HT1 dataset with L1 English speakers and L2 Mandarin speakers, with a feature file for layer 12 saved in `hubert_feature_path/cmn_ht1_layer12`, and with a tsv file created at `tsvs/allstar_ht1.tsv`, run the following command:
```
python run.py --reading_task "HT1" --data "cmn" --layer 12 --portion 1 --verbose --feat_path hubert_feature_path/ --tsv_name allstar_ht1.tsv --project --output_path output_csv_dir
```
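If you prefer processing the ALLSTAR portions one at a time rather than passing `--run_all_data`, a simple loop over the nine portion ids works (a sketch, reusing the paths from the example above):
```
# Sketch: run all nine ALLSTAR sentence portions for one configuration.
for p in 1 2 3 4 5 6 7 8 9; do
  python run.py --reading_task "HT1" --data "cmn" --layer 12 --portion "$p" \
    --verbose --feat_path hubert_feature_path/ --tsv_name allstar_ht1.tsv \
    --project --output_path output_csv_dir
done
```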
To run the dataset with L1 English speakers and L2 Korean speakers, with a feature file for layer 12 saved in `hubert_feature_path/kr_layer12` (the Korean dataset has no reading task), and with a tsv file created at `tsvs/kr.tsv`, run the following command:
```
python run.py --data "kr" --layer 12 --portion 1 --verbose --feat_path hubert_feature_path/ --tsv_name kr.tsv --project --output_path output_csv_dir
```
The following section uses fairseq's repo, which was installed inside this repo when you ran the installation in section 1.
Download the HuBERT base model here.
Run the following to create the feature file:
```
cd fairseq
python examples/hubert/simple_kmeans/dump_hubert_feature.py tsvs tsv_name path_of_hubert_model layer_number 1 0 output_path_for_features
```
- `tsvs` - path to the directory where the tsv files are located.
- `tsv_name` - the tsv file created in section 3 (it should be located in the directory given as the `tsvs` argument; fairseq expects the name without the '.tsv' extension).
- `layer_number` - the HuBERT layer we want to use as a feature extractor.
- `output_path_for_features` - the path for saving the feature file, i.e., the directory created in section 4 (e.g., `cmn_ht1_layer12`).

The `1 0` arguments are the number of shards and the shard rank, respectively (a single shard here).
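Putting it together, extracting layer-12 features for the ALLSTAR Mandarin HT1 tsv might look like the following sketch (the checkpoint file name and all paths are placeholders; adjust them to where you saved the model and where the feature directory from section 4 lives):
```
cd fairseq
# args: tsv_dir, tsv_name (split), checkpoint, layer, nshard, rank, output dir
python examples/hubert/simple_kmeans/dump_hubert_feature.py \
  ../tsvs allstar_ht1 /path/to/hubert_base_ls960.pt 12 1 0 \
  ../hubert_feature_path/cmn_ht1_layer12
```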