This work builds on earlier work by Zhang et al. (2019).
- Download the TIMIT corpus.
- To generate cochleagram feature vectors:
- To generate features for all files at once:
  - `util.generate_mat_for_all_data_in_dir(data_path, output_path)`
    - `data_path` is the path to the `data` directory in the TIMIT corpus.
    - `output_path` is the path where the cochleagram feature vectors will be saved.
    - `file_type` can be set to "npy" if the Python SHMAX implementation is used. "mat" is the default.
- To generate features for a single file:
  - `util.save_features_as_mat(wav_path)`
    - `wav_path` is the path to the wav file.
    - `file_type` can be set to "npy" if the Python SHMAX implementation is used. "mat" is the default.
    - A second argument can be supplied to specify the output path.
    - The feature vector is returned as a NumPy array.
Note: Using these functions to generate cochleagram feature vectors may result in a poorly trained model. You may want to look into using the MATLAB Auditory Toolbox instead.
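As a rough illustration of what batch feature generation involves, the sketch below walks a corpus directory and collects every wav file, matching the extension case-insensitively because TIMIT distributions often ship `.WAV` files. `find_wav_files` is a hypothetical helper, not part of this repository, and the nested directory layout is an assumption.

```python
from pathlib import Path


def find_wav_files(data_path):
    """Return all wav files under data_path, sorted, matching .wav and .WAV alike.

    Hypothetical helper for illustration; not part of this repository.
    """
    root = Path(data_path)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

Each returned path could then be passed to `util.save_features_as_mat`, with an output path as the optional second argument.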
`SHMAX.train_SHMAX(train_path, output_path)`
- `train_path` is the path to the directory containing the cochleagram feature vectors.
- `output_path` is the path where the model output will be saved.
`cochABX.generate_phoneme_matrices(corpus_data_path, phoneme_feature_path, result_path)`
- `corpus_data_path` is the path to the data directory in the TIMIT corpus.
- `phoneme_feature_path` is the path to the directory where the phoneme feature vectors are saved.
- `result_path` is the path where the resulting phoneme matrices will be saved.
- Note: Files will be of type `.mat`.
- Create categories for ABX testing:
  - Create a 2D list in which each sublist contains the phoneme names of one category, e.g. `[['aa', 'ae', 'ah'], ['ao', 'aw', 'ax'], ...]`.
  - Pass this list to `cochABX.convert_category_list_to_dict(cat_list)` and store the resulting dictionary.
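The exact structure returned by `cochABX.convert_category_list_to_dict` is not documented here. A plausible shape, assumed for this sketch, is a mapping from each phoneme name to the index of its category. `category_list_to_dict` below is a hypothetical stand-in, not the repository's function.

```python
def category_list_to_dict(cat_list):
    """Map each phoneme name to the index of the sublist it appears in.

    Hypothetical stand-in; the real cochABX function may return a different structure.
    """
    mapping = {}
    for index, phonemes in enumerate(cat_list):
        for phoneme in phonemes:
            if phoneme in mapping:
                raise ValueError(f"phoneme {phoneme!r} appears in more than one category")
            mapping[phoneme] = index
    return mapping


cat_list = [["aa", "ae", "ah"], ["ao", "aw", "ax"]]
categories = category_list_to_dict(cat_list)
num_categories = len(cat_list)
```

Rejecting a phoneme that appears in two sublists is a design choice of this sketch; it keeps the category assignment unambiguous.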
`cochABX.abx_testing(phoneme_mat_path, categories, num_categories)`
- `phoneme_mat_path` is the path to the directory containing the phoneme matrices.
- `categories` is the dictionary containing the phoneme names of the categories.
- `num_categories` is the number of categories.
- Returns a confusion matrix in which the rows represent the true category and the columns represent the predicted category.
- If `result_path` is supplied, the confusion matrix will be saved to the specified path.
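To make the confusion-matrix convention concrete, here is a minimal sketch of an ABX trial, assuming a Euclidean nearest-exemplar decision: X is assigned the category of whichever of A or B lies closer. The distance measure, the trial format, and the helper names are assumptions for illustration; `cochABX.abx_testing` may work differently.

```python
import math


def abx_predict(a, b, x, cat_a, cat_b):
    """Return cat_a if x is at least as close to a as to b, else cat_b."""
    return cat_a if math.dist(a, x) <= math.dist(b, x) else cat_b


def confusion_matrix(trials, num_categories):
    """Rows are true categories, columns are predicted categories.

    Each trial is a tuple (a, b, x, cat_a, cat_b, true_cat).
    """
    matrix = [[0] * num_categories for _ in range(num_categories)]
    for a, b, x, cat_a, cat_b, true_cat in trials:
        predicted = abx_predict(a, b, x, cat_a, cat_b)
        matrix[true_cat][predicted] += 1
    return matrix


trials = [
    ([0.0, 0.0], [1.0, 1.0], [0.1, 0.2], 0, 1, 0),  # x near a, truly category 0
    ([0.0, 0.0], [1.0, 1.0], [0.9, 0.8], 0, 1, 1),  # x near b, truly category 1
    ([0.0, 0.0], [1.0, 1.0], [0.7, 0.9], 0, 1, 0),  # x near b, but truly category 0
]
result = confusion_matrix(trials, num_categories=2)
```

Off-diagonal entries count errors: in this toy data the third trial lands in row 0, column 1.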
`cochABX.general_classification_abx_testing(phoneme_mat_path, categories, num_categories)`
- `phoneme_mat_path` is the path to the directory containing the phoneme matrices.
- `categories` is the dictionary containing the phoneme names of the categories.
- `num_categories` is the number of categories.
- Returns a confusion matrix in which the rows represent the true category and the columns represent the predicted category.
- If `result_path` is supplied, the confusion matrix will be saved to the specified path.
`getPhonemeResponse.get_phoneme_response(corpus_data_path, SHMAX_data_path)`
- `corpus_data_path` is the path to the data directory in the TIMIT corpus.
- `SHMAX_data_path` is the path to the directory containing the SHMAX model output.
- Returns an m x n matrix, where m is the number of phonemes and n is the number of computational units. Index (i, j) holds a list of the responses of the jth computational unit to the ith phoneme.
- If `result_path` is supplied, the phoneme response matrix will be saved to the specified path.
- The `categories` argument can be used to calculate responses for categories rather than individual phonemes. The correct form for this argument can be generated by following step 2 of ABX Testing.
  - If this argument is supplied, `num_categories` must also be supplied.
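The layout of the response matrix can be pictured with a small sketch. It assumes each observation is a pair of a phoneme index and a vector of unit responses; the helper name and input format are hypothetical and are not taken from `getPhonemeResponse`.

```python
def build_response_matrix(observations, num_phonemes, num_units):
    """Collect responses so that matrix[i][j] lists unit j's responses to phoneme i."""
    matrix = [[[] for _ in range(num_units)] for _ in range(num_phonemes)]
    for phoneme_index, unit_responses in observations:
        for unit_index, value in enumerate(unit_responses):
            matrix[phoneme_index][unit_index].append(value)
    return matrix


observations = [
    (0, [0.2, 0.9]),  # first occurrence of phoneme 0
    (1, [0.7, 0.1]),  # one occurrence of phoneme 1
    (0, [0.3, 0.8]),  # second occurrence of phoneme 0
]
responses = build_response_matrix(observations, num_phonemes=2, num_units=2)
```

Replacing phoneme indices with category indices gives the category-level variant under the same assumptions.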
Note: CSI refers to the specificity towards some arbitrary category assignment rather than individual phonemes. If a CSI is desired, simply calculate the response matrix for the desired category assignment.
`getPSI.get_psi(responses)`
- `responses` is the phoneme response matrix.
- Returns an m x n matrix, where m is the number of phonemes and n is the number of computational units. Index (i, j) holds the CSI/PSI of the jth computational unit for the ith phoneme.
- If `result_path` is supplied, the CSI/PSI matrix will be saved to the specified path.
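This README does not say how the index is computed. One definition used in the phoneme-selectivity literature counts, for a given unit and phoneme, how many other phonemes evoke a statistically distinguishable response distribution. The sketch below follows that definition under stated assumptions: a two-sided Wilcoxon rank-sum test with a normal approximation (no tie correction) and a threshold of p < 0.01. All names are hypothetical, and `getPSI.get_psi` may use a different test or threshold.

```python
import math
from statistics import NormalDist


def rank_sum_p(x, y):
    """Two-sided rank-sum p-value via the normal approximation (no tie correction)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - n1 * n2 / 2) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def psi_matrix(responses, alpha=0.01):
    """psi[i][j] counts the other phonemes whose unit-j responses differ from phoneme i's."""
    m, n = len(responses), len(responses[0])
    return [
        [
            sum(
                1
                for k in range(m)
                if k != i and rank_sum_p(responses[i][j], responses[k][j]) < alpha
            )
            for j in range(n)
        ]
        for i in range(m)
    ]


low = [float(v) for v in range(1, 11)]
high = [float(v) for v in range(11, 21)]
responses = [[low], [list(low)], [high]]  # 3 phonemes, 1 computational unit
psi = psi_matrix(responses)
```

In this toy data the first two phonemes share a response distribution and differ only from the third, so the third phoneme receives the highest index. The normal approximation is rough for small samples.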