AndoniSanguesa/SHMAX

This work builds on earlier work by Zhang et al. (2019).

Generating Input Cochleagrams

  1. Download the TIMIT corpus.
  2. Generate the cochleagram feature vectors:
    1. To generate features for all files at once:
      1. util.generate_mat_for_all_data_in_dir(data_path, output_path)
        1. data_path is the path to the data directory in the TIMIT corpus.
        2. output_path is the path where the cochleagram feature vectors will be saved.
        3. file_type can be set to "npy" if the Python SHMAX implementation is used. The default is "mat".
    2. To generate features for a single file:
      1. util.save_features_as_mat(wav_path)
        1. wav_path is the path to the wav file.
        2. file_type can be set to "npy" if the Python SHMAX implementation is used. The default is "mat".
        3. A second argument can be supplied to specify the output path.
        4. The feature vector is returned as a NumPy array.
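
The batch step above has to pair each wav file with an output file. A minimal sketch of that bookkeeping follows. It uses a hypothetical helper, plan_feature_outputs, which is not part of util; the assumption that the output directory mirrors the corpus layout is also ours. It only plans file names and does not compute cochleagrams.

```python
from pathlib import Path


def plan_feature_outputs(data_path, output_path, file_type="mat"):
    """Hypothetical helper: pair every wav file under data_path with an output file.

    Assumes the output directory mirrors the corpus directory layout.
    """
    if file_type not in ("mat", "npy"):
        raise ValueError("file_type must be 'mat' or 'npy'")
    data_root = Path(data_path)
    out_root = Path(output_path)
    plan = []
    for candidate in sorted(data_root.rglob("*")):
        # Match the extension case-insensitively (.WAV and .wav are both accepted).
        if candidate.is_file() and candidate.suffix.lower() == ".wav":
            relative = candidate.relative_to(data_root).with_suffix("." + file_type)
            plan.append((candidate, out_root / relative))
    return plan
```

Each (wav, target) pair could then be handed to a per-file routine such as util.save_features_as_mat, assuming its second argument accepts the target path.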

Note: Cochleagram feature vectors generated with these functions may result in a poorly trained model. Consider using the Matlab Auditory Toolbox instead.

Training the SHMAX Model

  1. SHMAX.train_SHMAX(train_path, output_path)
    1. train_path is the path to the directory containing the cochleagram feature vectors.
    2. output_path is the path where the model output will be saved.

ABX Testing on Input Cochleagrams

  1. cochABX.generate_phoneme_matrices(corpus_data_path, phoneme_feature_path, result_path)
    1. corpus_data_path is the path to the data directory in the TIMIT corpus.
    2. phoneme_feature_path is the path to the directory where the phoneme feature vectors are saved.
    3. result_path is the path where the resulting phoneme matrices will be saved.
    4. Note: The resulting files are saved in .mat format.
  2. Create the categories for ABX testing:
    1. Create a 2D list in which each sublist contains the phoneme names of one category, e.g. [['aa', 'ae', 'ah'], ['ao', 'aw', 'ax'], ...]
    2. Pass this list to cochABX.convert_category_list_to_dict(cat_list) and store the resulting dictionary.
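
To illustrate the shape of the data in step 2, here is a minimal sketch of one plausible conversion. The function category_list_to_dict below is an illustrative stand-in, not the actual cochABX.convert_category_list_to_dict. The assumption that the dictionary maps each phoneme name to its category index is ours, so the real dictionary may be laid out differently.

```python
def category_list_to_dict(cat_list):
    """Illustrative stand-in: map each phoneme name to the index of its category."""
    mapping = {}
    for index, phonemes in enumerate(cat_list):
        for phoneme in phonemes:
            # A phoneme listed in two categories would make the assignment ambiguous.
            if phoneme in mapping:
                raise ValueError(f"{phoneme!r} appears in more than one category")
            mapping[phoneme] = index
    return mapping


cat_list = [["aa", "ae", "ah"], ["ao", "aw", "ax"]]
categories = category_list_to_dict(cat_list)
num_categories = len(cat_list)
```

Deriving num_categories from the same list keeps it consistent with the dictionary when both are passed to the ABX functions.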

Perform Pair-Wise Machine ABX Testing on Input Cochleagrams

  1. cochABX.abx_testing(phoneme_mat_path, categories, num_categories)
    1. phoneme_mat_path is the path to the directory containing the phoneme matrices.
    2. categories is the dictionary containing the phoneme names of the categories.
    3. num_categories is the number of categories.
    4. Returns a confusion matrix in which the rows represent the true category and the columns represent the predicted category.
    5. If result_path is supplied, the confusion matrix will be saved to the specified path.
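
For readers unfamiliar with machine ABX, the sketch below shows the general idea on toy data: X comes from the same category as A, and a trial counts as correct when X is closer to A than to B. This is not the cochABX implementation. The function name abx_confusion, the Euclidean distance, the random trial sampling, and the input layout (a dict from category index to feature vectors) are all assumptions made for illustration.

```python
import math
import random


def abx_confusion(samples, num_categories, trials=1000, seed=0):
    """Toy pair-wise ABX test.

    samples maps a category index to a list of equal-length feature vectors.
    Returns a confusion matrix: rows are true categories, columns are predictions.
    """
    rng = random.Random(seed)
    confusion = [[0] * num_categories for _ in range(num_categories)]
    # For simplicity, only categories with at least two samples take part.
    usable = [c for c in range(num_categories) if len(samples.get(c, [])) >= 2]
    if len(usable) < 2:
        raise ValueError("need at least two categories with two samples each")
    for _ in range(trials):
        true_cat, other_cat = rng.sample(usable, 2)
        a, x = rng.sample(samples[true_cat], 2)  # A and X share a category
        b = rng.choice(samples[other_cat])       # B comes from the other category
        closer_to_a = math.dist(a, x) <= math.dist(b, x)
        predicted = true_cat if closer_to_a else other_cat
        confusion[true_cat][predicted] += 1
    return confusion
```

With well-separated categories, all counts fall on the diagonal. Off-diagonal entries show which category pairs are confused.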

Perform General Machine ABX Classification on Input Cochleagrams

  1. cochABX.general_classification_abx_testing(phoneme_mat_path, categories, num_categories)
    1. phoneme_mat_path is the path to the directory containing the phoneme matrices.
    2. categories is the dictionary containing the phoneme names of the categories.
    3. num_categories is the number of categories.
    4. Returns a confusion matrix in which the rows represent the true category and the columns represent the predicted category.
    5. If result_path is supplied, the confusion matrix will be saved to the specified path.
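
Because rows are true categories and columns are predictions, per-category accuracy is the diagonal entry divided by the row sum. A minimal sketch follows. The helper per_category_accuracy is hypothetical and not part of cochABX, and it assumes the matrix can be iterated row by row (a list of lists, or a 2D NumPy array).

```python
def per_category_accuracy(confusion):
    """Hypothetical helper: fraction of correct predictions for each true category."""
    accuracies = []
    for index, row in enumerate(confusion):
        total = sum(row)
        # A category that never occurred as the true label has no defined accuracy.
        accuracies.append(row[index] / total if total else None)
    return accuracies


example_confusion = [[8, 2], [1, 9]]
example_accuracy = per_category_accuracy(example_confusion)
```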

Get SHMAX Phoneme/Category Response Matrix

  1. getPhonemeResponse.get_phoneme_response(corpus_data_path, SHMAX_data_path)
    1. corpus_data_path is the path to the data directory in the TIMIT corpus.
    2. SHMAX_data_path is the path to the directory containing the SHMAX model output.
    3. Returns an m x n matrix, where m is the number of phonemes and n is the number of computational units. The entry at index (i, j) is a list of the responses of the jth computational unit to the ith phoneme.
    4. If result_path is supplied, the phoneme response matrix will be saved to the specified path.
    5. The categories argument can be used to calculate responses for categories rather than individual phonemes. The correct form for this argument can be generated by following step 2 of "ABX Testing on Input Cochleagrams".
      1. If this argument is supplied, num_categories must also be supplied.
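
The layout of the response matrix (a list of responses in every cell) can be sketched as follows. The function build_response_matrix and its input format are hypothetical and chosen for illustration only. How getPhonemeResponse actually gathers responses from the TIMIT alignments and the SHMAX output is not shown here.

```python
def build_response_matrix(observations, num_phonemes, num_units):
    """Hypothetical sketch of the m x n layout.

    observations is an iterable of (phoneme_index, unit_responses) pairs, where
    unit_responses holds one response value per computational unit.
    """
    # One independent list per cell; cell (i, j) collects unit j's responses to phoneme i.
    matrix = [[[] for _ in range(num_units)] for _ in range(num_phonemes)]
    for phoneme_index, unit_responses in observations:
        if len(unit_responses) != num_units:
            raise ValueError("expected one response per computational unit")
        for unit_index, value in enumerate(unit_responses):
            matrix[phoneme_index][unit_index].append(value)
    return matrix
```

When categories are used instead of phonemes, the same layout applies with the category index in place of the phoneme index.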

Get PSI/CSI Matrix

Note: CSI measures specificity with respect to an arbitrary category assignment rather than individual phonemes. To obtain a CSI, calculate the response matrix for the desired category assignment.

  1. getPSI.get_psi(responses)
    1. responses is the phoneme response matrix.
    2. Returns an m x n matrix, where m is the number of phonemes (or categories, for CSI) and n is the number of computational units. The entry at index (i, j) is the CSI/PSI of the jth computational unit for the ith phoneme.
    3. If result_path is supplied, the CSI/PSI matrix will be saved to the specified path.
