This repository contains the following files, needed for the ABX discrimination task experiment on French and English vowels:
- `scripts` folder, containing the scripts needed to generate the tables, cut the sound files, resample them, etc.;
- `outputs` folder, containing all the tables, .csv files or other text files generated by the scripts;
- `stimuli` folder, containing the recorded .wav files and the corresponding annotated .TextGrid files. It includes the `intervals` folder, where all the segmented intervals are stored, and the `triplets` folder, where all the sound files used for the experiment are stored (three concatenated intervals: the A, B and X sounds);
- `lmeds_material` folder, where all the files necessary for the online experiment are stored;
- `analysis` folder, containing the anonymized data, the scripts necessary for the analysis, and the outputs of those scripts.
- Have the sound .wav files, their corresponding TextGrids, and the silence file ready in the `./stimuli` folder
- `make cut_intervals`: cuts the .wav files into intervals (only the stimuli)
- `make create_experimental_list`: creates the list with a balanced number of each possible vowel to be tested, as well as a balanced appearance of each possible speaker and of the C_C context
- `make create_stim_filename_distance_lists`: adds the interval filenames to the experimental lists, creates the `Stimulus_list.txt` needed to concatenate the intervals, and creates a `distance_list.csv` with different column names (this format is needed for the analysis)
- `make concatenate_intervals`: concatenates the intervals with the `silence.wav` file into a triplet with a new name, and also creates the `.mp3` and `.ogg` versions needed for LMEDS
- `make make_lmeds_sequence`: creates the part of the LMEDS sequence where the triplet filenames are inserted as stimuli to be tested
- copy the `.wav`, `.mp3`, and `.ogg` files into `audio_and_video`, which is in `lmeds_material/sounds`
- copy the contents of the `lmeds_material` folder to the LMEDS folder on the server, then fix the sequences (`.txt` files)
- run the test
- `make calculate_distances`: calculates the Euclidean distances with the DTW algorithm and appends them to the `distance_list_final.csv`
- `make anonymize_lmeds_data`: anonymizes the names of the participants and creates two new subfolders, each named after a language, in which all data is anonymized
- `make split_output_to_results_files`: maps the results from LMEDS to comprehensible tables (sequence, presurvey, postsurvey, postsurvey2)
- `make analysis_step1_preprocess_strut`: combines the data from the sequence, the postsurveys, and the presurvey and orders it into a new csv, ready for the analysis.
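
To illustrate what the `calculate_distances` step computes, here is a minimal sketch of dynamic time warping (DTW) with a Euclidean frame-to-frame cost. It is not the repository's script: the function names `euclidean` and `dtw_distance`, the toy feature frames, and the choice to return the unnormalized cumulative cost are assumptions made for illustration.

```python
import math


def euclidean(frame_a, frame_b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(frame_a, frame_b)))


def dtw_distance(seq_a, seq_b):
    """Cumulative cost of the best DTW alignment between two frame sequences.

    Each sequence is a list of frames; each frame is a list of floats.
    Allowed steps are match, insertion, and deletion, all with weight 1.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Toy example: `slow` repeats the first frame of `fast`, so the warped cost is 0.
fast = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
slow = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(dtw_distance(fast, slow))
```

In practice the frames would be acoustic feature vectors extracted from the intervals; which features the repository uses is not stated here.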
- LMEDS needs to be downloaded separately
- audio files need to be copied over into the `sounds/audio_and_video` folder
- LMEDS sequence needs to be manually fixed
- `.cgi` and the folders `lmeds/test_name/individual_sequences` and `lmeds/test_name/outputs` need to have permissions that allow the test takers to modify them (`chmod 777 $filename$`)
- results need to be stored in separate subfolders named by language to allow the analysis processing
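
The permission step above can also be scripted. This is a minimal sketch, assuming a POSIX server; the helper name `make_world_writable` and the example paths are hypothetical, and mode `0o777` simply mirrors the `chmod 777` mentioned above.

```python
import os


def make_world_writable(paths):
    """Set mode 0o777 on every existing path; return the paths that were not found."""
    missing = []
    for path in paths:
        if os.path.exists(path):
            os.chmod(path, 0o777)  # same effect as `chmod 777 path`
        else:
            missing.append(path)
    return missing


# Hypothetical layout; replace `test_name` with the actual test folder.
targets = [
    "lmeds/test_name/individual_sequences",
    "lmeds/test_name/outputs",
]
not_found = make_world_writable(targets)
if not_found:
    print("not found, skipped:", not_found)
```

Reporting the missing paths instead of raising makes it easy to spot a misnamed test folder before the test goes online.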
- more info inside the folder `./model/supervised/`
- need kaldi (http://kaldi-asr.org/) and abkhazia (https://github.com/bootphon/abkhazia/blob/master/abkhazia/)
- the same stimuli were used as for the human experiment: for technical purposes, they were tested as separate files (one interval per file + 500 ms of silence before and after), as triplets (the same as the participants heard them), and within the carrier sentences as they were originally recorded
- the results of the supervised model contain only the stimuli, never the added silence or the carrier sentence.
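
The "one interval per file + 500 ms of silence before and after" format described above could be produced along the following lines. This is a minimal sketch using Python's standard `wave` module, assuming uncompressed PCM .wav input; the function name `pad_with_silence` and the file names are hypothetical, and the repository's own scripts may proceed differently.

```python
import os
import tempfile
import wave


def pad_with_silence(src_path, dst_path, pad_ms=500):
    """Write a copy of src_path with pad_ms of digital silence before and after.

    Returns the number of silent frames added on each side.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(src.getnframes())
    n_pad = int(params.framerate * pad_ms / 1000)
    # 8-bit WAV is unsigned (silence = 0x80); wider samples are signed (silence = 0x00).
    fill = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = fill * (n_pad * params.sampwidth * params.nchannels)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed up on close
        dst.writeframes(silence + audio + silence)
    return n_pad


# Demo on a synthetic 100 ms mono, 16-bit, 16 kHz file written to a temp folder.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "interval.wav")
    dst_file = os.path.join(tmp, "interval_padded.wav")
    with wave.open(src_file, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x01\x00" * 1600)
    pad_with_silence(src_file, dst_file)
    with wave.open(dst_file, "rb") as r:
        print(r.getnframes() / r.getframerate(), "seconds")
```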