Deep learning using CNN for tone classification of 4 Mandarin Chinese tones
This project uses a convolutional neural network (CNN) to classify the four Mandarin Chinese tones. The data come from the Tone Perfect dataset from Michigan State University (https://tone.lib.msu.edu/). The training data is a monosyllabic Mandarin Chinese dataset of 9,860 audio files. The network was trained on male, female, or combined speaker data. For each of these dataset splits, one of three input features was extracted from the audio files and fed into the CNN: mel-frequency cepstral coefficients (MFCCs), mel-spectrograms, or pitch contours. The highest test accuracy achieved in this research is 99.8%.
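The setup described above amounts to a grid of three speaker splits by three feature types. A minimal sketch of how that grid could be enumerated follows. The names `SPEAKER_SPLITS`, `FEATURE_TYPES`, and `experiment_grid` are hypothetical labels chosen for illustration and are not taken from the project's code. The sketch only lists the configurations; it does not load audio or train a model.

```python
from itertools import product

# Hypothetical labels for the three speaker splits described above.
SPEAKER_SPLITS = ("male", "female", "combined")

# Hypothetical labels for the three input feature types described above.
FEATURE_TYPES = ("mfcc", "mel_spectrogram", "pitch_contour")


def experiment_grid():
    """Return every (split, feature) pairing as a list of dicts."""
    return [
        {"split": split, "feature": feature}
        for split, feature in product(SPEAKER_SPLITS, FEATURE_TYPES)
    ]


configs = experiment_grid()

if __name__ == "__main__":
    # Print one line per configuration that a training run would cover.
    for cfg in configs:
        print(f"{cfg['split']:>8} + {cfg['feature']}")
```

Each entry in `configs` corresponds to one training run, so a training loop could iterate over the list and pass the split and feature name to its own data-loading code.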