2021 Autumn Tongji University SSE ASR assignments
Requirements:
- numpy: math calculations
- matplotlib: drawing diagrams
- librosa: reading and writing .wav files
Function call chain:
read_audio(): read the .wav audio with librosa (sample rate 8000 Hz)
|
V
pre_emphasis(): apply pre-emphasis to boost the high-frequency part of the signal
|
V
frame_divide(): split the signal into frames of 25 ms with a 10 ms frame shift
|
V
windowing(): apply a Hamming window to each frame
|
V
audio_fft(): compute the FFT of each frame (by calling numpy's FFT API)
|
V
mel_filter(): apply the mel filterbank to the FFT output and compute the energy of each band (by summing the filter-weighted spectrum)
|
V
dct(): apply the DCT, implemented by hand from the formula in the slides
|
V
lifter(): apply liftering to the cepstral coefficients
|
V
delta(): compute the delta features (applied twice to obtain the second-order, delta-delta features)
|
V
norm(): normalize the features to obtain the final feature vectors
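The call chain above can be sketched end to end with numpy as follows. This is a minimal sketch, not the repository's code: the pre-emphasis coefficient (0.97), FFT size (256), number of mel filters (26), number of cepstral coefficients (13), lifter parameter (22), delta window (N = 2), taking the log of the band energies before the DCT, and mean/variance normalization in `norm()` are all assumptions borrowed from common MFCC setups such as python_speech_features. `dct()` uses the unnormalized DCT-II formula, which may differ in scaling from the formula in the slides. To stay runnable without a .wav file, it feeds a synthetic 1 s tone instead of calling `read_audio()` (with librosa, loading would be `librosa.load(path, sr=8000)`).

```python
import numpy as np

# Hypothetical helpers mirroring the call chain; all parameter values are assumptions.
def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n - 1]; the first sample is kept unchanged
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_divide(signal, sr, frame_ms=25, hop_ms=10):
    frame_len = int(round(sr * frame_ms / 1000))  # 200 samples at 8 kHz
    hop = int(round(sr * hop_ms / 1000))          # 80 samples at 8 kHz
    # Number of frames needed to cover the signal; the tail is zero-padded
    n_frames = 1 + int(np.ceil(max(len(signal) - frame_len, 0) / hop))
    pad_len = (n_frames - 1) * hop + frame_len
    padded = np.append(signal, np.zeros(pad_len - len(signal)))
    # Row i holds samples [i * hop, i * hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return padded[idx]

def windowing(frames):
    # Multiply every frame by a Hamming window of the same length
    return frames * np.hamming(frames.shape[1])

def audio_fft(frames, nfft=256):
    # Magnitude of the real FFT (frames zero-padded to nfft), then power spectrum
    mag = np.abs(np.fft.rfft(frames, n=nfft))
    return mag ** 2 / nfft

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mel_filter(power, sr, nfft=256, n_filters=26):
    # Triangular filters equally spaced on the mel scale from 0 Hz to sr / 2
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    # Band energy = filter-weighted sum of the power spectrum; log is an assumption
    energy = power @ fbank.T
    energy = np.where(energy == 0, np.finfo(float).eps, energy)
    return np.log(energy)

def dct(log_energy, n_ceps=13):
    # Unnormalized DCT-II: c[n] = sum_m x[m] * cos(pi * n * (m + 0.5) / M)
    M = log_energy.shape[1]
    n = np.arange(n_ceps)[:, None]
    m = np.arange(M)[None, :]
    return log_energy @ np.cos(np.pi * n * (m + 0.5) / M).T

def lifter(ceps, L=22):
    # Sinusoidal liftering: scale coefficient n by 1 + (L / 2) * sin(pi * n / L)
    n = np.arange(ceps.shape[1])
    return ceps * (1 + (L / 2) * np.sin(np.pi * n / L))

def delta(feat, N=2):
    # Regression delta over +/- N frames; edges repeat the first/last frame
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros(feat.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / (2 * sum(n * n for n in range(1, N + 1)))

def norm(feat):
    # Per-dimension mean and variance normalization (assumed variant)
    return (feat - feat.mean(axis=0)) / (feat.std(axis=0) + 1e-8)

def mfcc_pipeline(signal, sr=8000):
    frames = windowing(frame_divide(pre_emphasis(signal), sr))
    ceps = lifter(dct(mel_filter(audio_fft(frames), sr)))
    d1 = delta(ceps)
    d2 = delta(d1)  # second-order (delta-delta) features
    return norm(np.hstack([ceps, d1, d2]))

if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr  # 1 s synthetic signal instead of read_audio()
    rng = np.random.default_rng(0)
    signal = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(sr)
    print(mfcc_pipeline(signal, sr).shape)  # (99, 39)
```

Each helper keeps the name of the corresponding step so it can be compared with the chain. With 8000 samples, the 25 ms / 10 ms framing gives 99 frames of 200 samples, and stacking the static, delta, and delta-delta coefficients gives 39 values per frame.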
References:
- https://blog.csdn.net/weixin_45272908/article/details/115641702
- https://blog.csdn.net/Magical_Bubble/article/details/90295814
- http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/
- https://github.com/jameslyons/python_speech_features
- https://blog.csdn.net/jojozhangju/article/details/18678861
Note:
To generate the diagrams, simply run the code in PyCharm.
The second assignment implements HMM training in Python; in practice, this means translating existing MATLAB code into Python.
Documentation is available [here](<./assignment 2/python_code/document.md>).
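As a rough illustration of what HMM training involves, here is a minimal Baum-Welch (EM) sketch in numpy. It assumes a discrete-observation HMM and uses scaled forward-backward recursions to avoid numerical underflow. The MATLAB code being translated is not shown here and may use a different emission model (for example Gaussians), initialization, or stopping rule; the function names `forward_backward`, `baum_welch_step`, and `train_hmm` are hypothetical.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    # Scaled forward-backward: alpha[t] is normalized by c[t] and beta reuses the
    # same factors, so alpha[t] * beta[t] is the state posterior at time t.
    T, N = len(obs), A.shape[0]
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    return alpha, beta, c  # log P(obs) = sum(log(c))

def baum_welch_step(A, B, pi, obs):
    # One EM re-estimation; also returns log P(obs) under the *input* parameters
    obs = np.asarray(obs)
    alpha, beta, c = forward_backward(A, B, pi, obs)
    gamma = alpha * beta  # gamma[t, i] = P(state i at t | obs)
    # xi[t, i, j] = P(state i at t, state j at t + 1 | obs)
    emit_next = (B[:, obs[1:]].T * beta[1:])[:, None, :]
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_next / c[1:, None, None]
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, np.log(c).sum()

def train_hmm(obs, n_states, n_symbols, n_iter=20, seed=0):
    # Random row-stochastic initialization (an assumption, not the MATLAB setup)
    rng = np.random.default_rng(seed)
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    history = []
    for _ in range(n_iter):
        A, B, pi, ll = baum_welch_step(A, B, pi, obs)
        history.append(ll)
    return A, B, pi, history

if __name__ == "__main__":
    obs = np.random.default_rng(1).integers(0, 3, size=200)
    A, B, pi, history = train_hmm(obs, n_states=2, n_symbols=3)
    print(history[0], history[-1])  # the log-likelihood should not decrease
```

Because each Baum-Welch iteration is an EM step, the recorded log-likelihoods should be non-decreasing, which is a convenient sanity check when comparing a Python translation against the MATLAB output.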