This project is centered on processing and analyzing speech data. We propose to carry out two tasks, viz. gender identification and speaker recognition, based on acoustic features extracted from speech. A corpus of audio files (speech data) will be processed to generate spectrograms (image files), and the nature of each audio file will be studied through visualization techniques. Feature extraction will then be carried out to obtain spectral/acoustic features, viz. MFCCs, zero-crossing rates, etc. The resultant features will be supplied to machine learning and deep learning models to carry out gender identification and speaker recognition.
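Since the pipeline above hinges on turning raw audio into spectrogram images, a minimal sketch of that step with librosa is given below; the file path, 16 kHz sampling rate, and 128 mel bands are illustrative assumptions, not settings fixed by this section.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load one corpus file; "speech_sample.wav" and the 16 kHz rate are placeholders.
y, sr = librosa.load("speech_sample.wav", sr=16000)

# Mel-scaled power spectrogram, converted to decibels for display.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Render and save the spectrogram as an image file.
fig, ax = plt.subplots()
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
fig.savefig("speech_sample_spectrogram.png")
```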
• Extensive visualization of the collected audio corpus using libraries such as librosa, matplotlib, pandas, and seaborn (see the first sketch after this list).
• Extraction of spectral and other speech-related features from the corpus using the librosa library (also illustrated in the first sketch).
• Pre-processing of these spectral features (see the second sketch after this list).
• Construction of gender identification and speaker recognition models using neural networks and other relevant machine learning models (also illustrated in the second sketch).
• Deployment of the gender identification model on audio sourced from a live subject.
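The first sketch below outlines the visualization and feature-extraction steps with librosa and matplotlib. The file path, the choice of 13 MFCCs plus the zero-crossing rate, and the per-file mean pooling are assumptions for illustration, not settings prescribed by this section.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# "corpus/speaker01_utt01.wav" is a placeholder for one file in the corpus.
y, sr = librosa.load("corpus/speaker01_utt01.wav", sr=None)

# Visualization: plot and save the raw waveform.
fig, ax = plt.subplots()
librosa.display.waveshow(y, sr=sr, ax=ax)
ax.set(title="Waveform", xlabel="Time (s)", ylabel="Amplitude")
fig.savefig("waveform.png")

# Feature extraction: 13 MFCCs and the zero-crossing rate per frame.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
zcr = librosa.feature.zero_crossing_rate(y)           # shape (1, n_frames)

# Collapse frame-level features into one fixed-length vector per file.
feature_vector = np.concatenate([mfccs.mean(axis=1), zcr.mean(axis=1)])
```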
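The second sketch shows one plausible shape for the pre-processing and model-construction steps, assuming standard scaling with scikit-learn and a small dense Keras network for binary gender identification. The artifact names (features.npy, labels.npy), the 80/20 split, the label convention (1 = male), and the architecture are all placeholders rather than the project's actual choices.

```python
import joblib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

X = np.load("features.npy")   # placeholder: stacked per-file feature vectors
y = np.load("labels.npy")     # placeholder: gender labels, assumed 0 = female, 1 = male

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features; fit on the training split only to avoid leakage.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# A small dense network for binary gender identification.
model = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=30, batch_size=16, validation_data=(X_test, y_test))

# Save the artifacts so the deployment step can reuse them.
model.save("gender_model.keras")
joblib.dump(scaler, "scaler.joblib")
```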
Differences in frequency, acoustic variations, and similar properties constitute the critical aspects of an audio specimen used to carry out gender identification and speaker recognition. The workflow of our project is as follows:
The trained gender identification neural network is deployed on a set of audio files collected from live subjects. Three files are collected: two male audio specimens and one female audio specimen.
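A sketch of this deployment step on one live recording is given below. It reuses the saved model and scaler from the earlier sketch and extracts the same features; the file name and the 0.5 decision threshold are illustrative assumptions.

```python
import joblib
import numpy as np
import librosa
from tensorflow import keras

# Load the artifacts saved in the training sketch (placeholder names).
model = keras.models.load_model("gender_model.keras")
scaler = joblib.load("scaler.joblib")

# "live_subject_01.wav" stands in for one of the three live recordings.
y_live, sr = librosa.load("live_subject_01.wav", sr=None)

# Extract the same features used in training and apply the same scaling.
mfccs = librosa.feature.mfcc(y=y_live, sr=sr, n_mfcc=13)
zcr = librosa.feature.zero_crossing_rate(y_live)
x = scaler.transform(np.concatenate([mfccs.mean(axis=1), zcr.mean(axis=1)]).reshape(1, -1))

prob_male = float(model.predict(x)[0, 0])  # sigmoid output in [0, 1]
print("predicted:", "male" if prob_male >= 0.5 else "female")
```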