Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture
Official implementation for the paper: Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture. The paper has been accepted to The 15th International Conference on ICT Convergence (ICTC2024).
Please star ⭐ this repository and/or cite the paper if you find it helpful.
In this study, we compare three architectures for age and gender recognition from voice data: Long Short-Term Memory networks (LSTM), a hybrid of Convolutional Neural Networks and Bidirectional Long Short-Term Memory (CNNs-BiLSTM), and the recently released RezoNet architecture. The dataset is sourced from the Japanese portion of Mozilla Common Voice. Features such as pitch, magnitude, Mel-frequency cepstral coefficients (MFCCs), and filter-bank energies were extracted from the voice data, and the three architectures were evaluated on them. For gender recognition, LSTM achieved the best accuracy (93.5%), slightly ahead of the hybrid CNNs-BiLSTM (93.1%), while RezoNet reached 83.1%. For age recognition, the hybrid CNNs-BiLSTM outperformed the other models with an accuracy of 69.75%, compared to 64.25% and 44.88% for LSTM and RezoNet, respectively. On Japanese-language data with these extracted features, the hybrid CNNs-BiLSTM therefore delivered the strongest overall performance across the two tasks, highlighting its efficacy in voice-based age and gender detection. These results suggest promising avenues for future research and practical applications in this field.
Index Terms: Voice-Based Age and Gender Recognition, RezoNet, Convolutional Neural Network, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Deep Learning.
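The accuracy figures quoted in the abstract can be laid out side by side to make the comparison easier to follow. The snippet below is only an illustrative sketch: the dictionary layout and the helper names `best_model` and `mean_accuracy` are assumptions made for this example and are not part of the repository, and the gender figures follow one reading of the abstract (LSTM 93.5%, CNNs-BiLSTM 93.1%, RezoNet 83.1%).

```python
# Accuracies (%) as quoted in the abstract. The layout and helper names are
# hypothetical and exist only for this illustration.
reported_accuracy = {
    "gender": {"LSTM": 93.5, "CNNs-BiLSTM": 93.1, "RezoNet": 83.1},
    "age": {"LSTM": 64.25, "CNNs-BiLSTM": 69.75, "RezoNet": 44.88},
}


def best_model(task):
    """Return (model name, accuracy) with the highest quoted accuracy for a task."""
    scores = reported_accuracy[task]
    name = max(scores, key=scores.get)
    return name, scores[name]


def mean_accuracy(model):
    """Unweighted mean of a model's quoted accuracy over the two tasks."""
    tasks = list(reported_accuracy)
    return sum(reported_accuracy[t][model] for t in tasks) / len(tasks)


if __name__ == "__main__":
    for task in reported_accuracy:
        print(task, best_model(task))
    for model in reported_accuracy["gender"]:
        print(model, round(mean_accuracy(model), 3))
```

Under this reading, LSTM leads narrowly on gender, while the hybrid CNNs-BiLSTM leads on age and has the highest unweighted mean over the two tasks.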
In this study, we use the voice dataset from Mozilla Common Voice.
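The abstract lists pitch and magnitude among the features extracted from each recording. This README does not describe how they are computed, so the following is a minimal sketch under stated assumptions rather than the authors' pipeline: it estimates pitch with a plain autocorrelation search and takes magnitude to be the RMS level of a frame. The function names, the 50 to 400 Hz search range, and the synthetic test tone are all hypothetical choices made for illustration.

```python
import math


def rms_magnitude(frame):
    """Root-mean-square level of a frame (one possible notion of 'magnitude')."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz by picking the lag with the largest autocorrelation.

    Assumption: the search range fmin..fmax is an illustrative choice,
    not a value taken from the paper.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def synthetic_tone(freq, sample_rate, num_samples, amplitude=0.5):
    """Generate a pure sine tone to stand in for a voiced speech frame."""
    return [
        amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
        for i in range(num_samples)
    ]


# A 200 Hz tone sampled at 8 kHz for 800 samples (0.1 s).
tone = synthetic_tone(200.0, 8000, 800)
pitch_hz = autocorr_pitch(tone, 8000)
magnitude = rms_magnitude(tone)
print(pitch_hz, magnitude)
```

Real speech frames are noisier than a pure tone, so a practical pipeline would normally add windowing and voicing detection, and would compute MFCCs and filter-bank energies with an audio library.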
git clone "https://github.com/nhut-ngnn/Voice-Based-Age-and-Gender-Recogniton.git"
cd Voice-Based-Age-and-Gender-Recogniton
conda create -n Voice-Based-Age-and-Gender-Recogniton python=3.10 -y
conda activate Voice-Based-Age-and-Gender-Recogniton
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
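After the installation commands above, it can help to confirm that the pinned packages are visible from the activated environment. The script below is a hedged sketch that uses only the Python standard library. The helper name `check_environment` and the list of module names are assumptions made for this example, and the version lookup relies on the installed distributions being named `torch`, `torchvision`, and `torchaudio`.

```python
import importlib.util
from importlib import metadata

# Assumption: these are the importable module names of the packages
# installed by the conda command above.
REQUIRED_MODULES = ("torch", "torchvision", "torchaudio")


def check_environment(modules=REQUIRED_MODULES):
    """Map each module name to its version, "unknown", or None if not importable."""
    report = {}
    for name in modules:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # module cannot be imported in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name.
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for module, version in check_environment().items():
        status = "missing" if version is None else version
        print(f"{module}: {status}")
```

If any entry prints `missing`, the conda environment is probably not activated or the install step did not complete.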
It will be updated soon
For further information, please contact the main author:
Nhut Minh Nguyen at FPT University, Vietnam
Email: minhnhut.ngnn@gmail.com
GitHub: https://github.com/nhut-ngnn