A Deep Learning-Based Approach for Spoken Language Identification
- Kaggle's Spoken Language Identification dataset, with 73,080 samples in English, Spanish, and German.
- ShEMO, a large-scale validated database for Persian speech emotion detection.
Mel spectrograms are used for feature extraction, and the results are saved as .npy
files. The model reads them through a custom data generator.
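
The extraction and loading steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it computes a log-mel spectrogram with plain NumPy rather than an audio library, and every function name (`mel_filterbank`, `log_mel_spectrogram`, `batch_generator`) and parameter value (16 kHz sample rate, 512-point FFT, 40 mel bands) is an assumption chosen for the example.

```python
import os
import tempfile

import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Return a float32 log-mel spectrogram of shape (n_mels, n_frames)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10).T.astype(np.float32)


def batch_generator(paths, labels, batch_size, shuffle=True, seed=0):
    """Yield (features, labels) batches forever, loading .npy files lazily."""
    rng = np.random.default_rng(seed)
    order = np.arange(len(paths))
    while True:
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Add a trailing channel axis, as 2D convolution layers expect.
            x = np.stack([np.load(paths[i]) for i in batch])[..., None]
            y = np.asarray([labels[i] for i in batch])
            yield x, y


# Usage with synthetic one-second tones standing in for real recordings.
sr = 16000
t = np.arange(sr) / sr
out_dir = tempfile.mkdtemp()
paths, labels = [], []
for k, freq in enumerate([220.0, 440.0, 880.0, 1760.0]):
    spec = log_mel_spectrogram(np.sin(2 * np.pi * freq * t), sr=sr)
    path = os.path.join(out_dir, f"clip_{k}.npy")
    np.save(path, spec)
    paths.append(path)
    labels.append(k % 3)  # three classes, mirroring the three languages

x, y = next(batch_generator(paths, labels, batch_size=2))
print(x.shape, y.shape)
```

Loading the files lazily inside the generator keeps memory use bounded by the batch size, which is presumably the motivation for a custom generator on a dataset of this size. Stacking the arrays assumes every clip has the same length; real recordings would need padding or cropping first.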
This project implements two architectures: a CNN and a CRNN.
There is also a website!
You can watch the presentation in Persian here.