This research project undertakes a comprehensive analysis of speech emotion recognition. Harmonizing the datasets involves reconciling disparate naming conventions and emotion label sets to establish a standardized format for cohesive analysis. The resulting distribution of emotions is well balanced, apart from the surprise, calm, and neutral classes. The subsequent focus is feature extraction: raw audio waveforms, frequency spectra (FFT), short-time Fourier transform (STFT) spectrograms, and mel spectrograms. Three distinct convolutional neural network (CNN) models (a Mel Spectrogram CNN, an MFCC CNN, and a Mel Spectrogram CRNN) are developed and evaluated. Results indicate 72% accuracy in classifying emotions, with mel spectrogram and MFCC features displaying complementary strengths. The study concludes by suggesting avenues for improvement: feature fusion, specialized deep learning architectures, and addressing the data imbalance. Future work involves real-life integration, applying sentiment analysis to predict stock market effects based on emotion-laden communications in S&P 500 earnings calls.
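As a rough illustration of the feature-extraction step described above (not the repository's actual code), the sketch below computes an STFT magnitude spectrogram and a mel-scaled, log-compressed version of it using only NumPy. The function names, filter-bank size, and the synthetic test tone standing in for a speech clip are all assumptions.

```python
import numpy as np

def stft_spectrogram(y, n_fft=512, hop=128):
    # Frame the signal, apply a Hann window, take the magnitude of the FFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (n_fft//2 + 1, n_frames)

def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters spaced evenly on the mel scale.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):            # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

# Synthetic 1-second, 440 Hz tone standing in for a speech clip (assumption).
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)

spec = stft_spectrogram(y)                   # STFT magnitude spectrogram
mel = mel_filterbank(sr, 512) @ spec ** 2    # mel-scaled power spectrogram
log_mel = 10 * np.log10(mel + 1e-10)         # log compression (dB-style mel spectrogram)
print(spec.shape, log_mel.shape)
```

In practice a library such as librosa would replace this hand-rolled version; the point is only to show how the STFT and mel-spectrogram features named above relate to each other.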
I harmonized diverse speech emotion datasets and developed convolutional neural network (CNN) models, including a Mel Spectrogram CNN, an MFCC CNN, and a Mel Spectrogram CRNN, achieving 72% accuracy in emotion classification.
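A minimal Keras sketch of what a mel-spectrogram CNN classifier of the kind listed above might look like. The layer sizes, the input shape (40 mel bands by 122 frames), and the seven-class output are illustrative assumptions, not the repository's actual architecture.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # assumed number of emotion labels after harmonization

def build_mel_cnn(input_shape=(40, 122, 1), num_classes=NUM_CLASSES):
    # A small stack of conv/pool blocks, the usual shape of a spectrogram CNN.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_mel_cnn()
# Forward pass on a dummy batch to check the output shape.
probs = model.predict(np.zeros((1, 40, 122, 1), dtype=np.float32))
print(probs.shape)  # (1, 7)
```

The CRNN variant would replace the `Flatten`/`Dense` head with a recurrent layer over the time axis; training on the harmonized features, not this toy forward pass, is what produces the reported 72% accuracy.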
arzuisiktopbas/Speech-Emotion-Recognition