HydroxideX/Speech-Emotion-Recognition

Speech Emotion Recognition

Introduction

  • Built a neural network from scratch in Octave that predicts a speaker's emotion from their voice with 70.909% accuracy.
  • Created a Python application that predicts the speaker's emotion from a ".wav" audio file.

Python Packages used

  • librosa==0.6.3
  • numpy
  • scikit-learn (for sklearn.model_selection)

To use the application

  • Open Gui.pyw.
  • Browse to a ".wav" audio file.
  • An emoji appears showing the emotion predicted for that file.

Dataset

We used two datasets:

  • RAVDESS: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions.
  • TESS: In the Toronto Emotional Speech Set (TESS), a set of 200 target words was spoken in the carrier phrase "Say the word _____" by two actresses (aged 26 and 64 years), and recordings were made of the set portraying each of seven emotions (anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral).

To load and split the data

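from numpy import loadtxt
from sklearn.model_selection import train_test_split

# Load the comma-separated feature matrix and label vector.
X = loadtxt('X.csv', delimiter=',')
y = loadtxt('y.csv', delimiter=',')

# No test_size is given, so 25% of the rows are held out; random_state=0 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)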
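
For readers who do not have the original X.csv and y.csv, the following sketch runs the same load-and-split steps on small synthetic files. The file layout (one feature row per sample in X.csv, one numeric label per line in y.csv), the array sizes, and the helper name `load_and_split` are assumptions made for illustration and are not taken from the repository.

```python
import os
import tempfile

import numpy as np
from sklearn.model_selection import train_test_split


def load_and_split(x_path, y_path, seed=0):
    """Load comma-separated features and labels, then split them.

    Hypothetical helper that mirrors the snippet above.
    """
    X = np.loadtxt(x_path, delimiter=',')
    y = np.loadtxt(y_path, delimiter=',')
    # With no test_size given, train_test_split holds out 25% of the rows.
    return train_test_split(X, y, random_state=seed)


# Synthetic stand-in data (assumed layout): 40 samples, 5 features, 8 classes.
rng = np.random.default_rng(0)
tmp_dir = tempfile.mkdtemp()
x_path = os.path.join(tmp_dir, 'X.csv')
y_path = os.path.join(tmp_dir, 'y.csv')
np.savetxt(x_path, rng.normal(size=(40, 5)), delimiter=',')
np.savetxt(y_path, rng.integers(0, 8, size=40), delimiter=',')

X_train, X_test, y_train, y_test = load_and_split(x_path, y_path)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split identical between runs, which makes reported accuracy figures easier to compare.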

Emotions

"neutral", "calm", "happy", "sad", "angry", "fear", "disgust", and "pleasant surprise".
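
A classifier usually outputs a class index, which the application then has to turn back into one of these names. A minimal sketch of that lookup follows. The index order and the function name `label_to_emotion` are assumptions, since the README does not state how labels are encoded in y.csv.

```python
# Emotion names as listed above; the index order is an assumption.
EMOTIONS = (
    "neutral", "calm", "happy", "sad",
    "angry", "fear", "disgust", "pleasant surprise",
)


def label_to_emotion(label):
    """Map a numeric class label (int, or a float such as 2.0) to its name."""
    index = int(label)
    # Reject fractional labels and indices outside the known range.
    if index != label or not 0 <= index < len(EMOTIONS):
        raise ValueError(f"unknown emotion label: {label!r}")
    return EMOTIONS[index]


print(label_to_emotion(2.0))  # happy
```

Accepting floats is deliberate here: labels read with `loadtxt` come back as floating-point values.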

Feature Extraction

We used feature extractors from the librosa library, including:

  • MFCC
  • Chromagram
  • Mel-scaled spectrogram (mel)
  • Spectral contrast
  • Tonnetz (tonal centroid features)
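
One common way to combine these features is to average each one over time and concatenate the results into a single vector per file. The sketch below assumes that approach. The README does not say how the features were aggregated, and the function names, `n_mfcc=40`, and the file path in the usage comment are illustrative.

```python
import numpy as np


def mean_over_time(feature_matrices):
    """Average each (n_bins, n_frames) matrix over frames and concatenate."""
    return np.hstack([np.mean(m, axis=1) for m in feature_matrices])


def extract_features(path, n_mfcc=40):
    """Return one feature vector for the audio file at `path`."""
    import librosa  # imported lazily so mean_over_time works without it

    y, sr = librosa.load(path)       # resamples to librosa's default 22050 Hz
    stft = np.abs(librosa.stft(y))   # magnitude spectrogram
    return mean_over_time([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc),
        librosa.feature.chroma_stft(S=stft, sr=sr),
        librosa.feature.melspectrogram(y=y, sr=sr),
        librosa.feature.spectral_contrast(S=stft, sr=sr),
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
    ])


# Usage (hypothetical path): vector = extract_features("sample.wav")
```

Averaging over frames gives every file a vector of the same length regardless of its duration, at the cost of discarding how the features change over time.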
