Blurb: Machine learning project for detecting emotion from vocal input.
Our current idea is a research-style project that revolves around detecting emotion from the way people speak. In everyday life, people express the same words in a multitude of tones. While there is plenty of software that lets machines understand our voices, current voice recognition handles this variation poorly: the way we speak when we are angry or excited often fails to translate well through recognition software. In other words, the expressive variation in speech registers as noisy data to a machine, rather than as the subtle cues a human listener picks up on. While accents and dialects are specific to particular groups of people, we believe the tones people use to express certain emotions are largely universal, so this variation should not be a barrier to higher-level voice recognition. This is what we aim to address in this mentorship.

As part of our procedure, we plan on training a machine learning model to carry out this task. So far, we've found some publicly available datasets for training, and we're also considering adding vocal clips from movie lines. Various factors can act as features for the model, including tone, amplitude, pitch, and tempo, but we aren't sure how to implement this or how heavily each feature should be weighted relative to the others; a sketch of one possible extraction step follows below.

One design decision we still need to make is the programming language. Python would be significantly easier to write and more readable, but it would be slower than a lower-level language like C, or even Java, which matters if we're considering detecting emotions from audio data in real time.
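As a starting point, here is a minimal sketch of what extracting those candidate features from a clip could look like. It assumes the librosa library, which is not in our tool list yet; any audio-analysis library exposing pitch, energy, and tempo estimates would work similarly, and the specific summary statistics are placeholders rather than settled choices.

```python
# Hypothetical feature-extraction sketch. librosa is an assumption here,
# not part of our current tool list.
import numpy as np
import librosa

def extract_features(path):
    y, sr = librosa.load(path, sr=22050)                 # mono waveform + sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # timbre ("tone") coefficients
    rms = librosa.feature.rms(y=y)                       # amplitude (energy) per frame
    pitches, mags = librosa.piptrack(y=y, sr=sr)         # candidate pitch estimates
    pitch = pitches[mags > np.median(mags)]              # keep the confident estimates
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)       # rough tempo proxy

    # Summarize each feature track with mean/std so every clip maps to a
    # fixed-length vector regardless of its duration.
    return np.hstack([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        rms.mean(), rms.std(),
        pitch.mean() if pitch.size else 0.0,
        tempo,
    ])
```

How much each feature should count relative to the others is exactly the open question above; one option is to let the model learn the weighting rather than fixing it by hand.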
- Allow machines to better understand a person’s mood
- Personal control and comfort tool to help a person cope with any emotional situation
- Entertainment tool better suited and specialized for varied audiences
Machines take words literally; letting them take emotion as an input allows a degree of nuance in their responses. With the increasing prevalence of personal assistants in homes, such a system could also, with the user's consent, be used to gauge the user's state of mind and mental health based on the tonality of their voice, among other factors.
- Keras and TensorFlow
- NumPy
- NLTK
- scikit-learn
- SpeechRecognition
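To show how these tools could fit together, here is a minimal classification sketch using Keras and scikit-learn, assuming feature vectors like the ones from the extraction step above. The layer sizes, the eight-emotion label set (borrowed from RAVDESS), and the `features.npy`/`labels.npy` files are all hypothetical placeholders, not decisions we've made.

```python
# Minimal end-to-end sketch, assuming precomputed per-clip feature vectors.
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Placeholder label set matching the RAVDESS categories.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def build_model(n_features):
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer labels
                  metrics=["accuracy"])
    return model

# Hypothetical precomputed data: X is (n_clips, n_features), y is integer labels.
X = np.load("features.npy")
y = np.load("labels.npy")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = build_model(X.shape[1])
model.fit(X_train, y_train, epochs=30, validation_data=(X_test, y_test))
```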
- https://www.researchgate.net/publication/51753305_Authenticity_affects_the_recognition_of_emotions_in_speech_behavioral_and_fMRI_evidence
- https://www.researchgate.net/publication/51796867_Introducing_the_Geneva_Multimodal_Expression_Corpus_for_Experimental_Research_on_Emotion_Perception
- https://www.researchgate.net/publication/325187111_The_Ryerson_Audio-Visual_Database_of_Emotional_Speech_and_Song_RAVDESS_A_dynamic_multimodal_set_of_facial_and_vocal_expressions_in_North_American_English
- https://ieeexplore.ieee.org/abstract/document/6849440
- http://poseidon.csd.auth.gr/papers/PUBLISHED/CONFERENCE/pdf/Ververidis2003b.pdf
- https://tspace.library.utoronto.ca/handle/1807/24487/browse?type=title&submit_browse=Title
- https://zenodo.org/record/1188976#.W9SNGJNKhPY (well regarded by other users)
- http://peeter.eki.ee:5000/
- http://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Improv.html
- https://semaine-db.eu/accounts/register/ (Takes up to a week to request account)