- This collection consists of 8 English speech emotion datasets.
- This dataset will help you create a generalized deep learning model for SER.
- Most of the datasets also include transcripts of the speech, which can be used for multimodal modeling.
- Nov. 2, 2023 ~ Nov. 12, 2023
- Each dataset can be downloaded from the link below, and its files should be placed in the matching folder.
- We use pandas profiling for simple EDA.
conda install -c conda-forge ydata-profiling
- A detailed description of each dataset is provided here.
- This Jupyter notebook generates a single DataFrame containing all data paths and features.
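As a rough illustration of what merging the datasets into one table can involve, the sketch below normalizes differing label spellings and writes a combined manifest. The alias table, the column names (`dataset`, `path`, `emotion`) and the helper names are assumptions made for this example, not the notebook's actual code.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumption: illustrative alias table; the notebook may map labels differently.
LABEL_ALIASES = {
    "anger": "angry",
    "happiness": "happy",
    "joy": "happy",
    "sadness": "sad",
    "fearful": "fear",
}


def normalize_label(raw_label):
    """Lower-case a label and collapse known spelling variants."""
    key = raw_label.strip().lower()
    return LABEL_ALIASES.get(key, key)


def build_rows(records):
    """Turn (dataset, path, raw_label) tuples into uniform row dicts."""
    rows = []
    for dataset, path, raw_label in records:
        rows.append({
            "dataset": dataset,
            "path": PurePosixPath(path).as_posix(),
            "emotion": normalize_label(raw_label),
        })
    return rows


def write_manifest(rows, handle):
    """Write the rows as CSV to an already opened text handle."""
    writer = csv.DictWriter(handle, fieldnames=["dataset", "path", "emotion"])
    writer.writeheader()
    writer.writerows(rows)


# Usage with made-up records (the file names are hypothetical).
rows = build_rows([
    ("meld", "dataset/meld/example_a.wav", "Joy"),
    ("savee", "dataset/savee/example_b.wav", "Anger"),
])
buffer = io.StringIO()
write_manifest(rows, buffer)
```

Whether labels such as Joy and Happy should be merged is a modeling decision; the table above only shows where such a choice would live.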
1) CREMA-D
License: Open Database License, https://opendatacommons.org/licenses/odbl/1-0/
- Number of samples: 7442
- 48 male and 43 female actors between the ages of 20 and 74.
- A variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified).
6 Classes: Anger, Disgust, Fear, Happy, Neutral, Sad
- In addition, it contains
and Emotion Level
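CREMA-D clips are commonly distributed with file names of the form `ActorID_Sentence_Emotion_Level.wav`, so the class and emotion level can be read from the name. The sketch below assumes that naming scheme; the function name and the returned keys are made up for illustration.

```python
from pathlib import Path

# Three-letter emotion codes used in CREMA-D file names.
CREMA_EMOTIONS = {
    "ANG": "Anger",
    "DIS": "Disgust",
    "FEA": "Fear",
    "HAP": "Happy",
    "NEU": "Neutral",
    "SAD": "Sad",
}

# Emotion level codes; "XX" marks an unspecified level.
CREMA_LEVELS = {"LO": "Low", "MD": "Medium", "HI": "High", "XX": "Unspecified"}


def parse_crema_d(filename):
    """Split a CREMA-D file name into actor, sentence, emotion and level."""
    actor_id, sentence, emotion, level = Path(filename).stem.split("_")
    return {
        "actor_id": actor_id,
        "sentence": sentence,
        "emotion": CREMA_EMOTIONS[emotion],
        # Fall back to the raw code if an unexpected level string appears.
        "level": CREMA_LEVELS.get(level, level),
    }


info = parse_crema_d("1001_DFA_ANG_XX.wav")
```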
2) MELD
Citation [1]: S. Poria, D. Hazarika, N. Majumder, G. Naik, R. Mihalcea, E. Cambria. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation. (2018)
Citation [2]: Chen, S.Y., Hsu, C.C., Kuo, C.C. and Ku, L.W. EmotionLines: An Emotion Corpus of Multi-Party Conversations. arXiv preprint arXiv:1802.08379 (2018).
- Train, Test and Dev (We only used the train data.)
- MELD has 1400+ dialogues and 13,000+ utterances from Friends.
7 Classes: Anger, Disgust, Sadness, Joy, Neutral, Surprise and Fear
- Sentiment: Positive, Negative and Neutral
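MELD ships its labels in CSV files, with clips named after the dialogue and utterance IDs. The sketch below assumes the column names `Utterance`, `Emotion`, `Sentiment`, `Dialogue_ID` and `Utterance_ID` and the clip pattern `dia{Dialogue_ID}_utt{Utterance_ID}.mp4`; the sample row is invented for the example and is not taken from the dataset.

```python
import csv
import io

# Invented sample in the assumed MELD column layout.
SAMPLE_CSV = (
    "Utterance,Speaker,Emotion,Sentiment,Dialogue_ID,Utterance_ID\n"
    "This is an example line.,Speaker A,surprise,positive,0,3\n"
)


def meld_rows(csv_text):
    """Pair each utterance's text and labels with its expected clip name."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        clip = f"dia{record['Dialogue_ID']}_utt{record['Utterance_ID']}.mp4"
        rows.append({
            "clip": clip,
            "text": record["Utterance"],
            "emotion": record["Emotion"],
            "sentiment": record["Sentiment"],
        })
    return rows


meld_sample = meld_rows(SAMPLE_CSV)
```

Keeping the text next to the clip name is what makes the multimodal use mentioned earlier straightforward.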
3) MLEnd
MLEnd has been created by students at the School of Electronic Engineering and Computer Science, Queen Mary University of London.
- 31 nationalities, 42 unique languages and 154 speakers
- Each audio recording corresponds to one English numeral (from "zero" to "billion") that is read using different intonations.
- Number of samples: 32654
4 Classes: Neutral, Bored, Excited and Question
- It contains
4) RAVDESS
Citation: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NC-SA 4.0.
- 24 professional actors (12 female, 12 male) and two lexically-matched statements in a neutral North American accent.
- Number of samples: 1440
7 Classes: Calm, Happy, Sad, Angry, Fearful, Surprise and Disgust
- Emotional Intensity: Normal, Strong
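RAVDESS encodes its metadata in a seven-part numeric file name (modality, vocal channel, emotion, intensity, statement, repetition, actor), and odd-numbered actors are male while even-numbered actors are female. A minimal parsing sketch under that convention follows; the function name and output keys are assumptions for illustration.

```python
from pathlib import Path

# Third field of the file name: emotion code.
RAVDESS_EMOTIONS = {
    "01": "Neutral",
    "02": "Calm",
    "03": "Happy",
    "04": "Sad",
    "05": "Angry",
    "06": "Fearful",
    "07": "Disgust",
    "08": "Surprise",
}

# Fourth field of the file name: emotional intensity.
RAVDESS_INTENSITY = {"01": "Normal", "02": "Strong"}


def parse_ravdess(filename):
    """Decode emotion, intensity and actor from a RAVDESS file name."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"unexpected RAVDESS file name: {filename}")
    _modality, _channel, emotion, intensity, _statement, _repetition, actor = parts
    actor_number = int(actor)
    return {
        "emotion": RAVDESS_EMOTIONS[emotion],
        "intensity": RAVDESS_INTENSITY[intensity],
        "actor": actor_number,
        "sex": "female" if actor_number % 2 == 0 else "male",
    }


ravdess_info = parse_ravdess("03-01-06-01-02-01-12.wav")
```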
5) SAVEE
License: Data files © Original Authors | Authors: Philip Jackson and Sanaul Haq
- Four native English male speakers, aged 27 to 31 years
- A total of 120 utterances per speaker and 15 TIMIT sentences per emotion: 3 common, 2 emotion-specific and 10 generic sentences
7 Classes: Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise
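SAVEE file names start with a one- or two-letter emotion prefix (`a`, `d`, `f`, `h`, `n`, `sa`, `su`) followed by a two-digit sentence index, and some distributions prepend the speaker initials. The sketch below assumes names such as `DC_sa03.wav` or `a01.wav`; the helper name is hypothetical.

```python
import re
from pathlib import Path

# Emotion prefixes used in SAVEE file names.
SAVEE_EMOTIONS = {
    "a": "Anger",
    "d": "Disgust",
    "f": "Fear",
    "h": "Happiness",
    "n": "Neutral",
    "sa": "Sadness",
    "su": "Surprise",
}

# Optional speaker initials, then emotion prefix, then two-digit index.
SAVEE_PATTERN = re.compile(
    r"^(?:(?P<speaker>[A-Z]{2})_)?(?P<code>[a-z]{1,2})(?P<index>\d{2})$"
)


def parse_savee(filename):
    """Decode speaker (if present), emotion and sentence index."""
    match = SAVEE_PATTERN.match(Path(filename).stem)
    if match is None or match.group("code") not in SAVEE_EMOTIONS:
        raise ValueError(f"unexpected SAVEE file name: {filename}")
    return {
        "speaker": match.group("speaker"),
        "emotion": SAVEE_EMOTIONS[match.group("code")],
        "index": int(match.group("index")),
    }


savee_info = parse_savee("DC_sa03.wav")
```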
6) TESS
License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
- 200 target words were spoken in the carrier phrase "Say the word _" by two actresses (aged 26 and 64 years).
7 Classes: Anger, Disgust, Fear, Happiness, Pleasant Surprise, Sadness, and Neutral
- It contains
7) ESD
Citation: Kun Zhou, Berrak Sisman, Rui Liu and Haizhou Li, "Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset" ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 350 parallel utterances were spoken by 10 native English and 10 native Mandarin speakers.
- We only used the English utterances.
5 Classes: Neutral, Happy, Angry, Sad and Surprise
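Keeping only the English part of ESD can be done by speaker ID. The sketch below assumes the usual ESD layout, where each speaker has a four-digit folder and speakers `0011` to `0020` are the English ones; the example paths are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: speaker folders 0011..0020 hold the English speakers.
ENGLISH_SPEAKERS = {f"{number:04d}" for number in range(11, 21)}


def is_english(path):
    """True if any path component is an English speaker folder."""
    return any(part in ENGLISH_SPEAKERS for part in PurePosixPath(path).parts)


def english_only(paths):
    """Filter a list of ESD paths down to the English speakers."""
    return [path for path in paths if is_english(path)]


esd_paths = [
    "dataset/esd/0003/Angry/example_1.wav",  # hypothetical Mandarin clip
    "dataset/esd/0015/Happy/example_2.wav",  # hypothetical English clip
]
esd_english = english_only(esd_paths)
```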
8) JL Corpus
License: CC0: Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
Citation: Jesin James, Li Tian, Catherine Watson, "An Open Source Emotional Speech Corpus for Human Robot Interaction Applications", in Proc. Interspeech, 2018.
- Emotional speech corpus with balanced New Zealand English vowels, encompassing 5 primary and 5 secondary emotions.
10 Classes: Angry, Anxious, Apologetic, Assertive, Concerned, Encouraging, Excited, Happy, Neutral and Sad
- In addition, it contains a
├── dataset # Not contained in this repository
│ ├── crema-d # CREMA-D Dataset
│ ├── meld # MELD Dataset
│ ├── mlend # MLEnd Dataset
│ ├── ravdess # RAVDESS Dataset
│ ├── savee # SAVEE Dataset
│ ├── tess # TESS Dataset
│ ├── esd # ESD Dataset
│ └── jl-corpus # JL Corpus Dataset
├── MakeEngSpeechDataset.ipynb # Build a DataFrame Covering All 8 Datasets
├── SpeechEDA.ipynb # EDA using Pandas Profiling
├── speech_dataset.csv # Main Dataset
└── report.html # EDA Report
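Because the `dataset` folder is not part of the repository, a quick check that every expected subfolder exists can catch a misplaced download before running the notebooks. The sketch below uses the folder names from the tree above; the function name is an assumption for illustration.

```python
from pathlib import Path

# Subfolder names taken from the directory tree above.
EXPECTED_DATASETS = [
    "crema-d",
    "meld",
    "mlend",
    "ravdess",
    "savee",
    "tess",
    "esd",
    "jl-corpus",
]


def missing_datasets(root):
    """Return the expected dataset folders that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets("dataset")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All 8 dataset folders found.")
```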