This repository provides the ChatGPT-generated data resource (i.e., emotion class captions) needed to reproduce the experimental results of the proposed CLAP4Emo framework. The work was presented and published at ICASSP'24. The repository documents the data used to train the model (GPTEmo_Caps.json), along with the corresponding prompt list used to compute the speech emotion retrieval performance reported in the paper.
This data resource is solely released and published as part of the publication effort to help academic researchers benchmark or reproduce results. It will neither be maintained nor monitored in any way.
The JSON file GPTEmo_Caps.json contains a dictionary whose keys are emotion types and whose values are the lists of captions associated with each type:
- "A": "angry"
- "H": "happy"
- "N": "neutral"
- "S": "sad"
import json
with open("./GPTEmo_Caps.json") as f:
    emocaps = json.load(f)
# get the list of "angry" captions generated by ChatGPT
angry_captions = emocaps["A"]
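
Once the dictionary is loaded, the captions can be flattened into (caption, emotion) pairs or sampled per class. The sketch below is only an illustration and not necessarily how the paper's training pipeline consumes the data: the helper names `caption_label_pairs` and `sample_caption` are hypothetical, and `demo_caps` is a small in-memory stand-in (two captions per class, copied from the caption tables in this README) so the example runs without the JSON file.

```python
import random

# Key-to-emotion mapping as listed in this README.
EMOTION_NAMES = {"A": "angry", "H": "happy", "N": "neutral", "S": "sad"}


def caption_label_pairs(emocaps):
    """Flatten {key: [captions]} into a list of (caption, emotion name) pairs."""
    return [
        (caption, EMOTION_NAMES[key])
        for key, captions in emocaps.items()
        for caption in captions
    ]


def sample_caption(emocaps, key, rng=random):
    """Pick one caption at random for the given emotion key."""
    return rng.choice(emocaps[key])


# Assumption: a tiny stand-in for the contents of GPTEmo_Caps.json.
# Replace it with the `emocaps` dictionary loaded from the file.
demo_caps = {
    "A": ["speech has angry emotion", "shouting to someone else"],
    "H": ["speech has happy emotion", "laughing sounds"],
    "N": ["speech has neutral emotion", "speaking with no emotions"],
    "S": ["speech has sad emotion", "quivering or trembling voice"],
}

pairs = caption_label_pairs(demo_caps)
print(len(pairs))   # number of (caption, emotion) pairs
print(pairs[0])     # first pair in dictionary order
print(sample_caption(demo_caps, "S", random.Random(0)))  # seeded for repeatability
```

Passing a seeded `random.Random` instance keeps the sampling reproducible, which may help when benchmarking against the reported results.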
Angry |
---|
"speech has angry emotion" |
"the speech is confrontational and negative" |
"shouting to someone else" |
"someone speaks and suddenly raise their voice with negative tone" |
"the speech sounds violent with uncontrollable anger" |
"rage, wrath, fury or outrage" |
"someone speaks with threatening tone" |
"speech loudly and negative" |
"high volume, intensity and negative sound" |
"harsh or aggressive vocal tones" |
"the speech incorporating elements of anger, irritation, or even sarcasm" |
"expression of hate, disdain, dislike or denying" |
Happy |
---|
"speech has happy emotion" |
"the speech is cheerful and positive" |
"laughing sounds" |
"talking jokes, feeling happy and relaxed" |
"the speech sounds exciting, thrilling and joyful" |
"the speech makes people feel amusing and positive" |
"happiness, optimism or hope" |
"the sound is delightful and pleasant, making people positive" |
"friendly tone, affable and kind speech" |
Neutral |
---|
"speech has neutral emotion" |
"the speech is calm and neutral" |
"speaking with no emotions" |
"flat tone and boring speech" |
"steady, no changes and bored" |
"talking in a normal pace" |
"standard expression way" |
"not positive, not negative, in the middle" |
"nothing special or needs to be noticed" |
Sad |
---|
"speech has sad emotion" |
"screaming and crying sounds" |
"the speech is sad, making people feel depressed and suffering" |
"frustrated, unhappy, and sadness" |
"feel painful and upset" |
"conveys a sense of sorrow, grief, or melancholy" |
"low intensity and negative" |
"the speech sounds lifeless, spiritless or losing hopes" |
"quivering or trembling voice" |
"sighs and struggle with sadness" |
"low volume and negative voice" |
If you use the released resource or the CLAP4Emo framework, please cite the following paper:
Wei-Cheng Lin, Shabnam Ghaffarzadegan, Luca Bondi, Abinaya Kumar, Samarjit Das and Ho-Hsiang Wu, "CLAP4Emo: ChatGPT-Assisted Speech Emotion Retrieval with Natural Language Supervision", ICASSP 2024.
@InProceedings{LinCLAP4Emo_2024,
author={W.-C. Lin and S. Ghaffarzadegan and L. Bondi and A. Kumar and S. Das and H.-H. Wu},
title={{CLAP4Emo}: ChatGPT-Assisted Speech Emotion Retrieval with Natural Language Supervision},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)},
volume={},
year={2024},
month={April},
pages={11791-11795},
address = {Seoul, Korea},
doi={10.1109/ICASSP48485.2024.10447102},
}
This repository is open-sourced under the CC-BY-SA-4.0 license. See the LICENSE file for details.