GitHub - 95anantsingh/Decoding-Emotions

Abstract

Recent advancements in transformer-based speech representation models have greatly transformed speech processing. However, there has been limited research evaluating these models for speech emotion recognition (SER) across multiple languages and examining their internal representations. This article addresses these gaps by presenting a comprehensive benchmark for SER with eight speech representation models and six different languages. We conducted probing experiments to gain insights into the inner workings of these models for SER. We find that using features from a single optimal layer of a speech model reduces the error rate by 32% on average across seven datasets when compared to systems where features from all layers of speech models are used. We also achieve state-of-the-art results for German and Persian languages. Our probing results indicate that the middle layers of speech models capture the most important emotional information for speech emotion recognition.
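The idea of picking a single optimal layer can be illustrated with a small layer-wise probing sketch. This is not the repository's code: the nearest-centroid probe, the mean pooling, the toy data generator and all names (`vec_mean`, `fit_centroids`, `select_best_layer`, `synthetic_split`) are hypothetical stand-ins. In a real setup the per-layer frame features would come from a speech model (for example Hugging Face Transformers with `output_hidden_states=True`), and the classifier would likely be stronger. The sketch trains one probe per layer on utterance-level features and keeps the layer with the best validation accuracy.

```python
import random

def vec_mean(vectors):
    """Average a list of equal-length vectors (e.g. T frames x D dims -> D)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fit_centroids(feats, labels):
    """Nearest-centroid probe (stand-in classifier): one mean vector per emotion."""
    groups = {}
    for x, y in zip(feats, labels):
        groups.setdefault(y, []).append(x)
    return {y: vec_mean(xs) for y, xs in groups.items()}

def predict(centroids, x):
    """Return the emotion whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

def accuracy(centroids, feats, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(feats, labels))
    return hits / len(labels)

def select_best_layer(train, train_labels, val, val_labels):
    """train[l][i] is the pooled layer-l feature of utterance i.
    Fits one probe per layer and returns (best_layer_index, per_layer_accuracy)."""
    scores = []
    for layer_train, layer_val in zip(train, val):
        centroids = fit_centroids(layer_train, train_labels)
        scores.append(accuracy(centroids, layer_val, val_labels))
    best = max(range(len(scores)), key=lambda l: scores[l])
    return best, scores

def synthetic_split(n_per_class, n_layers=5, informative=2, dim=8, n_frames=10, seed=0):
    """Hypothetical toy data: only layer `informative` carries emotion information;
    every other layer is pure noise. Frame features are mean-pooled per utterance."""
    rng = random.Random(seed)
    emotions = ["angry", "happy", "neutral", "sad"]
    # Fixed per-emotion offsets shared by every split (independent of `seed`).
    proto_rng = random.Random(1234)
    prototypes = {e: [proto_rng.gauss(0, 3) for _ in range(dim)] for e in emotions}
    layers = [[] for _ in range(n_layers)]
    labels = []
    for emo in emotions:
        for _ in range(n_per_class):
            labels.append(emo)
            for l in range(n_layers):
                frames = []
                for _ in range(n_frames):
                    frame = [rng.gauss(0, 1) for _ in range(dim)]
                    if l == informative:
                        frame = [p + z for p, z in zip(prototypes[emo], frame)]
                    frames.append(frame)
                layers[l].append(vec_mean(frames))  # mean pooling over time
    return layers, labels

train, y_train = synthetic_split(30, seed=1)
val, y_val = synthetic_split(20, seed=2)
best, scores = select_best_layer(train, y_train, val, y_val)
print("best layer:", best, "accuracies:", [round(s, 2) for s in scores])
```

On this toy data the informative layer (index 2) should be selected, while the noise layers stay near chance. Choosing one layer by validation accuracy mirrors the single-layer setting described above; the all-layer alternative would instead combine every layer's features, for example with a learned weighted sum.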

Downloads

Download the data and weights from: https://drive.google.com/drive/folders/1HW7R1ng7FLrYXnvjpHcQk7-ujyyX-8du?usp=sharing

Folders and files

.vscode
data
history
jobs
models
summary
utils
weights
.gitattributes
.gitignore
LICENSE
README.md
datasets.py
env.yml
finetune.py
main.py
model_summary.ipynb
summary.ipynb
