Speech Emotion Recognition

Speech emotion recognition with an LSTM, implemented in PyTorch.

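Recognition accuracy is around 80%. This is a PyTorch rewrite of the original Keras project (the original project's CNN, MLP, and SVM models have not been ported yet).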
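To make the model concrete, here is a minimal NumPy sketch of what a single-layer LSTM classifier computes over a sequence of frame-level features. It is not the code in models/dnn/lstm.py, which uses PyTorch's built-in LSTM. The sizes in the usage below (39 features per frame, 16 hidden units, 8 classes) and the helper names `init_params` and `lstm_classify` are hypothetical. The gate order (input, forget, candidate, output) follows the convention PyTorch documents for `nn.LSTM`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_features, n_hidden, n_classes, seed=0):
    """Hypothetical random initialization for the sketch below."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.1, (4 * n_hidden, n_features))   # input-to-gates weights
    U = rng.normal(0, 0.1, (4 * n_hidden, n_hidden))     # hidden-to-gates weights
    b = np.zeros(4 * n_hidden)                            # gate biases
    W_out = rng.normal(0, 0.1, (n_classes, n_hidden))    # classifier weights
    b_out = np.zeros(n_classes)
    return W, U, b, W_out, b_out


def lstm_classify(x_seq, params):
    """Run one LSTM layer over x_seq (T, D); return class probabilities."""
    W, U, b, W_out, b_out = params
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell state
        o = sigmoid(z[3 * H:4 * H])    # output gate
        c = f * c + i * g              # update the cell state
        h = o * np.tanh(c)             # new hidden state
    logits = W_out @ h + b_out         # classify from the last hidden state
    logits = logits - logits.max()     # numerical stability for softmax
    exp = np.exp(logits)
    return exp / exp.sum()
```

In this sketch only the last hidden state feeds the classifier; other designs pool over all time steps instead.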

Environment

Python 3.6.7

PyTorch 1.7.0

Structure

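├── models/                // model implementations
│   ├── common.py          // shared base class that every model inherits from
│   ├── dnn/               // neural-network models
│   │   ├── dnn.py         // shared code for the neural-network models
│   │   └── lstm.py        // LSTM
├── extract_feats/         // feature extraction
│   ├── librosa.py         // feature extraction with librosa
│   └── opensmile.py       // feature extraction with Opensmile
├── utils/
│   ├── files.py           // dataset organization (sorting into classes, batch renaming)
│   ├── opts.py            // reads command-line arguments with argparse
│   └── common.py          // model loading and plotting (radar chart, spectrogram, waveform)
├── features/              // extracted features are stored here
├── configs/               // configuration parameters (.yaml)
├── train.py               // trains a model
├── predict.py             // predicts the emotion of a given audio file with a trained model
├── preprocess.py          // data preprocessing (extracts and saves features for the dataset audio)
└── opensmile-3.0-linux-x64 // the opensmile tool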

Requirements

Python

scikit-learn: splitting the data into training and test sets
PyTorch: LSTM
librosa: feature extraction, waveform plots
SciPy: spectrograms
pandas: loading features
Matplotlib: plotting
numpy

Tools

Opensmile: feature extraction

Datasets

RAVDESS

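English: about 1,500 audio clips from 24 actors (12 male, 12 female) expressing 8 emotions (the third number in the file name indicates the emotion class): 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised.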
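A small sketch of reading the RAVDESS label from a file name, assuming the standard hyphen-separated naming in which the third field is the emotion code (the example path in the usage is illustrative):

```python
from pathlib import Path

# Emotion codes as listed above for RAVDESS.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def ravdess_label(file_path):
    """Return the emotion encoded in the third field of a RAVDESS file name."""
    fields = Path(file_path).stem.split("-")
    if len(fields) < 3 or fields[2] not in RAVDESS_EMOTIONS:
        raise ValueError(f"unexpected RAVDESS file name: {file_path}")
    return RAVDESS_EMOTIONS[fields[2]]
```

For example, `ravdess_label("Actor_12/03-01-06-01-02-01-12.wav")` returns `"fearful"`.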

SAVEE (http://kahlan.eps.surrey.ac.uk/savee/Download.html)

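English: about 500 audio clips from 4 speakers (all male) expressing 7 emotions (the letter prefix of the file name indicates the emotion class): a = anger, d = disgust, f = fear, h = happiness, n = neutral, sa = sadness, su = surprise.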
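Because two SAVEE codes have two letters (sa, su), the label has to be read as the whole alphabetic prefix rather than a single character. A sketch, assuming file names of the form prefix plus sentence number (for example sa03.wav; the speaker folder in the usage is illustrative):

```python
import re
from pathlib import Path

# Emotion codes as listed above for SAVEE.
SAVEE_EMOTIONS = {
    "a": "anger", "d": "disgust", "f": "fear", "h": "happiness",
    "n": "neutral", "sa": "sadness", "su": "surprise",
}


def savee_label(file_path):
    """Return the emotion encoded in the letter prefix of a SAVEE file name."""
    match = re.match(r"([a-z]+)\d+", Path(file_path).stem)
    if match is None or match.group(1) not in SAVEE_EMOTIONS:
        raise ValueError(f"unexpected SAVEE file name: {file_path}")
    return SAVEE_EMOTIONS[match.group(1)]
```

For example, `savee_label("DC/sa03.wav")` returns `"sadness"`, while `savee_label("DC/a03.wav")` returns `"anger"`.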

EMO-DB

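German: about 500 audio clips from 10 speakers (5 male, 5 female) expressing 7 emotions (the second-to-last letter of the file name indicates the emotion class): N = neutral, W = angry, A = fear, F = happy, T = sad, E = disgust, L = boredom.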
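A sketch of the EMO-DB rule, taking the second-to-last character of the file name without its extension (the example names in the usage, such as 03a01Fa.wav, are illustrative):

```python
from pathlib import Path

# Emotion codes as listed above for EMO-DB.
EMODB_EMOTIONS = {
    "N": "neutral", "W": "angry", "A": "fear", "F": "happy",
    "T": "sad", "E": "disgust", "L": "boredom",
}


def emodb_label(file_path):
    """Return the emotion encoded in the second-to-last letter of an EMO-DB file name."""
    stem = Path(file_path).stem
    code = stem[-2] if len(stem) >= 2 else ""
    if code not in EMODB_EMOTIONS:
        raise ValueError(f"unexpected EMO-DB file name: {file_path}")
    return EMODB_EMOTIONS[code]
```

For example, `emodb_label("03a01Fa.wav")` returns `"happy"`.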

CASIA

Chinese: about 1,200 audio clips from 4 speakers (2 male, 2 female) expressing 6 emotions: neutral, happy, sad, angry, fearful, surprised.

MEAD

English video dataset: 40 hours of video from 60 speakers expressing 8 emotions (angry, disgust, contempt, fear, happy, neutral, sad, surprise), with each emotion recorded at 3 intensity levels.

Usage

Prepare

Install the dependencies:

pip install -r requirements.txt

Extract opensmile into the project root directory.

Configuration

Set the parameters in a YAML configuration file in the configs/ folder.

Of the Opensmile standard feature sets, only the following are currently supported:

IS09_emotion: The INTERSPEECH 2009 Emotion Challenge, 384 features;
IS10_paraling: The INTERSPEECH 2010 Paralinguistic Challenge, 1582 features;
IS11_speaker_state: The INTERSPEECH 2011 Speaker State Challenge, 4368 features;
IS12_speaker_trait: The INTERSPEECH 2012 Speaker Trait Challenge, 6125 features;
IS13_ComParE: The INTERSPEECH 2013 ComParE Challenge, 6373 features;
ComParE_2016: The INTERSPEECH 2016 Computational Paralinguistics Challenge, 6373 features.

To use a different feature set, edit the FEATURE_NUM entry in extract_feats/opensmile.py accordingly.
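The feature counts above amount to a lookup table from feature-set name to vector length. The sketch below is a hypothetical mirror of such a table (the exact shape of FEATURE_NUM in extract_feats/opensmile.py may differ, and `feature_num` and `check_feature_vector` are illustrative helpers). It shows why a new feature set must be registered before use: its expected length is otherwise unknown.

```python
# Hypothetical mirror of FEATURE_NUM: feature-set name -> number of features.
FEATURE_NUM = {
    "IS09_emotion": 384,
    "IS10_paraling": 1582,
    "IS11_speaker_state": 4368,
    "IS12_speaker_trait": 6125,
    "IS13_ComParE": 6373,
    "ComParE_2016": 6373,
}


def feature_num(feature_set):
    """Return the expected feature count for a registered feature set."""
    try:
        return FEATURE_NUM[feature_set]
    except KeyError:
        raise ValueError(
            f"unsupported feature set {feature_set!r}; add it to FEATURE_NUM first"
        ) from None


def check_feature_vector(feature_set, vector):
    """Raise if an extracted feature vector has the wrong length."""
    expected = feature_num(feature_set)
    if len(vector) != expected:
        raise ValueError(
            f"{feature_set}: expected {expected} features, got {len(vector)}"
        )
    return vector
```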

Preprocess

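First, extract features from the dataset audio and save them locally. Features extracted by Opensmile are saved to .csv files; features extracted by librosa are saved to .p files.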
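A stdlib sketch of the two storage formats, useful for inspecting saved features outside the project. It assumes .p denotes a Python pickle. The on-disk layouts (a pickled dict with "features" and "labels" keys; a CSV with a header row and the label in the last column) are assumptions, not the exact format preprocess.py writes, and the repository itself loads features with pandas.

```python
import csv
import pickle


def save_features_pickle(features, labels, path):
    """Pickle features and labels together (assumed .p layout)."""
    with open(path, "wb") as fh:
        pickle.dump({"features": features, "labels": labels}, fh)


def load_features_pickle(path):
    with open(path, "rb") as fh:
        data = pickle.load(fh)
    return data["features"], data["labels"]


def save_features_csv(features, labels, path):
    """One row per clip: feature columns, then the label (assumed .csv layout)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([f"f{i}" for i in range(len(features[0]))] + ["label"])
        for vector, label in zip(features, labels):
            writer.writerow(list(vector) + [label])


def load_features_csv(path):
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header row
        rows = list(reader)
    features = [[float(value) for value in row[:-1]] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels
```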

python preprocess.py --config configs/example.yaml

Here, configs/example.yaml is the path to your configuration file.
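The --config flag is read by utils/opts.py with argparse. A minimal sketch of such a parser, assuming only this one flag (the real file may define more options, and the function name `parse_opt` is hypothetical):

```python
import argparse


def parse_opt(argv=None):
    """Parse command-line options; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Speech emotion recognition")
    parser.add_argument(
        "--config",
        type=str,
        required=True,
        help="path to the YAML configuration file, e.g. configs/example.yaml",
    )
    return parser.parse_args(argv)
```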

Train

The dataset path can be set in configs/. Put audio clips with the same emotion in the same folder (utils/files.py can help organize the data), for example:

└── datasets
    ├── angry
    ├── happy
    ├── sad
    ...
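With this layout, each clip's label is simply the name of its parent folder. A sketch of collecting (path, label) pairs under that assumption; the `collect_dataset` name and the restriction to .wav files are illustrative, not taken from the project:

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav"}  # assumption: only .wav files are used


def collect_dataset(root):
    """Return ([(audio_path, label), ...], sorted class labels) for root/<label>/<clip>."""
    root = Path(root)
    class_labels = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label in class_labels:
        for audio in sorted((root / label).iterdir()):
            if audio.is_file() and audio.suffix.lower() in AUDIO_SUFFIXES:
                samples.append((str(audio), label))
    return samples, class_labels
```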

Then run:

python train.py --config configs/example.yaml

Predict

Use a trained model to predict the emotion of a given audio file. Some pretrained models are available on the checkpoints branch and the releases page.

python predict.py --config configs/example.yaml

Functions

Radar Chart

Plot a radar chart of the predicted probabilities.

Source: Radar

from utils.common import Radar
'''
Inputs:
    data_prob: array of predicted probabilities
    class_labels: emotion labels
'''
Radar(data_prob, class_labels)
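The geometry behind a radar chart is simple: one axis per class, evenly spaced around the circle, with the first point repeated so the outline closes. This standalone sketch is not the repository's Radar implementation; `radar_polygon` and `plot_radar` are hypothetical names, and only `plot_radar` needs Matplotlib.

```python
import math


def radar_polygon(data_prob, class_labels):
    """Return (angles, values) for a closed radar-chart polygon."""
    values = list(data_prob)
    if len(values) != len(class_labels):
        raise ValueError("data_prob and class_labels must have the same length")
    n = len(values)
    angles = [2 * math.pi * k / n for k in range(n)]  # evenly spaced axes
    # Repeat the first point at the end so the outline closes.
    return angles + angles[:1], values + values[:1]


def plot_radar(data_prob, class_labels):
    """Draw the polygon on a polar Matplotlib axis."""
    import matplotlib.pyplot as plt  # local import: radar_polygon works without it

    angles, values = radar_polygon(data_prob, class_labels)
    ax = plt.subplot(111, polar=True)
    ax.plot(angles, values, linewidth=1)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(class_labels)
    ax.set_ylim(0, 1)  # probabilities lie in [0, 1]
    return ax
```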

Play Audio

Play an audio file.

from utils.common import playAudio
playAudio(file_path)

Plot Curve

Plot the accuracy and loss curves recorded during training.

from utils.common import plotCurve
'''
Inputs:
    train (list): loss or accuracy values on the training set
    val (list): loss or accuracy values on the validation set
    title (str): figure title
    y_label (str): y-axis label
'''
plotCurve(train, val, title, y_label)

Waveform

Plot the waveform of an audio file.

from utils.common import Waveform
Waveform(file_path)
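The repository draws waveforms with librosa. As a dependency-free illustration of what a waveform plot shows (amplitude against time), this sketch reads a WAV file with Python's `wave` module and returns the time axis and normalized amplitudes. Supporting only 16-bit PCM and keeping only the first channel are simplifications of this sketch; `read_waveform` is a hypothetical name.

```python
import struct
import wave


def read_waveform(file_path):
    """Return (times in seconds, amplitudes in [-1, 1)) for a 16-bit PCM WAV file."""
    with wave.open(file_path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        sr = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)
    # Little-endian signed 16-bit samples, interleaved across channels.
    samples = struct.unpack(f"<{n_frames * n_channels}h", raw)
    first_channel = samples[::n_channels]
    times = [i / sr for i in range(len(first_channel))]
    amplitudes = [s / 32768.0 for s in first_channel]
    return times, amplitudes
```

Plotting `amplitudes` against `times` gives the waveform.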

Spectrogram

Plot the spectrogram of an audio file.

from utils.common import Spectrogram
Spectrogram(file_path)
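The repository computes spectrograms with SciPy. The sketch below spells out the underlying short-time Fourier transform in NumPy: slice the signal into overlapping Hann-windowed frames and take the magnitude of each frame's FFT. The frame length (n_fft=256) and hop (128 samples) are assumed defaults, not values taken from the project, and `spectrogram` here is a hypothetical helper rather than utils.common.Spectrogram.

```python
import numpy as np


def spectrogram(samples, sr, n_fft=256, hop=128):
    """Return (freqs, times, magnitude) with magnitude shaped (n_fft // 2 + 1, n_frames)."""
    x = np.asarray(samples, dtype=float)
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Overlapping windowed frames, one per row.
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, n_fft // 2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)        # bin centre frequencies in Hz
    times = (np.arange(n_frames) * hop + n_fft / 2) / sr  # frame centres in seconds
    return freqs, times, magnitude.T
```

A 1000 Hz tone sampled at 8000 Hz, for instance, peaks in bin 32, since each bin spans 8000 / 256 = 31.25 Hz.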

Repository: zhaishuyan/Speech-Emotion-Recognition