Human Understanding

2022 휴먼이해 인공지능 논문경진대회

MMM: Multi-modal Emotion Recognition in conversation with MLP Mixer

Environment

python version: python3.8
OS type: Linux
requires packages: {
      'numpy==1.22.3',
      'pandas==1.4.2',
      'torch==1.11.0+cu113',
      'torchaudio==0.11.0+cu113',
      'scikit-learn',
      'transformers==4.18.0',
      'tokenizers==0.12.1',
      'soundfile==0.10.3.post1'
}

# setup environment
pip install numpy==1.22.3 pandas==1.4.2 scikit-learn transformers==4.18.0 tokenizers==0.12.1 soundfile==0.10.3.post1
pip install torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Directory

코드 구현을 위해서는 KEMDy20 및 AI Hub 감성대화 말뭉치 음성파일이 알맞은 위치에 있어야합니다.

+--KEMDy20
      +--annotation
      +--wav
      # train과 inference 속도를 향상시키기 위해 pretrained Wav2Vec2모델에서 연산한 결과를 미리 저장하여 활용하였음.
            +--audio_embeddings    
                  +--hidden_state.json    
                  +--extract_feature.json
      # train과 inference 속도를 향상시키기 위해 pretrained Wav2Vec2모델에서 연산한 결과를 미리 저장하여 활용하였음.
            +--hidden_states
                +-- {file_name}.pt
      # AI Hub 감성대화 말뭉치 file들이 저장된 폴더
            +--emotiondialogue
                +--F_000001.wav
                ...
                +--M_005000.wav
            +--Sessoion01
            ...
            +--Session40
      +--TEMP
      +--IBI
      +--EDA
+--data
      +--processed_KEMDy20.json   # KEMDy20데이터와 감성대화 말뭉치를 전처리한 파일
+--models
      +--module_for clossattention
      +--multimodal.py
      +--multimodal_attention
      +--multimodal_cross_attention
      +--multimodal_mixer      
+--merdataset.py
+--preprocessing.py
+--utils.py
+--test.py
+--config.py
+--train.py
+--train_crossattention.py
+--train_mixer.py

Base Model

Encoder	Architecture	pretrained-weights
Audio Encoder	pretrained Wav2Vec 2.0	kresnik/wav2vec2-large-xlsr-korean
Text Encoder	pretrained Electra	beomi/KcELECTRA-base

Arguments

train.py

argument	description
--epochs {int}	epoch 횟수
--batch {int}	batch 사이즈
--save	체크포인트 저장 여부
--retrain	기존 모델을 불러와 다시 학습
--model_name {str}	모델을 불러오거나 저장할 때 사용할 모델명
--ws	weighted random sampling 수행
--shuffle {bool}	data shuffle 수행 여부 --ws와 동시 적용 X
--cuda {cuda:num}	사용할 gpu 번호
--K {int}	사용할 utterance context 수
--hidden	feature가 아닌 hidden state를 사용합니다
--class_weight {bool}	각 클래스별 loss에 가중치를 두어 학습합니다.
--use_threeway	three way concat 모델 구조로 학습을 진행합니다

# vanilla training
python train.py --model_name vanilla --save --class_weight False

# 3way concat training
python train.py --model_name three_way --save --class_weight False --use_threeway

train_crossattention.py

argument	description
--epochs {int}	epoch 횟수
--batch {int}	batch 사이즈
--save	체크포인트 저장 여부
--retrain	기존 모델을 불러와 다시 학습
--model_name {str}	모델을 불러오거나 저장할 때 사용할 모델명
--shuffle {bool}	data shuffle 수행 여부
--cuda {cuda:num}	사용할 gpu 번호

# multimodal cross attention training
python train_crossattention.py --model_name cross_attention --save

train_mixer.py

argument	description
--epochs {int}	epoch 횟수
--batch {int}	batch 사이즈
--save	체크포인트 저장 여부
--retrain	기존 모델을 불러와 다시 학습
--model_name {str}	모델을 불러오거나 저장할 때 사용할 모델명
--shuffle {bool}	data shuffle 수행 여부 --ws와 동시 적용 X
--cuda {cuda:num}	사용할 gpu 번호
--num_blocks {int}	MLP layer의 수를 설정합니다.

# multimodal mixer training
python train_mixer.py --model_name mlp_mixer --save

test.py

argument	description
--batch {int}	batch 사이즈
--cuda {cuda:num}	사용할 gpu 번호
--model_name	test할 model file 이름
--all	test_all 폴더의 모든 file을 test합니다.

python test.py --model_name {model_name}_epoch29 --batch 64

Model Architecture

Vanilla Concat & 3-way Concat

Multimodal Cross Attention & Multimodal MLP-Mixer

Mixer Layer

Experiments

Architecture

Index	Architecture	Accuracy	W-Precision	W-F1
1	Vanilla concat	71.753	69.587	70.250
2	3 way concat	73.127	72.088	72.138
3	Cross-Attention	77.749	78.298	77.343
4	MLP-Mixer	78.884	78.450	78.418

Data Source

일반인 대상 자유발화
AI Hub 감성대화 말뭉치

References

Noh, K.J.; Jeong, C.Y.; Lim, J.; Chung, S.; Kim, G.; Lim, J.M.; Jeong, H. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets. Sensors 2021, 21, 1579. https://doi.org/10.3390/s21051579

Tsai, Yao-Hung Hubert, et al. Multimodal transformer for unaligned multimodal language sequences. Association for Computational Linguistics (ACL), 2019. https://github.com/yaohungt/Multimodal-Transformer

@inproceedings{tsai2019MULT,
  title={Multimodal Transformer for Unaligned Multimodal Language Sequences},
  author={Tsai, Yao-Hung Hubert and Bai, Shaojie and Liang, Paul Pu and Kolter, J. Zico and Morency, Louis-Philippe and Salakhutdinov, Ruslan},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = {7},
  year={2019},
  address = {Florence, Italy},
  publisher = {Association for Computational Linguistics},
}

Junbum, Lee. KcELECTRA: Korean comments ELECTRA, 2021.

@misc{lee2021kcelectra,
  author = {Junbum Lee},
  title = {KcELECTRA: Korean comments ELECTRA},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Beomi/KcELECTRA}}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Human Understanding

MMM: Multi-modal Emotion Recognition in conversation with MLP Mixer

Environment

Directory

Base Model

Arguments

train.py

train_crossattention.py

train_mixer.py

test.py

Model Architecture

Vanilla Concat & 3-way Concat

Multimodal Cross Attention & Multimodal MLP-Mixer

Mixer Layer

Experiments

Data Source

References

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
.idea		.idea
data		data
img		img
models		models
README.md		README.md
config.py		config.py
merdataset.py		merdataset.py
prepocessing.py		prepocessing.py
test.py		test.py
train.py		train.py
train_crossattention.py		train_crossattention.py
train_mixer.py		train_mixer.py
utils.py		utils.py

ISDS-Human-Understanding/HumanUnderstandingOpen

Folders and files

Latest commit

History

Repository files navigation

Human Understanding

MMM: Multi-modal Emotion Recognition in conversation with MLP Mixer

Environment

Directory

Base Model

Arguments

train.py

train_crossattention.py

train_mixer.py

test.py

Model Architecture

Vanilla Concat & 3-way Concat

Multimodal Cross Attention & Multimodal MLP-Mixer

Mixer Layer

Experiments

Data Source

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages