The official implementation of the paper *Multimodal Emotion Recognition Calibration in Conversations* (ACM MM '24).
- Python 3.10.13
- PyTorch 1.13.1
- torch_geometric 2.4.0
- torch-scatter 2.1.0
- torch-sparse 0.5.15
- CUDA 11.7
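Assuming a matching CUDA 11.7 setup, the pinned versions can be installed roughly as follows (the index URLs follow the usual PyTorch and PyG wheel conventions; adjust them to your environment):

```shell
# PyTorch 1.13.1 built against CUDA 11.7
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch_geometric==2.4.0
# torch-scatter / torch-sparse need wheels built for this exact torch/CUDA pair
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
```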
- Download the multimodal features.
- Save `iemocap_features_roberta.pkl` and `IEMOCAP_features.pkl` in `data/iemocap/`; save `meld_features_roberta.pkl` and `MELD_features_raw1.pkl` in `data/meld/`.
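To sanity-check the layout, a small helper (not part of the repo) can verify that the four feature files sit where the paths above place them:

```python
import os

# Expected feature-file layout, as described above.
EXPECTED_FILES = [
    "data/iemocap/iemocap_features_roberta.pkl",
    "data/iemocap/IEMOCAP_features.pkl",
    "data/meld/meld_features_roberta.pkl",
    "data/meld/MELD_features_raw1.pkl",
]

def missing_files(root="."):
    """Return the expected feature files that are not present under `root`."""
    return [p for p in EXPECTED_FILES if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_files()
    print("Missing:" if missing else "All feature files found.", *missing)
```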
- Download `IEMOCAP_checkpoint.pkl` and `MELD_checkpoint.pkl` from Google Drive and put them in `checkpoints/`.
- Train the M³Net model for the ERC task on the IEMOCAP dataset (for details, please refer to M³Net).

```shell
python train_cm.py --base-model 'GRU' --dropout 0.5 --lr 0.0001 --batch-size 16 --graph_type 'hyper' \
    --epochs 80 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='IEMOCAP' --norm BN --num_L 3 --num_K 4 --seed 1475
```
- Calculate the difficulty of each conversation.

```shell
python caldiff/cal_diff.py --base-model 'GRU' --dropout 0.5 --lr 0.0001 --batch-size 16 --graph_type 'hyper' \
    --epochs 80 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='IEMOCAP' --norm BN --num_L 3 --num_K 4 --seed 1475 \
    --ckpt_path='YOUR_M3NET_MODEL'
```
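The exact on-disk format of the difficulty scores is defined by `cal_diff.py`; purely as an illustration, assuming a pickled dict mapping conversation IDs to scalar difficulties, an easy-to-hard curriculum ordering could be derived like this (file name and format are assumptions, so inspect the files under `caldiff/` for the actual layout):

```python
import pickle

def load_curriculum_order(path):
    """Load {conversation_id: difficulty} and return IDs sorted easy-to-hard.

    The pickled-dict format is an assumption for illustration only.
    """
    with open(path, "rb") as f:
        difficulty = pickle.load(f)
    return sorted(difficulty, key=difficulty.get)
```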
- Train CMERC on the M³Net model for the ERC task using the IEMOCAP dataset.

```shell
python train_cm.py --base-model 'GRU' --dropout 0.5 --lr 0.0001 --batch-size 16 --graph_type 'hyper' \
    --epochs 80 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='IEMOCAP' --norm BN --num_L 3 --num_K 4 --seed 1475 \
    --calibrate --rank_coff 0.005 \
    --contrastlearning --mscl_coff 0.05 --cscl_coff 0.05 \
    --courselearning --epoch_ratio 0.15 --scheduler_steps 1
```
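`--epoch_ratio` and `--scheduler_steps` control the curriculum pacing. A common linear pacing function (this specific form is an assumption for illustration, not taken from the repo's code) exposes an easy subset first and grows it until, after `epoch_ratio * epochs` epochs, the full training set is used:

```python
def curriculum_fraction(epoch, total_epochs, epoch_ratio=0.15, start_fraction=0.2):
    """Fraction of the easiest conversations available at `epoch`.

    Linear pacing sketch (assumed form): ramp from `start_fraction` to 1.0
    over the first `epoch_ratio * total_epochs` epochs, then use all data.
    """
    ramp_epochs = max(1, int(epoch_ratio * total_epochs))
    if epoch >= ramp_epochs:
        return 1.0
    return start_fraction + (1.0 - start_fraction) * epoch / ramp_epochs
```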
- Train the M³Net model for the ERC task on the MELD dataset (for details, please refer to M³Net).

```shell
python -u train_cm.py --base-model 'GRU' --dropout 0.4 --lr 0.0001 --batch-size 16 --graph_type='hyper' \
    --epochs=40 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='MELD' --norm BN --num_L=3 --num_K=3 --seed 67137
```
- Calculate the difficulty of each conversation.

```shell
python -u caldiff/cal_diff.py --base-model 'GRU' --dropout 0.4 --lr 0.0001 --batch-size 16 --graph_type='hyper' \
    --epochs=40 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='MELD' --norm BN --num_L=3 --num_K=3 --seed 67137 \
    --ckpt_path='YOUR_M3NET_MODEL'
```
- Train CMERC on the M³Net model for the ERC task using the MELD dataset.

```shell
python -u train_cm.py --base-model 'GRU' --dropout 0.4 --lr 0.0001 --batch-size 16 --graph_type='hyper' \
    --epochs=40 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='MELD' --norm BN --num_L=3 --num_K=3 --seed 67137 \
    --calibrate --rank_coff 0.002 \
    --contrastlearning --mscl_coff 0.15 --cscl_coff 0.15 \
    --courselearning --epoch_ratio 0.4 --scheduler_steps 1
```
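The coefficient flags weight the auxiliary objectives against the base classification loss. Schematically (the symbol names are mine and the exact combination lives in `train_cm.py`, so treat this as a sketch):

```python
def total_loss(ce, rank, mscl, cscl,
               rank_coff=0.002, mscl_coff=0.15, cscl_coff=0.15):
    """Weighted sum of the base cross-entropy loss and the auxiliary terms.

    Sketch only: `rank` stands for the calibration (ranking) loss,
    `mscl`/`cscl` for the two contrastive losses; defaults are the MELD
    coefficients from the command above.
    """
    return ce + rank_coff * rank + mscl_coff * mscl + cscl_coff * cscl
```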
We already provide the difficulty of each conversation in the `caldiff/` directory, so you can skip the M³Net training and difficulty-calculation steps above and use our provided checkpoints for efficient training.
- For IEMOCAP:

```shell
python train_cm.py --base-model 'GRU' --dropout 0.5 --lr 0.0001 --batch-size 16 --graph_type 'hyper' \
    --epochs 80 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='IEMOCAP' --norm BN --num_L 3 --num_K 4 --seed 1475 \
    --calibrate --rank_coff 0.005 \
    --contrastlearning --mscl_coff 0.05 --cscl_coff 0.05 \
    --courselearning --epoch_ratio 0.15 --scheduler_steps 1
```
- For MELD:

```shell
python -u train_cm.py --base-model 'GRU' --dropout 0.4 --lr 0.0001 --batch-size 16 --graph_type='hyper' \
    --epochs=40 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='MELD' --norm BN --num_L=3 --num_K=3 --seed 67137 \
    --calibrate --rank_coff 0.002 \
    --contrastlearning --mscl_coff 0.15 --cscl_coff 0.15 \
    --courselearning --epoch_ratio 0.4 --scheduler_steps 1
```
- Evaluate CMERC on the M³Net model for the ERC task using the IEMOCAP dataset.

```shell
python -u train.py --base-model 'GRU' --dropout 0.5 --lr 0.0001 --batch-size 16 --graph_type='hyper' \
    --epochs=0 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='IEMOCAP' --norm BN --testing
```
- Evaluate CMERC on the M³Net model for the ERC task using the MELD dataset.

```shell
python -u train.py --base-model 'GRU' --dropout 0.4 --lr 0.0001 --batch-size 16 --graph_type='hyper' \
    --epochs=0 --graph_construct='direct' --multi_modal --mm_fusion_mthd='concat_DHT' \
    --modals='avl' --Dataset='MELD' --norm BN --num_L=3 --num_K=3 --testing
```
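ERC results on IEMOCAP and MELD are conventionally reported as weighted-average F1 over the emotion classes. For reference, a minimal, self-contained computation of that metric (independent of the repo's own evaluation code, which may differ in details):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 weighted by true-class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in support:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += support[c] / total * f1
    return score
```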
If you find our work useful for your research, please cite our paper as follows:

```bibtex
@inproceedings{tu2024calibrate,
  title     = {Multimodal Emotion Recognition Calibration in Conversations},
  author    = {Tu, Geng and Xiong, Feng and Liang, Bin and Wang, Hui and Zeng, Xi and Xu, Ruifeng},
  booktitle = {Proceedings of the 32nd ACM International Conference on Multimedia},
  year      = {2024}
}
```
Special thanks to the following authors for their open-source implementations.