Multimodal-Emotion-Recognition-using-AVTCA

This repository implements a multimodal network for emotion recognition using the Audio-Video Transformer Fusion with Cross Attention (AVT-CA) model, as described in the paper Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention. The implementation supports the RAVDESS dataset, which provides speech audio and frontal-face video spanning 8 distinct emotions: 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, and 08 = surprised.
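For reference, RAVDESS encodes the emotion as the third field of each seven-field, hyphen-separated filename (e.g., 01-01-05-01-02-01-12.mp4 is an "angry" utterance). Below is a minimal Python sketch of that mapping; the helper function name is ours for illustration, not part of this repo:

```python
# Illustrative sketch (not part of this repo): mapping RAVDESS emotion
# codes to labels. RAVDESS filenames consist of seven hyphen-separated
# numeric fields, the third of which is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename: str) -> str:
    """Extract the emotion label from a RAVDESS filename,
    e.g. '01-01-05-01-02-01-12.mp4' -> 'angry'."""
    stem = filename.rsplit(".", 1)[0]
    code = stem.split("-")[2]  # third field is the emotion code
    return RAVDESS_EMOTIONS[code]

print(emotion_from_filename("01-01-05-01-02-01-12.mp4"))  # angry
```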

[Figure: AVT-CA Model Diagram]
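As a rough intuition for the fusion stage, the sketch below shows generic bidirectional cross attention between audio and video token sequences in PyTorch. This is an illustrative assumption, not the AVT-CA implementation: the embedding dimension, head count, and module layout are placeholders, and the paper should be consulted for the actual architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal sketch of bidirectional audio-video cross attention.

    Each modality's token sequence attends to the other's. All
    dimensions and layer choices here are illustrative assumptions,
    not the paper's exact configuration.
    """
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, audio: torch.Tensor, video: torch.Tensor):
        # audio: (B, T_audio, dim), video: (B, T_video, dim)
        # Video tokens query the audio sequence, and vice versa.
        video_attended, _ = self.audio_to_video(query=video, key=audio, value=audio)
        audio_attended, _ = self.video_to_audio(query=audio, key=video, value=video)
        return audio_attended, video_attended

fusion = CrossAttentionFusion()
audio = torch.randn(2, 50, 256)  # batch of 2, 50 audio tokens
video = torch.randn(2, 16, 256)  # batch of 2, 16 video-frame tokens
a_out, v_out = fusion(audio, video)
print(a_out.shape, v_out.shape)  # torch.Size([2, 50, 256]) torch.Size([2, 16, 256])
```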

Feel free to play around with the code, and let us know if you have any questions or face any issues!

Citation

If you use our work, please cite as:

@misc{AVTCA,
      title={Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention}, 
      author={Joe Dhanith P R and Shravan Venkatraman and Modigari Narendra and Vigya Sharma and Santhosh Malarvannan and Amir H. Gandomi},
      year={2024},
      eprint={2407.18552},
      archivePrefix={arXiv},
      primaryClass={cs.MM},
      url={https://arxiv.org/abs/2407.18552}, 
}

If you are referencing our work, please also cite the following related paper:

Chumachenko, K., Iosifidis, A., & Gabbouj, M. (2022). Self-attention fusion for audiovisual emotion recognition with incomplete data. arXiv preprint arXiv:2201.11095. https://arxiv.org/abs/2201.11095

References

This work incorporates EfficientFace, available in the EfficientFace GitHub repository. If you use EfficientFace, please cite the paper "Robust Lightweight Facial Expression Recognition Network with Label Distribution Training." We thank @zengqunzhao for providing both the implementation and the pretrained EfficientFace model!

The training pipeline is adapted from the Efficient-3DCNNs GitHub repository (MIT license), and parts of the fusion implementation are based on the timm library (Apache 2.0 license). For data preprocessing, we used facenet-pytorch.
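A minimal sketch of that preprocessing step, assuming faces are cropped from individual video frames with facenet-pytorch's MTCNN detector (the image size, margin, and file name below are placeholder values, not necessarily this repo's settings):

```python
from facenet_pytorch import MTCNN
from PIL import Image

# Illustrative preprocessing sketch: detect and crop the face from a
# single frame. image_size and margin are assumed values for this
# example, not the repo's exact configuration.
mtcnn = MTCNN(image_size=224, margin=20)

frame = Image.open("frame.jpg")  # hypothetical frame extracted from a RAVDESS clip
face = mtcnn(frame)              # aligned face crop as a (3, 224, 224) tensor, or None
if face is not None:
    print(face.shape)            # torch.Size([3, 224, 224])
```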
