Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis
The code was refactored to integrate all datasets; please contact me if you find any bugs. Thanks.
KuDA is evaluated on four MSA datasets and uses a BERT model in the corresponding language: Chinese (CH-SIMS, CH-SIMSv2) and English (CMU-MOSI, CMU-MOSEI).
- CH-SIMS / CMU-MOSI / CMU-MOSEI can be downloaded from MMSA.
- CH-SIMSv2 can be downloaded from ch-sims-v2 (Supervised).
- CH-SIMS / CH-SIMSv2: bert-base-chinese.
- CMU-MOSI / CMU-MOSEI: bert-base-uncased.
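As a quick check that the language models are in place, here is a minimal sketch of loading the matching BERT backbone with Hugging Face `transformers`. The local `pretrainedModel/BERT/...` paths are an assumption based on the file structure shown further below; point them at wherever you actually stored the weights.

```python
from transformers import BertModel, BertTokenizer

# Assumed local paths -- adjust to where you placed the BERT weights,
# or pass the hub names ("bert-base-chinese" / "bert-base-uncased") directly.
BERT_PATH = {
    "sims":   "pretrainedModel/BERT/bert-base-chinese",
    "simsv2": "pretrainedModel/BERT/bert-base-chinese",
    "mosi":   "pretrainedModel/BERT/bert-base-uncased",
    "mosei":  "pretrainedModel/BERT/bert-base-uncased",
}

def load_bert(dataset: str):
    """Return the tokenizer and encoder matching the dataset's language."""
    path = BERT_PATH[dataset.lower()]
    return BertTokenizer.from_pretrained(path), BertModel.from_pretrained(path)
```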
The results in the paper were obtained with Python 3.8 and PyTorch 1.9.0 on a single NVIDIA RTX 3090. Note that different hardware and software environments can cause the results to fluctuate.
Note: the parameters of the two stages need to be modified for different datasets, because the sequence lengths and feature dimensions differ (see the sketch below).
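For reference, a minimal sketch of the kind of dataset-dependent settings involved. The values are the feature dimensions of the commonly used MMSA processed features and are an assumption here; verify them (and the sequence lengths) against your own data files before training.

```python
# Assumed (text, audio, video) feature dimensions of the MMSA processed features.
# Double-check against your .pkl files; sequence lengths differ per dataset too.
FEATURE_DIMS = {
    "mosi":   (768, 5, 20),
    "mosei":  (768, 74, 35),
    "sims":   (768, 33, 709),
    "simsv2": (768, 25, 177),
}
```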
There are two ways to obtain the knowledge-injection weights:
- Download the translated text file from this link (required for MOSI and MOSEI, not required for CH-SIMS and CH-SIMSv2), and execute the following command to pretrain each modality:

  ```
  python pretrain.py
  ```
- Alternatively, the weights we have previously trained can be downloaded from this link (a sketch of loading them is shown below).

  Once the knowledge-injection weights are ready (by either route), start training:

  ```
  python train.py
  ```
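  If you use the downloaded checkpoints, here is a minimal sketch of loading them before fine-tuning. The file names and state-dict layout are assumptions; match them to the files you actually downloaded and to how the model is built in `models`.

  ```python
  import torch

  # Assumed checkpoint names -- adjust to the files you downloaded.
  CKPT = {
      "text":  "pretrainedModel/KnowledgeInjectPretraining/text_adapter.pth",
      "audio": "pretrainedModel/KnowledgeInjectPretraining/audio_adapter.pth",
      "video": "pretrainedModel/KnowledgeInjectPretraining/video_adapter.pth",
  }

  def load_adapter_weights(model, modality, device="cpu"):
      """Load one modality's pretrained knowledge-injection weights into `model`."""
      state = torch.load(CKPT[modality], map_location=device)
      # strict=False tolerates heads that only exist in the pretraining stage.
      missing, unexpected = model.load_state_dict(state, strict=False)
      return missing, unexpected
  ```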
- In `Encoder_KIAdapter.py`, you need to modify the source code of `torch.nn.TransformerEncoder` so that it returns the intermediate hidden states. The code can be modified as follows:

  ```python
  class TransformerEncoder(Module):
      r"""TransformerEncoder is a stack of N encoder layers."""
      __constants__ = ['norm']

      def __init__(self, encoder_layer, num_layers, norm=None):
          super(TransformerEncoder, self).__init__()
          self.layers = _get_clones(encoder_layer, num_layers)
          self.num_layers = num_layers
          self.norm = norm

      def forward(self, src: Tensor, mask: Optional[Tensor] = None,
                  src_key_padding_mask: Optional[Tensor] = None) -> Tensor:
          r"""Pass the input through the encoder layers in turn."""
          output = src
          # Collect the input plus the output of every layer.
          hidden_state_list = []
          hidden_state_list.append(output)
          for mod in self.layers:
              output = mod(output, src_mask=mask,
                           src_key_padding_mask=src_key_padding_mask)
              hidden_state_list.append(output)
          if self.norm is not None:
              output = self.norm(output)
          # Now returns the final output together with all intermediate states.
          return output, hidden_state_list
  ```
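  A quick sanity check that the patched encoder behaves as expected. The shapes and layer counts are arbitrary example values, and this assumes the `nn.TransformerEncoder` you instantiate is the modified one.

  ```python
  import torch
  import torch.nn as nn

  # With the modification above, every call site must unpack the returned tuple.
  layer = nn.TransformerEncoderLayer(d_model=128, nhead=4)
  encoder = nn.TransformerEncoder(layer, num_layers=3)

  src = torch.randn(20, 8, 128)         # (seq_len, batch, d_model)
  output, hidden_states = encoder(src)  # input state + one state per layer

  assert len(hidden_states) == encoder.num_layers + 1
  assert output.shape == src.shape
  ```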
- After completing the preparation of data and models, the file structure is as follows:

  ```
  ├─core
  ├─data
  │  ├─CH-SIMS
  │  ├─CH-SIMSv2
  │  ├─MOSI
  │  └─MOSEI
  ├─log
  ├─models
  ├─pretrainedModel
  │  ├─BERT
  │  └─KnowledgeInjectPretraining
  ├─opts.py
  ├─pretrain.py
  ├─train.py
  ```
- We gratefully acknowledge the open-source projects used in this work 🎉🎉🎉, including MMSA, ALMT, TMBL, TETFN, CENet, CubeMLP, Self-MM, MMIM, BBFN, MISA, MulT, LMF, TFN, etc. 😄
Paper publication address:
Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis
Please cite our paper if you find it valuable for your research (humbly asking for a citation T^T):
@inproceedings{feng2024knowledge,
title={Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis},
author={Feng, Xinyu and Lin, Yuming and He, Lihua and Li, You and Chang, Liang and Zhou, Ya},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
pages={14755--14766},
year={2024}
}