This repository contains the data and code for the first-place solution to the NLPCC 2023 shared task DiaASQ. See the project page for more details.
Our solution is a modified version of the DiaASQ baseline.
To clone and install the repository, run the following commands:

    git clone https://github.com/Joint-Laboratory-of-HUST-and-PAIC/nlpcc2023-shared-task-diaASQ.git
    cd nlpcc2023-shared-task-diaASQ
    conda create -n diaasq python=3.9 -y
    conda activate diaasq
    pip install -r requirements.txt

The architecture of our model is shown below:
We modified the baseline in the following aspects:

+ We use MacBERT as the encoder for both English and Chinese.
+ The English model is transferred from the final Chinese weights to achieve cross-lingual transfer.
+ We modified the loss weights to make the model more robust.
+ We replaced the multi-view interaction with three consecutive multi-head attention modules.
+ Cross-validation is used to select the best model and to ensemble the models.

The model is implemented in PyTorch. The versions of the main packages used in our experiments are listed below:
- torch==2.0.1
- transformers==4.29.1
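One of the modifications above replaces the multi-view interaction with three consecutive multi-head attention modules. The sketch below illustrates that idea only in outline; the NumPy helper, shapes, random weights, and names are illustrative assumptions, not the repository's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Plain multi-head self-attention over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project, then split into heads: (num_heads, seq_len, d_head)
    q = (x @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 64, 4, 10
x = rng.standard_normal((seq_len, d_model))

# Three consecutive attention modules, each with its own projections.
for _ in range(3):
    w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) * 0.1
                          for _ in range(4))
    x = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)

print(x.shape)  # (10, 64)
```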
Install the other required packages:

    pip install -r requirements.txt

We recommend conda with Python 3.9 for all experiments.
See Recipe for more details.
You can download the pretrained models from Google Drive and place them at ./recipes/en/model_fused_top3.tar and ./recipes/zh/model_fused_top3.tar. You can then run inference with the following commands:

    cd recipes
    bash kfold_inference.sh zh
    bash kfold_inference.sh en
    bash extract_and_apply_rules.sh  # optional: apply rules; improvement unknown

GPU memory requirements:
| Dataset | Batch size | GPU Memory |
|---|---|---|
| Chinese | 1 | 11 GB |
| English | 1 | 11 GB |
In all our experiments, we use a single RTX 3090 12GB.
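The model_fused_top3 checkpoints and kfold_inference.sh suggest that the submission fuses predictions from the best cross-validation folds. One common fusion scheme is logit averaging; the sketch below is a minimal illustration under that assumption (the shapes, data, and averaging scheme are mine, not the repository's actual fusion code):

```python
import numpy as np

# Hypothetical k-fold ensembling: average per-fold class logits,
# then take the argmax as the fused prediction.
rng = np.random.default_rng(42)
num_folds, num_examples, num_classes = 3, 5, 4
fold_logits = rng.standard_normal((num_folds, num_examples, num_classes))

fused = fold_logits.mean(axis=0)      # (num_examples, num_classes)
predictions = fused.argmax(axis=-1)   # one label per example
print(predictions.shape)  # (5,)
```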
Our final submission on the test set achieves the following results:
Chinese:
| Item | Prec. | Rec. | F1 | TP | Pred. | Gold |
|---|---|---|---|---|---|---|
| Micro | 0.4339 | 0.3431 | 0.3832 | 187 | 431 | 545 |
| Iden | 0.4988 | 0.3945 | 0.4406 | 215 | 431 | 545 |
The average of the Micro and Iden F1 scores is 0.4119.
English:
| Item | Prec. | Rec. | F1 | TP | Pred. | Gold |
|---|---|---|---|---|---|---|
| Micro | 0.4887 | 0.3871 | 0.4320 | 216 | 442 | 558 |
| Iden | 0.5226 | 0.4140 | 0.4620 | 231 | 442 | 558 |
The average of the Micro and Iden F1 scores is 0.4470.
Averaged over Chinese and English, the final F1 score is 0.4295.
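The precision, recall, and F1 columns follow the standard micro-averaged definitions (P = TP/Pred, R = TP/Gold, F1 = their harmonic mean), so each row can be reproduced from its raw counts. For example, the Chinese Micro row:

```python
# Recompute the Chinese Micro scores from the raw counts in the table:
# true positives, predicted quadruples, and gold quadruples.
tp, pred, gold = 187, 431, 545
precision = tp / pred
recall = tp / gold
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.4339 0.3431 0.3832
```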
If you use our dataset, please cite the following paper:
    @article{lietal2022arxiv,
      title={DiaASQ: A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis},
      author={Li, Bobo and Fei, Hao and Li, Fei and Wu, Yuhan and Zhang, Jinsong and Wu, Shengqiong and Li, Jingye and Liu, Yijiang and Liao, Lizi and Chua, Tat-Seng and Ji, Donghong},
      journal={arXiv preprint arXiv:2211.05705},
      year={2022}
    }
