This is the official GitHub page for the paper:
Gullal Singh Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, and Ralph Ewerth. 2022. “MM-Claims: A Dataset for Multimodal Claim Detection in Social Media.” In Findings of the Association for Computational Linguistics: NAACL 2022, pages 962–979, Seattle, United States. Association for Computational Linguistics.
If you are interested in the binary task of check-worthiness estimation for multimodal claims, the refined dataset with new test data, released as part of the CLEF CheckThat! 2023 challenge, is available at: https://gitlab.com/checkthat_lab/clef2023-checkthat-lab/-/tree/main
The paper is available here: https://aclanthology.org/2022.findings-naacl.72/
The dataset with tweet IDs and labels is available at: https://data.uni-hannover.de/dataset/mm_claims
The annotation guideline document is available here: https://github.com/TIBHannover/MM_Claims/blob/main/misc_files/annotation_doc.pdf
For access to images and tweets, send an email with your organization (university/institute) and purpose/usage details to gullal.cheema@tib.eu
- Create the conda environment:
conda env create -f environment.yml
- Activate the environment:
conda activate mmclaim11
- Install thundersvm:
git clone https://github.com/Xtra-Computing/thundersvm.git
cd thundersvm
mkdir build
cd build
cmake ..
make -j
cd python
python setup.py install
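Once built, a quick sanity check may help confirm the Python bindings work; thundersvm mirrors the scikit-learn estimator API, so a minimal sketch (with purely illustrative toy data) looks like:

```python
import numpy as np
from thundersvm import SVC  # GPU-accelerated SVM with an sklearn-like API

# Toy data, purely for checking the install.
X = np.random.randn(100, 16).astype(np.float32)
y = (X[:, 0] > 0).astype(int)  # a trivially separable binary label

clf = SVC(kernel='rbf', C=1.0)
clf.fit(X, y)
print('train accuracy:', (clf.predict(X) == y).mean())
```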
- Install CLIP:
pip install git+https://github.com/openai/CLIP.git
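To confirm the install, listing the available model names should include the backbones used below (the rn504 and vit16 flags appear to correspond to RN50x4 and ViT-B/16):

```python
import clip

# Should print names such as 'RN50', 'RN50x4', 'ViT-B/16', ...
print(clip.available_models())
```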
- Make two changes to ALBEF/models/model_ve.py to avoid path errors (shown concretely in the snippet below):
  - At the top of the file, add import sys and sys.path.append('ALBEF/').
  - In the line bert_config = BertConfig.from_json_file(config['bert_config']), replace config['bert_config'] with 'ALBEF/'+config['bert_config'].
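Concretely, the two edits look like this:

```python
# The two edits to ALBEF/models/model_ve.py described above.

# (1) At the top of the file, make the ALBEF package importable from the repo root:
import sys
sys.path.append('ALBEF/')

# (2) Prefix the bert_config path so it resolves relative to the repo root.
# Before: bert_config = BertConfig.from_json_file(config['bert_config'])
bert_config = BertConfig.from_json_file('ALBEF/' + config['bert_config'])
```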
- Download the training, validation, and test split CSVs into data/
- Download and extract the image zip files into data/
- Download the text JSONs into data/
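As a hedged sketch of how the downloaded files might be inspected (file names and the JSON layout here are placeholders; check the actual headers after downloading):

```python
import json
import pandas as pd

# Placeholder file names; use the names of the files you actually downloaded.
train_df = pd.read_csv('data/train.csv')
print(train_df.columns.tolist(), len(train_df))  # inspect the real column names

with open('data/train_text.json') as f:  # assumed: a tweet-ID -> text mapping
    texts = json.load(f)
print(len(texts))
```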
- Download the pre-trained ALBEF checkpoint from https://github.com/salesforce/ALBEF and move it to albef_checkpoint/
- Extract CLIP features:
python extraction/feat_extract_clip.py -c rn504
- Extract ALBEF features:
python extraction/feat_extract_albef.py
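The repo's extraction scripts define the actual pipeline; as an illustrative sketch of the idea (not the repo's code), concatenating CLIP RN50x4 image and text embeddings yields the 1280-dimensional multimodal features (640 + 640) seen in the evaluation outputs below:

```python
import torch
import clip
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('RN50x4', device=device)

def multimodal_feature(image_path, tweet_text):
    """Concatenate CLIP image and text embeddings (640 + 640 = 1280 dims for RN50x4)."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([tweet_text], truncate=True).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image).squeeze(0)
        txt_feat = model.encode_text(text).squeeze(0)
    return torch.cat([img_feat, txt_feat]).cpu().numpy()
```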
- Train with CLIP features on the split with resolved label conflicts, binary claim detection:
python training/train_svm.py -n 2 -m clip -c rn504 -d wrc
- Train with CLIP features on the split with resolved label conflicts, tertiary claim detection:
python training/train_svm.py -n 3 -m clip -c rn50 -d wrc
- Train with CLIP features on the split without label conflicts, tertiary claim detection:
python training/train_svm.py -n 3 -m clip -c vit16 -d woc
- Replace -m clip with -m albef to use ALBEF features.
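Under the hood, training amounts to fitting an SVM on the precomputed features; a minimal sketch, assuming the features and labels were saved as NumPy arrays (the file paths are placeholders, and the actual script additionally handles splits and hyperparameters):

```python
import numpy as np
from thundersvm import SVC  # sklearn.svm.SVC works as a CPU drop-in

# Placeholder paths; training/train_svm.py defines the real layout.
X_train = np.load('features/clip_rn504_train.npy')
y_train = np.load('features/labels_train.npy')

clf = SVC(kernel='rbf', C=1.0)
clf.fit(X_train, y_train)
```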
- Fine-tune the multimodal ALBEF model (here with --cls 2 for the binary task):
python training/finetune_albef_mm.py --fr_no 8 --bs 8 --cls 2
- Download the trained SVM models (above) from here and move them into models/
- Evaluate the SVM trained with CLIP features on the test splits, binary claim detection:
python inference/eval_svm.py -m clip -c rn504 -d wrc
Output:
-----------------
Number of classes: 2 Model: clip CLIP model: rn504 Train split type: with_resolved_conflicts
-----------------
Number of test features and labels with resolved label conflicts: (585, 1280) (585,)
Number of test features and labels wihtout label conflicts: (525, 1280) (525,)
Test with resolved conflicts Acc/F1: 77.78/77.39
Test without conflicts Acc/F1: 79.43/78.39
- Evaluate the SVM trained with ALBEF features on the test splits, binary claim detection:
python inference/eval_svm.py -m albef -d wrc
Output:
-----------------
Number of classes: 2 Model: albef CLIP model: vit Train split type: with_resolved_conflicts
-----------------
Number of test features and labels with resolved label conflicts: (585, 768) (585,)
Number of test features and labels wihtout label conflicts: (525, 768) (525,)
Test with resolved conflicts Acc/F1: 76.92/76.46
Test without conflicts Acc/F1: 78.67/77.51
- Evaluate the SVM trained with ALBEF features on the test splits, tertiary claim detection:
python inference/eval_svm.py -m albef -n 3 -d woc
Output:
-----------------
Number of classes: 3 Model: albef CLIP model: vit Train split type: without_conflicts
-----------------
Number of test features and labels with resolved label conflicts: (585, 768) (585,)
Number of test features and labels wihtout label conflicts: (525, 768) (525,)
Test with resolved conflicts Acc/F1: 71.45/58.61
Test without conflicts Acc/F1: 75.43/55.54
- Evaluate the fine-tuned ALBEF model (binary, trained on the split with resolved conflicts):
python inference/eval_albef.py --cls 2 --model models/mmc_albef_2cls_wrc.pth
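The Acc/F1 numbers above can be reproduced from predictions with standard scikit-learn metrics; a small sketch, assuming (as the multi-class scores suggest) that F1 is macro-averaged:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 1, 0, 2, 2]  # toy labels for illustration
y_pred = [0, 1, 0, 0, 2, 1]

acc = accuracy_score(y_true, y_pred) * 100
f1 = f1_score(y_true, y_pred, average='macro') * 100  # macro averaging is an assumption
print(f'Acc/F1: {acc:.2f}/{f1:.2f}')
```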
If you find the data or the code useful, please cite us:
@inproceedings{DBLP:conf/naacl/CheemaHSMOE22,
author = {Gullal Singh Cheema and
Sherzod Hakimov and
Abdul Sittar and
Eric M{\"{u}}ller{-}Budack and
Christian Otto and
Ralph Ewerth},
editor = {Marine Carpuat and
Marie{-}Catherine de Marneffe and
Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z},
title = {MM-Claims: {A} Dataset for Multimodal Claim Detection in Social Media},
booktitle = {Findings of the Association for Computational Linguistics: {NAACL}
2022, Seattle, WA, United States, July 10-15, 2022},
pages = {962--979},
publisher = {Association for Computational Linguistics},
year = {2022},
url = {https://aclanthology.org/2022.findings-naacl.72},
timestamp = {Mon, 18 Jul 2022 17:13:00 +0200},
biburl = {https://dblp.org/rec/conf/naacl/CheemaHSMOE22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
If you use the refined dataset released in the CLEF CheckThat! 2023 challenge, please cite the paper above as well as the following paper:
@inproceedings{DBLP:conf/ecir/BarronCedenoACMEGHRSNCAN23,
author = {Alberto Barr{\'{o}}n{-}Cede{\~{n}}o and
Firoj Alam and
Tommaso Caselli and
Giovanni Da San Martino and
Tamer Elsayed and
Andrea Galassi and
Fatima Haouari and
Federico Ruggeri and
Julia Maria Stru{\ss} and
Rabindra Nath Nandi and
Gullal S. Cheema and
Dilshod Azizov and
Preslav Nakov},
editor = {Jaap Kamps and
Lorraine Goeuriot and
Fabio Crestani and
Maria Maistro and
Hideo Joho and
Brian Davis and
Cathal Gurrin and
Udo Kruschwitz and
Annalina Caputo},
title = {The {CLEF-2023} CheckThat! Lab: Checkworthiness, Subjectivity, Political
Bias, Factuality, and Authority},
booktitle = {Advances in Information Retrieval - 45th European Conference on Information
Retrieval, {ECIR} 2023, Dublin, Ireland, April 2-6, 2023, Proceedings,
Part {III}},
series = {Lecture Notes in Computer Science},
volume = {13982},
pages = {506--517},
publisher = {Springer},
year = {2023},
url = {https://doi.org/10.1007/978-3-031-28241-6\_59},
doi = {10.1007/978-3-031-28241-6\_59},
timestamp = {Tue, 28 Mar 2023 19:49:31 +0200},
biburl = {https://dblp.org/rec/conf/ecir/BarronCedenoACMEGHRSNCAN23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}