Data, models, and code to reproduce the experiments from our paper *Pipeline and Dataset Generation for Automated Fact-checking in Almost Any Language*, currently under review for the NCAA journal.

    @article{drchal2023pipeline,
      title={Pipeline and Dataset Generation for Automated Fact-checking in Almost Any Language},
      author={Drchal, Jan and Ullrich, Herbert and Mlyn{\'a}{\v{r}}, Tom{\'a}{\v{s}} and Moravec, V{\'a}clav},
      journal={arXiv preprint arXiv:2312.10171},
      year={2023}
    }
- QACG Data Generation -- our fork of the original QACG procedure.
- ColBERTv2 -- our fork of ColBERTv2. Retrieval for FactSearch is exposed via a REST API.
- anserini-indexing -- a wrapper for Anserini BM25. Retrieval for FactSearch is likewise exposed via a REST API (see the sketch after this list).
- FactSearch -- the source code is hosted in this repository.
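
Both retrieval backends are queried by FactSearch over HTTP. The minimal sketch below shows what such a call could look like; the host, port, endpoint path, and parameter names are placeholders for illustration only, not the actual API of either fork.

```python
# Minimal sketch of querying a retrieval backend over REST.
# The endpoint path and parameter names below are hypothetical;
# consult the ColBERTv2 / anserini-indexing forks for the real API.
import requests

RETRIEVER_URL = "http://localhost:5000/search"  # hypothetical endpoint


def retrieve(claim: str, k: int = 10) -> list[dict]:
    """Return the top-k retrieved passages for a claim."""
    response = requests.get(RETRIEVER_URL, params={"query": claim, "k": k})
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    for hit in retrieve("Prague is the capital of the Czech Republic."):
        print(hit)
```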
The training data for the following models was created by machine translation using DeepL; see the paper for more details. A usage sketch of the two models follows the list below.
- Question Generation model trained on a concatenation of Czech, English, Polish, and Slovak SQuAD datasets:
- Claim Generation model trained on a concatenation of Czech, English, Polish, and Slovak QA2D datasets:
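
The two models are meant to be chained in the QACG pipeline: the Question Generation model turns an answer span and its context into a question, and the Claim Generation model rewrites the question-answer pair into a declarative claim. Below is a minimal sketch using Hugging Face transformers; the checkpoint paths and prompt formats are assumptions for illustration, not the released models' exact interface.

```python
# Sketch of chaining the QG and claim (QA2D) models with Hugging Face
# transformers. Checkpoint paths and prompt formats are placeholders;
# see the released models for the actual input conventions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

QG_CHECKPOINT = "path/to/multilingual-qg"        # placeholder
CLAIM_CHECKPOINT = "path/to/multilingual-qa2d"   # placeholder


def generate(checkpoint: str, prompt: str) -> str:
    """Run a seq2seq checkpoint on a single prompt and decode the output."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


context = "Prague is the capital and largest city of the Czech Republic."
answer = "Prague"

# 1) answer + context -> question (assumed prompt format)
question = generate(QG_CHECKPOINT, f"answer: {answer} context: {context}")
# 2) question + answer -> declarative claim (assumed prompt format)
claim = generate(CLAIM_CHECKPOINT, f"question: {question} answer: {answer}")
print(claim)
```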
All QACG-generated datasets are built from the corresponding Wikipedia snapshots using the QACG models above. QACG-mix combines samples from all four languages while keeping the size of a single per-language dataset; QACG-sum is the full concatenation of all four per-language datasets and is therefore roughly four times larger.
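
For illustration, the sketch below shows one way the two combined splits could be assembled from four per-language datasets of equal size; it only mirrors the description above and is not the script used to build the released data.

```python
# Illustrative sketch of assembling QACG-mix and QACG-sum from four
# per-language datasets of (roughly) equal size N.
import random


def build_mix(per_language: dict[str, list], seed: int = 0) -> list:
    """QACG-mix: sample N/4 claims per language so the combined split
    keeps the size of a single per-language dataset (N)."""
    rng = random.Random(seed)
    n = len(next(iter(per_language.values())))
    per_lang_share = n // len(per_language)
    mix = []
    for samples in per_language.values():
        mix.extend(rng.sample(samples, per_lang_share))
    rng.shuffle(mix)
    return mix


def build_sum(per_language: dict[str, list]) -> list:
    """QACG-sum: full concatenation of all languages (~4x larger than mix)."""
    return [sample for samples in per_language.values() for sample in samples]
```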