Alab-NII/ReasoningShortcutsInMRC

Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension

Related survey papers

Measuring Shortcuts

Adversarial data evaluation

Label-Changed

| Paper | Add/Edit Level | Creation Method | Original Dataset | Naturalness |
|---|---|---|---|---|
| Consistency (Ribeiro et al. 2019) | question | auto | SQuAD | Yes |
| Contrast Sets (Gardner et al. 2020) | word | experts | DROP, Quoref, .. | No |
| SAM (Schlegel et al. 2021) | word | auto | SQuAD, HotpotQA, DROP, NewsQA | No |
| Break, Perturb, Build (Geva et al. 2022) | question | auto | DROP, HotpotQA, IIRC | Yes |

Unanswerable Questions

| Paper | Add/Edit Level | Creation Method | Original Dataset | Naturalness |
|---|---|---|---|---|
| Not-answerable Questions (Nakanishi et al. 2018) | context | auto | SQuAD | Yes |
| SQuADRUn (Rajpurkar et al. 2018) | question | crowdworkers | SQuAD | Yes |
| Disconnected Reasoning (Trivedi et al. 2020) | context | auto | HotpotQA | Yes |
| MuSiQue (Trivedi et al. 2022) | context | auto | MuSiQue-Ans | Yes |

Artifact-based models

Intermediate task evaluation

| Paper | Form | Purpose | Task | GitHub | Dataset | Note |
|---|---|---|---|---|---|---|
| Inoue et al. 2020 | Triple | Evaluation & Training | Derivation generation | URL | R4C | based on HotpotQA |
| Ho et al. 2020 | Triple | Evaluation & Training | Evidence generation | URL | 2WikiMultiHopQA | |
| Wolfson et al. 2020 | QDMR | Training | - | URL | Break it down | based on ten datasets (e.g., HotpotQA & DROP) |
| Tang et al. 2021 | Sub-question | Evaluation | QA about sub-questions | URL | 1000 samples | based on HotpotQA |
| Geva et al. 2021 | Sub-question | Evaluation & Training | QA about sub-questions | URL | StrategyQA | implicit questions |
| Ho et al. 2022 | Sub-question | Evaluation & Training | QA about sub-questions | URL | HieraDate | only comparison questions about date information |
| Trivedi et al. 2022 | Sub-question | Evaluation & Training | QA about sub-questions | URL | MuSiQue | |
| Dalvi et al. 2021 | Entailment Tree | Evaluation & Training | Tree generation | URL | EntailmentBank | based on ARC and WorldTree V2 |
| Ribeiro et al. 2023 | Graph | Evaluation & Training | Graph generation | URL | STREET | based on ARC, SCONE, GSM8K, AQUA-RAT, and AR-LSAT |

Language skills evaluation

Mitigating Shortcuts

Training on adversarial data

Altering the training process

Utilizing intermediate tasks

Regularization