Contributed by Luxi Xing, Yuqiang Xie and Wei Peng.
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China.
Updated on Nov 7, 2021.
- [COPA] Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. AAAI, 2011. [paper / data]
  - Authors: Melissa Roemmele, Cosmin Adrian Bejan, Andrew S. Gordon
  - Type: Multiple-Choice
- [WSC] The Winograd Schema Challenge. AAAI, 2011. [paper / data]
  - Authors: Hector J. Levesque, Ernest Davis, Leora Morgenstern
  - Type: Multiple-Choice
- [ROCStories; SCT] A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. NAACL, 2016. [paper / data]
  - Authors: Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen
  - Type: Cloze
- [NarrativeQA] The NarrativeQA Reading Comprehension Challenge. TACL, 2018. [paper / data]
  - Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette
  - Type: Generation
- [SemEval-2018 Task 11] MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge. LREC, 2018. [paper / data]
  - Authors: Simon Ostermann, Ashutosh Modi, Michael Roth, Stefan Thater, Manfred Pinkal
  - Type: Multiple-Choice
- [story-commonsense] Modeling Naive Psychology of Characters in Simple Commonsense Stories. ACL, 2018. [paper / data]
  - Authors: Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight, Yejin Choi
  - Type: Multiple-Choice
- Event2Mind: Commonsense Inference on Events, Intents, and Reactions. ACL, 2018. [paper / data]
  - Authors: Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, Yejin Choi
  - Type: Generation
- ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. AAAI, 2019. [paper / data]
  - Authors: Maarten Sap, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, Yejin Choi
  - Type: Generation
- [ARC] Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 2018. [paper / data]
  - Authors: Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord
  - Type: Multiple-Choice
- [OpenBookQA] Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. EMNLP, 2018. [paper / data]
  - Authors: Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal
  - Type: Multiple-Choice
- ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension. 2018. [paper / data]
  - Authors: Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, Benjamin Van Durme
  - Type: Cloze
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. NAACL, 2019. [paper / data]
  - Authors: Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant
  - Type: Multiple-Choice
- ChID: A Large-scale Chinese IDiom Dataset for Cloze Test. ACL, 2019. [paper / data]
  - Authors: Chujie Zheng, Minlie Huang, Aixin Sun
  - Type: Cloze
- [sense-making] Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation. ACL, 2019. [paper / data]
  - Authors: Cunxiang Wang, Shuailong Liang, Yue Zhang, Xiaonan Li, Tian Gao
  - Type: Multiple-Choice
- HellaSwag: Can a Machine Really Finish Your Sentence? ACL, 2019. [paper / data]
  - Authors: Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi
  - Type: Multiple-Choice
- SocialIQA: Commonsense Reasoning about Social Interactions. EMNLP, 2019. [paper / data]
  - Authors: Maarten Sap, Hannah Rashkin, Derek Chen, Ronan LeBras, Yejin Choi
  - Type: Multiple-Choice
- [ANLI] Abductive Commonsense Reasoning. 2019. [paper / data]
  - Authors: Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
  - Type: Multiple-Choice
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. EMNLP, 2019. [paper / data]
  - Authors: Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
  - Type: Multiple-Choice
- CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense. ACL, 2019, workshop. [paper / data]
  - Authors: Michael Chen, Mike D’Arcy, Alisa Liu, Jared Fernandez, Doug Downey
  - Type: Multiple-Choice
- CommonGen: A Constrained Text Generation Dataset Towards Generative Commonsense Reasoning. 2019. [paper / data]
  - Authors: Bill Yuchen Lin, Ming Shen, Yu Xing, Pei Zhou, Xiang Ren
  - Type: Generation
- QASC: A Dataset for Question Answering via Sentence Composition. 2019. [paper / data]
  - Authors: Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal
  - Type: Multiple-Choice
- DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. NAACL, 2019. [paper]
  - Authors: Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner
  - Type: Multi-type
  - Metrics: EM/F1
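The EM/F1 metrics listed for DROP follow the SQuAD-style definitions: Exact Match scores 1 only when the normalized prediction and gold answer strings are identical, while F1 measures token-level overlap. A minimal sketch (simplified: it omits DROP-specific handling of numbers and multi-span answers):

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall on bag-of-words overlap."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eagles", "eagles")` is 1.0 after normalization, and `f1_score("12 yards", "12")` is about 0.67 (precision 0.5, recall 1.0). Official evaluation scripts additionally take the maximum score over multiple gold answers.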
- WIQA: A dataset for "What if ..." reasoning over procedural text. EMNLP, 2019. [paper]
  - Type: Multiple-Choice
- [ROPEs] Reasoning Over Paragraph Effects in Situations. ACL, 2019, workshop. [paper]
  - Type: Multiple-Choice
- R^3: A Reading Comprehension Benchmark Requiring Reasoning Process. 2020. [paper]
- ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning. EMNLP, 2020. [paper]
- ESPRIT: Explaining Solutions to Physical Reasoning Tasks. ACL, 2020. [paper]
- R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason. ACL, 2020. [paper]
- LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning. IJCAI, 2020. [paper]
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms. NAACL-HLT, 2019. [paper]
- QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships. AAAI, 2019. [paper]
- Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning. EMNLP, 2019. [paper]
- KILT: a Benchmark for Knowledge Intensive Language Tasks. 2020. [paper / data / code]
- QED: A Framework and Dataset for Explanations in Question Answering. 2020. [paper]
- Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering. EMNLP, 2020. [paper]
- QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions. EMNLP, 2019. [paper]
- IIRC: A Dataset of Incomplete Information Reading Comprehension Questions. EMNLP, 2020. [paper]
- TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages. TACL, 2020. [paper]
- TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions. EMNLP, 2020. [paper]
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning. ICLR, 2020. [paper]
- [StrategyQA] Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies. 2021. [paper]
  - Type: Yes/No (Boolean)
- [ARC-DA] Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge. 2021. [paper]
  - Type: Generation
- SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning. 2021. [paper]
  - Type:
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification. 2021. [paper]
  - Type: Yes/No (Boolean)
- NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset. 2021. [paper]
  - Type:
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. 2021. [paper]
  - Type: Extractive, Yes/No
- SituatedQA: Incorporating Extra-Linguistic Contexts into QA. EMNLP, 2021. [paper]
  - Type:
Note: this list only includes benchmark datasets/tasks that require knowledge to complete.