
# Benchmark Datasets on Knowledge-based Machine Reading Comprehension

Contributed by Luxi Xing, Yuqiang Xie, and Wei Peng.

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China.

Last updated: Nov 7, 2021.


  1. [COPA] Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. AAAI Spring Symposium,2011. [paper / data]

    Authors: Melissa Roemmele, Cosmin Adrian Bejan, Andrew S. Gordon

    • Type: Multiple-Choice; a data-loading sketch for COPA appears after the list below.
  2. [WSC] The Winograd Schema Challenge. KR,2012. [paper / data]

    Authors: Hector J. Levesque, Ernest Davis, Leora Morgenstern

    • Type: Multiple-Choice;
  3. [ROCStories; SCT] A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. NAACL,2016. [paper / data]

    Authors: Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen

    • Type: Cloze;
  4. [NarrativeQA] The NarrativeQA Reading Comprehension Challenge. TACL,2018. [paper / data]

    Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

    • Type: Generation;
  5. [SemEval-2018 Task 11] MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge. LREC,2018. [paper / data]

    Authors: Simon Ostermann, Ashutosh Modi, Michael Roth, Stefan Thater, Manfred Pinkal

    • Type: Multiple-Choice;
  6. [story-commonsense] Modeling Naive Psychology of Characters in Simple Commonsense Stories. ACL,2018. [paper / data]

    Authors: Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight, Yejin Choi

    • Type: Multiple-Choice;
  7. Event2Mind: Commonsense Inference on Events, Intents, and Reactions. ACL,2018. [paper / data]

    Authors: Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, Yejin Choi

    • Type: Generation;
  8. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. AAAI,2019. [paper / data]

    Authors: Maarten Sap, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, Yejin Choi

    • Type: Generation;
  9. [ARC] Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 2018. [paper / data]

    Authors: Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord

    • Type: Multiple-Choice;
  10. [OpenBookQA] Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. EMNLP,2018. [paper / data]

    Authors: Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal

    • Type: Multiple-Choice;
  11. ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension. 2018. [paper / data]

    Authors: Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, Benjamin Van Durme

    • Type: Cloze;
  12. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. NAACL,2019. [paper / data]

    Authors: Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant

    • Type: Multiple-Choice;
  13. ChID: A Large-scale Chinese IDiom Dataset for Cloze Test. ACL,2019. [paper / data]

    Authors: Chujie Zheng, Minlie Huang, Aixin Sun

    • Type: Cloze;
  14. [sense-making] Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation. ACL,2019. [paper / data]

    Authors: Cunxiang Wang, Shuailong Liang, Yue Zhang, Xiaonan Li, Tian Gao

    • Type: Multiple-Choice;
  15. HellaSwag: Can a Machine Really Finish Your Sentence? ACL,2019. [paper / data]

    Authors: Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi

    • Type: Multiple-Choice;
  16. SocialIQA: Commonsense Reasoning about Social Interactions. EMNLP,2019. [paper / data]

    Authors: Maarten Sap, Hannah Rashkin, Derek Chen, Ronan LeBras, Yejin Choi

    • Type: Multiple-Choice;
  17. [ANLI] Abductive Commonsense Reasoning. 2019. [paper / data]

    Authors: Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi

    • Type: Multiple-Choice;
  18. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. EMNLP,2019. [paper / data]

    Authors: Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

    • Type: Multiple-Choice;
  19. CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense. ACL,2019,workshop. [paper / data]

    Authors: Michael Chen, Mike D’Arcy, Alisa Liu, Jared Fernandez, Doug Downey

    • Type: Multiple-Choice;
  20. CommonGen: A Constrained Text Generation Dataset Towards Generative Commonsense Reasoning. 2019. [paper / data]

    Authors: Bill Yuchen Lin, Ming Shen, Yu Xing, Pei Zhou, Xiang Ren

    • Type: Generation;
  21. QASC: A Dataset for Question Answering via Sentence Composition. 2019. [paper / data]

    Authors: Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal

    • Type: Multiple-Choice;
  22. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. NAACL,2019. [paper]

    Authors: Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner

    • Type: Multi-type
    • Metrics: EM/F1; an EM/F1 scoring sketch appears after the list below.
  23. WIQA: A dataset for "What if ..." reasoning over procedural text. EMNLP,2019. [paper]

    • Type: Multiple-Choice
  24. [ROPES] Reasoning Over Paragraph Effects in Situations. EMNLP,2019,workshop. [paper]

    • Type: Multiple-Choice
  25. R^3: A Reading Comprehension Benchmark Requiring Reasoning Process. 2020. [paper]

  26. ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning. EMNLP,2020. [paper]

  27. ESPRIT: Explaining Solutions to Physical Reasoning Tasks. ACL,2020. [paper]

  28. R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason. ACL,2020. [paper]

  29. LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning. IJCAI,2020. [paper]

  30. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms. NAACL-HLT,2019. [paper]

  31. QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships. AAAI,2019. [paper]

  32. Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning. EMNLP,2019. [paper]

  33. KILT: a Benchmark for Knowledge Intensive Language Tasks. 2020. [paper / data / code]

  34. QED: A Framework and Dataset for Explanations in Question Answering. 2020. [paper]

  35. Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering. EMNLP,2020. [paper]

  36. QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions. EMNLP,2019. [paper]

  37. IIRC: A Dataset of Incomplete Information Reading Comprehension Questions. EMNLP,2020. [paper]

  38. TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages. TACL,2020. [paper]

  39. TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions. EMNLP,2020. [paper]

  40. ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning. ICLR,2020. [paper]

  41. [StrategyQA] Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies. 2021. [paper]

    • Type: Yes/No (Boolean)
  42. [ARC-DA] Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge. 2021. [paper]

    • Type: Generation
  43. SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning. 2021. [paper]

    • Type:
  44. CommonsenseQA 2.0: Exposing the Limits of AI through Gamification. 2021. [paper]

    • Type: Yes/No (Boolean)
  45. NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset. 2021. [paper]

    • Type:
  46. ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. 2021. [paper]

    • Type: Extractive, Yes/No
  47. SituatedQA: Incorporating Extra-Linguistic Contexts into QA. EMNLP,2021. [paper]

    • Type:

Note: only benchmark datasets/tasks that require external knowledge to complete are considered here.
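
Many of the multiple-choice datasets above are mirrored on the HuggingFace Hub. As a minimal sketch of what one instance looks like, the snippet below loads COPA (entry 1) through its SuperGLUE configuration; it assumes the `datasets` library is installed and that the Hub mirror uses the standard SuperGLUE field names (`premise`, `question`, `choice1`, `choice2`, `label`).

```python
# Minimal sketch: inspect one COPA instance via the HuggingFace `datasets` library.
# Assumes the SuperGLUE mirror on the Hub with its standard field names.
from datasets import load_dataset

copa = load_dataset("super_glue", "copa", split="train")
ex = copa[0]

# COPA frames commonsense causal reasoning as binary multiple choice:
# given a premise, pick the more plausible cause or effect.
print(ex["premise"])                      # context sentence
print(ex["question"])                     # "cause" or "effect"
print(ex["choice1"], "|", ex["choice2"])  # the two alternatives
print(ex["label"])                        # 0 -> choice1, 1 -> choice2
```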
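For span-extraction and generation-style benchmarks such as DROP (entry 22), systems are commonly scored with Exact Match (EM) and token-level F1. The sketch below implements the standard SQuAD-style versions of both metrics; note that DROP's official evaluator extends this with number-aware matching and multi-span answers, which this sketch omits.

```python
# SQuAD-style EM/F1 scoring sketch (DROP's official metric adds number-aware
# and multi-span handling on top of this).
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lower-case, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 iff the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# F1 gives partial credit where EM gives none:
print(exact_match("Eiffel Tower in Paris", "Eiffel Tower"))  # 0.0
print(f1("Eiffel Tower in Paris", "Eiffel Tower"))           # ~0.67
```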