- Website: https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects
- Paper: A Vietnamese Dataset for Evaluating Machine Reading Comprehension, COLING'20
A Vietnamese Dataset for Evaluating Machine Reading Comprehension.
This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles. We also propose a new dataset-creation process for Vietnamese MRC.
- Website: https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects
- Paper: New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles
New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles.
The corpus comprises 22,057 human-generated question-answer pairs. Crowd-workers created the questions and their answers from a collection of over 4,416 online Vietnamese healthcare news articles, and each answer is a span extracted from the corresponding article. We also develop a process for creating corpora for Vietnamese machine reading comprehension.
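Both corpora are extractive: each answer is a contiguous span of its source text. As a minimal sketch, an example could be represented and checked as below. The `SpanQA` class, its SQuAD-style field names (`answer_start` as a character offset into `context`), and the `make_example` helper are hypothetical illustrations, not the datasets' actual release format or loader. NFC normalization is applied as an assumption, because Vietnamese diacritics can be encoded in composed or decomposed form, which would shift character offsets.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class SpanQA:
    """One extractive QA example (hypothetical, SQuAD-style field names)."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer inside `context`

    def answer_end(self) -> int:
        # Exclusive end offset of the answer span.
        return self.answer_start + len(self.answer_text)

    def is_valid_span(self) -> bool:
        # The answer must appear verbatim in the context at the stored offset.
        return self.context[self.answer_start:self.answer_end()] == self.answer_text


def make_example(context: str, question: str, answer_text: str) -> SpanQA:
    """Hypothetical helper: build an example from the first occurrence of the answer."""
    # Normalize to NFC so composed/decomposed diacritics yield consistent offsets.
    context = unicodedata.normalize("NFC", context)
    question = unicodedata.normalize("NFC", question)
    answer_text = unicodedata.normalize("NFC", answer_text)
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer is not a span of the context")
    return SpanQA(context, question, answer_text, start)


if __name__ == "__main__":
    # Made-up illustrative sentence, not taken from either dataset.
    ex = make_example("Hà Nội là thủ đô của Việt Nam.", "Thủ đô của Việt Nam là gì?", "Hà Nội")
    print(ex.answer_start, ex.is_valid_span())
```

Storing a character offset rather than only the answer text disambiguates answers that occur more than once in a passage; `make_example` simply picks the first occurrence, which a real annotation tool would not need to do.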
📜 Papers
📁 Open sources