SQuAD [Rajpurkar et al. 2016] is a large-scale dataset for training question answering systems on factoid questions. It contains more than 100,000 question-answer pairs about passages from 536 articles chosen from various domains of Wikipedia.
SQuAD-uk is derived from SQuAD through semi-automatic translation of the original dataset into Ukrainian. It is a large-scale dataset for open question answering on factoid questions in Ukrainian, and it contains more than 30,000 question/answer pairs derived from the original English dataset. The following training files are provided to support replicable benchmarking of QA systems:
squad-train-v1.1-uk-mini.json
: it contains a mini set of training examples derived from the original SQuAD 1.1 training material.

squad-train-v1.1-uk.json
: it contains training examples derived from the original SQuAD 1.1 training material.

squad-uk-1.1.zip
: it contains the training examples from squad-train-v1.1-uk.json, split into train-v1.1-uk.json and dev-v1.1-uk.json.
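
As a rough illustration of how these files might be read, the sketch below walks a SQuAD 1.1-style JSON structure and yields question/answer pairs. It assumes the Ukrainian files keep the original SQuAD 1.1 layout (`data` → `paragraphs` → `qas` → `answers`), which is not confirmed here. The helper names `load_squad` and `iter_qa_pairs` and the in-memory sample are hypothetical and used only for demonstration.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def load_squad(path: str) -> Dict[str, Any]:
    """Read a SQuAD-style JSON file (UTF-8, so Ukrainian text is preserved)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_qa_pairs(squad: Dict[str, Any]) -> Iterator[Tuple[str, str, str]]:
    """Yield (context, question, first answer text) triples.

    Assumes the SQuAD 1.1 layout: data -> paragraphs -> qas -> answers.
    Questions without any answer are skipped.
    """
    for article in squad.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph.get("context", "")
            for qa in paragraph.get("qas", []):
                answers = qa.get("answers", [])
                if answers:
                    yield context, qa["question"], answers[0]["text"]


# Hypothetical in-memory sample in the assumed SQuAD 1.1 layout.
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Київ",
            "paragraphs": [
                {
                    "context": "Київ є столицею України.",
                    "qas": [
                        {
                            "id": "example-1",
                            "question": "Яке місто є столицею України?",
                            "answers": [{"text": "Київ", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ],
}

pairs = list(iter_qa_pairs(sample))
print(len(pairs), "question/answer pair(s) found")
```

For a real file, something like `list(iter_qa_pairs(load_squad("squad-train-v1.1-uk-mini.json")))` would be the analogous call, provided the file is present locally and matches the assumed layout.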