| | Train | Valid | Test | External |
|---|---|---|---|---|
| en | 24256 | 5198 | 5198 | 166804 |
| pt | 24256 | 5198 | 5198 | 166804 |

The dataset questions and answers span a period from January 2013 to December 2019.
We additionally translated the dataset to Portuguese and added external data from a binary classification dataset described as "a QNLI medical-like" dataset. We adapted its binary labels to the values 5 or 0.
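The label adaptation described above can be sketched as follows. This is only an illustration: the function name `binary_to_score` is hypothetical, and the direction of the mapping (label 1 becomes 5, label 0 becomes 0) is an assumption, since the text only states that the labels were adapted to 5 or 0.

```python
def binary_to_score(label: int, positive_score: int = 5, negative_score: int = 0) -> int:
    """Map a binary label (0 or 1) to the value 5 or 0.

    Assumption: label 1 is the positive class and maps to 5.
    The dataset card does not state which class receives which value.
    """
    if label not in (0, 1):
        raise ValueError(f"expected a binary label (0 or 1), got {label!r}")
    return positive_score if label == 1 else negative_score


# Hypothetical labels standing in for rows of the external data.
external_labels = [1, 0, 1]
scores = [binary_to_score(label) for label in external_labels]
print(scores)
```

If the actual mapping runs the other way, swapping the two default arguments is enough.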
from datasets import load_dataset
data = load_dataset("ju-resplande/askD", split="train_pt")
# Available splits: ['train_en', 'validation_en', 'test_en', 'external_en', 'train_pt', 'validation_pt', 'test_pt', 'external_pt']
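Since every split name follows the pattern `<split>_<language>`, a small helper can build and validate the name before loading. The sketch below is a convenience wrapper, not part of the dataset itself: `split_name` and `load_split` are hypothetical names, and `load_split` needs the `datasets` package and network access.

```python
SPLITS = ("train", "validation", "test", "external")
LANGUAGES = ("en", "pt")


def split_name(split: str, language: str) -> str:
    """Build a split identifier such as 'train_pt', rejecting unknown parts."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}, expected one of {SPLITS}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language {language!r}, expected one of {LANGUAGES}")
    return f"{split}_{language}"


def load_split(split: str, language: str):
    """Load one split of the dataset (hypothetical helper)."""
    # Imported lazily so that split_name works without the datasets package.
    from datasets import load_dataset

    return load_dataset("ju-resplande/askD", split=split_name(split, language))


# All eight split names, English first and then Portuguese.
all_names = [split_name(s, lang) for lang in LANGUAGES for s in SPLITS]
print(all_names)
```

For example, `load_split("test", "en")` would request the `test_en` split.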
@misc{Gomes20202,
author = {GOMES, J. R. S.},
title = {AskDocs: A medical QA dataset},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ju-resplande/askD}},
commit = {42060c4402c460e174cbb75a868b429c554ba2b7}
}
Thanks to @viniciusplo and @ruanchaves for giving the idea. 😃