Closes #615 | Add Dataloader IDK-MRC-NLI #631
@muhammadravi251001: The dataloader looks good. Perhaps you can update the _CITATION now, since it is already May 1st?
By the way, I noticed that the number of samples listed on the homepage differs slightly from the number in the CSV file.
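A minimal sketch of how such a count mismatch could be checked, assuming the data is a plain CSV with a header row. The column names, the sample rows, and the reported count below are hypothetical placeholders, not values taken from the actual dataset or its homepage.

```python
import csv
import io


def count_rows(csv_text):
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for _ in reader)


def count_difference(reported_count, csv_text):
    """Return (rows found in the CSV) minus (count reported elsewhere)."""
    return count_rows(csv_text) - reported_count


# Hypothetical CSV content; the real file and its columns may differ.
sample_csv = (
    "premise,hypothesis,label\n"
    "p1,h1,0\n"
    "p2,h2,2\n"
    "p3,h3,2\n"
)

# Hypothetical count as it might be stated on a homepage.
reported = 4

print(count_rows(sample_csv))
print(count_difference(reported, sample_csv))
```

A non-zero difference only shows that the two numbers disagree; it does not say which one is correct.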
@muhammadravi251001: Thank you for the update! LGTM!
Thanks for the approval, Sir!
A friendly reminder for @luckysusanto to review.
The code works well, but I noticed that there are only two labels in the dataset: 0 and 2.
I checked the original homepage, and the owner does state that there are 3 labels: entailment (0), neutral (1), and contradiction (2).
However, the original dataset contains only two of them: entailment and contradiction.
I think it would be better to map "contradiction" to 1 (changed from 2) and add a comment/note in the file. I fear that the current labels might confuse users later on.
cc: @holylovenia
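A minimal sketch of the label check described above, assuming the three label ids stated on the homepage. The sample label list is hypothetical; in practice the values would come from the dataset's label column.

```python
from collections import Counter

# Label ids as stated on the dataset homepage.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}


def label_coverage(labels):
    """Return (counts of labels that occur, names of declared labels that never occur)."""
    counts = Counter(labels)
    present = {LABEL_NAMES[i]: counts[i] for i in LABEL_NAMES if counts[i] > 0}
    missing = [LABEL_NAMES[i] for i in LABEL_NAMES if counts[i] == 0]
    return present, missing


# Hypothetical labels; only ids 0 and 2 occur, as observed in the review.
sample_labels = [0, 2, 2, 0, 2]

present, missing = label_coverage(sample_labels)
print(present)
print(missing)
```

Keeping the three-label scheme leaves id 1 declared but unused, which is what a check like this would surface.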
It was done on purpose, Lucky. I've already explained this in a comment for the same task on my other NLI dataset: #633 (comment)
I see, in that case, approved!
Alright, thanks for the approval, Lucky!
Title: Add Dataloader IDK-MRC-NLI
First line PR Message: Closes #615
Notes
- _CITATION field, I will add it later.

Checkbox
- seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py (please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its __init__.py within {my_dataset} folder.
- _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _LOCAL, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
- _info(), _split_generators() and _generate_examples() in dataloader script.
- BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
- datasets.load_dataset function.
- python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py or python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}.