Create dataset loader for AC-IQuAD #612

Closed
SamuelCahyawijaya opened this issue Apr 8, 2024 · 4 comments · Fixed by #641
Assignees
Labels
pr-ready A PR that closes this issue is Ready to be reviewed

Comments

@SamuelCahyawijaya
Collaborator

SamuelCahyawijaya commented Apr 8, 2024

Dataloader name: ac_iquad/ac_iquad.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?ac_iquad

Dataset: ac_iquad
Description: This is an automatically generated question answering dataset built from Indonesian Wikipedia articles. Each entry consists of a context paragraph, a question and its answer, and the question's equivalent SPARQL query. Questions are split into two subsets: simple (the question corresponds to a single SPARQL triple pattern) and complex (the question corresponds to two triple patterns plus an optional typing triple).
Subsets: simple, complex
Languages: ind
Tasks: Question Answering
License: Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage: https://www.kaggle.com/datasets/realdeo/indonesian-qa-generated-by-kg
HF URL: -
Paper URL: https://link.springer.com/article/10.1007/s10579-023-09702-y
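
To make the simple/complex split in the description concrete, here is a minimal sketch that counts the triple patterns in a query and maps the count to a subset name. It is only an illustration under assumptions: the function names are hypothetical, the queries use placeholder identifiers rather than real dataset entries, and the parser assumes a flat `{ ... }` body whose patterns are separated by a standalone `.`, which is much simpler than full SPARQL.

```python
import re


def count_triple_patterns(sparql: str) -> int:
    """Count triple patterns in the { ... } body of a flat SPARQL query.

    Assumption: patterns are separated by a '.' surrounded by whitespace
    (or followed by the end of the body). Nested groups, OPTIONAL, FILTER,
    and similar constructs are not handled.
    """
    match = re.search(r"\{(.*)\}", sparql, flags=re.DOTALL)
    if match is None:
        return 0
    body = match.group(1).strip()
    # Split only on a standalone '.', so dots inside IRIs such as
    # <http://example.org/a.b> do not create extra patterns.
    parts = re.split(r"\s+\.(?:\s+|$)", body)
    return sum(1 for part in parts if part.strip())


def classify_subset(sparql: str) -> str:
    """Map a query to 'simple' or 'complex' following the issue description."""
    n = count_triple_patterns(sparql)
    if n == 1:
        return "simple"   # a single triple pattern
    if n in (2, 3):
        return "complex"  # two triples, plus an optional typing triple
    return "unknown"


# Placeholder queries (hypothetical identifiers, not taken from the dataset).
simple_query = "SELECT ?x WHERE { wd:Q1 wdt:P36 ?x . }"
complex_query = "SELECT ?x WHERE { ?y wdt:P31 wd:Q5 . wd:Q1 wdt:P50 ?y . ?y wdt:P19 ?x }"

print(classify_subset(simple_query))
print(classify_subset(complex_query))
```

The dataset itself already ships the two subsets separately, so a loader would not need this check; it is shown only to clarify what the two subset definitions mean.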
@muhammadravi251001
Collaborator

#self-assign

holylovenia added the pr-ready label and removed the staled-issue label on May 2, 2024
@sabilmakbar
Collaborator

I don't think this dataset is a CC-licensed dataset. The Kaggle URL indicates an unknown license, and the section on the paper that indicates CC [section Rights and permissions] refers to the article's license, not the dataset's license.

cc @holylovenia @muhammadravi251001

@muhammadravi251001
Collaborator

> I don't think this dataset is a CC-licensed dataset. The Kaggle URL indicates an unknown license, and the section on the paper that indicates CC [section Rights and permissions] refers to the article's license, not the dataset's license.
>
> cc @holylovenia @muhammadravi251001

I will ask the creator of the dataset/paper, since he was my senior and TA back when I was a college student.

@muhammadravi251001
Collaborator

>> I don't think this dataset is a CC-licensed dataset. The Kaggle URL indicates an unknown license, and the section on the paper that indicates CC [section Rights and permissions] refers to the article's license, not the dataset's license.
>> cc @holylovenia @muhammadravi251001
>
> I will ask the creator of the dataset/paper, since he was my senior and TA back when I was a college student.

The creator says the license is Creative Commons Attribution 4.0 (cc-by-4.0). Here is a screenshot of the relevant bubble from my WhatsApp chat with him (the messages before and after are left out for privacy).
image

cc @holylovenia @sabilmakbar

muhammadravi251001 added a commit that referenced this issue May 26, 2024
* finishing ac_iquad dataloader

* change dataset retrieval to public from local

* change subset name: single to simple

* add meta for seacrowd schema

* add type feature to complex schema

* fix bug on source schema

* cleaning code for moving tipe key to type key

* fix default config name to _simple

* fix default config name to _simple (2)

* change license to CC_BY_4_0
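
As a rough illustration of the "moving tipe key to type key" step in the commit list above, a raw record that uses the Indonesian key `tipe` could be normalized as sketched below. This is not the code from the PR: the helper name and the sample record are hypothetical, and only the two key names come from the commit message.

```python
def normalize_type_key(record: dict) -> dict:
    """Return a copy of `record` with the key 'tipe' renamed to 'type'.

    Hypothetical helper: if 'tipe' is absent the record is copied unchanged;
    if both keys are present, the value stored under 'tipe' wins.
    """
    normalized = dict(record)  # shallow copy, so the input is not mutated
    if "tipe" in normalized:
        normalized["type"] = normalized.pop("tipe")
    return normalized


# Made-up sample record, not taken from the dataset.
raw = {"question": "...", "tipe": "example"}
print(normalize_type_key(raw))
```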
Projects
Status: Done
4 participants