Create dataset loader for UP2.0 #571

Closed
SamuelCahyawijaya opened this issue Apr 1, 2024 · 6 comments · Fixed by #660
Labels
pr-ready A PR that closes this issue is Ready to be reviewed source-only

Comments

@SamuelCahyawijaya
Collaborator

Dataloader name: up2/up2.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?up2

Dataset: up2
Description: Southeast Asian language subsets of the Universal Propositions (UP) 2.0 dataset. Semantic role labeling (SRL) is a shallow semantic parsing task that identifies “who did what to whom, when, where, etc.” for each predicate in a sentence. It provides an intermediate (shallow) level of semantic representation that helps map syntactic parse structures to more fully specified representations of meaning.
Subsets: ind, vie
Languages: ind, vie
Tasks: Semantic Role Labeling
License: Community Data License Agreement – Permissive, Version 1.0 (cdla-permissive-1.0)
Homepage: https://github.com/UniversalPropositions/UP_Indonesian-GSD, https://github.com/UniversalPropositions/UP_Vietnamese-VTB
HF URL: -
Paper URL: https://aclanthology.org/2022.lrec-1.181.pdf

@fhudi
Collaborator

fhudi commented Apr 8, 2024

#self-assign

@fhudi
Collaborator

fhudi commented Apr 24, 2024

@SamuelCahyawijaya @holylovenia Hi, I have a couple of questions here:

  1. There is no Tasks.SEMANTIC_ROLE_LABELING; shall I implement it as a sequence labeling schema (token-to-label)?
    Does anyone know where to get the whole set of roles?

  2. The tokens can only be obtained by merging with UD, but the original merging tools seem to require an additional library. Shall I use try-except, or is it better to re-implement the merging with UD, i.e., map IDs to get only the token sequence?
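
The ID-based merge from point 2 can be sketched roughly as below. This is a minimal illustration, not the original UP merging tool and not common_parser.py. It assumes both files are CoNLL-U-style (tab-separated fields, comment lines starting with #), that the UP word lines carry a placeholder in FORM (column 2), and that token IDs line up between the UP and UD versions of a sentence. The function names and the toy Indonesian sentence are made up for illustration.

```python
from typing import Dict, List


def is_word_line(line: str) -> bool:
    """Keep plain word lines; skip comments, blanks, multiword ranges (1-2) and empty nodes (1.1)."""
    if not line.strip() or line.startswith("#"):
        return False
    return line.split("\t", 1)[0].isdigit()


def split_sentences(text: str) -> List[List[str]]:
    """Split a CoNLL-U-style file body into sentences separated by blank lines."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def merge_sentence(up_sentence: List[str], ud_sentence: List[str]) -> List[List[str]]:
    """Copy FORM from the UD word line with the same ID into each UP word line."""
    forms: Dict[str, str] = {}
    for line in ud_sentence:
        if is_word_line(line):
            fields = line.split("\t")
            forms[fields[0]] = fields[1]
    merged = []
    for line in up_sentence:
        if is_word_line(line):
            fields = line.split("\t")
            fields[1] = forms[fields[0]]  # a KeyError here means UP and UD are misaligned
            merged.append(fields)
    return merged


# Toy, made-up data: a two-token sentence whose UP FORM column is masked with "_".
UD_TEXT = (
    "# sent_id = toy-1\n"
    "1\tSaya\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmakan\tmakan\tVERB\t_\t_\t0\troot\t_\t_\n"
)
UP_TEXT = (
    "# sent_id = toy-1\n"
    "1\t_\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_\tARG0\n"
    "2\t_\tmakan\tVERB\t_\t_\t0\troot\t_\t_\tV\n"
)

for up_sent, ud_sent in zip(split_sentences(UP_TEXT), split_sentences(UD_TEXT)):
    print([fields[1] for fields in merge_sentence(up_sent, ud_sent)])
```

The sketch pairs sentences by position; matching on the sent_id comment would be more robust if the two files ever diverge in order.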

fhudi added a commit to fhudi/seacrowd-datahub that referenced this issue May 1, 2024
@fhudi
Collaborator

fhudi commented May 1, 2024

For the time being, I implemented this as source-only.
Please advise when possible.
@SamuelCahyawijaya @holylovenia

@holylovenia holylovenia added the pr-ready A PR that closes this issue is Ready to be reviewed label May 2, 2024
@holylovenia
Contributor

Hi @fhudi, sorry for the late response.

  1. We can do a source-only schema for this dataloader. What do you mean by "the whole set of roles"?
  2. Have you taken a look at common_parser.py to see whether it can help? If it's insufficient, we can use try-except to import the additional library.
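
The try-except route from point 2 usually looks something like the sketch below. The loader module imports cleanly without the extra dependency and raises an error only when the merge is actually requested. The thread does not name the library the original UP merging tools need, so up2_merge_tool and its merge function are hypothetical placeholders, not a real package or API.

```python
from typing import Any

# Hypothetical placeholder: stands in for the unnamed library required by
# the original UP merging tools; it is not a real package.
try:
    import up2_merge_tool  # type: ignore
except ImportError:
    up2_merge_tool = None


def merge_with_ud(up_path: str, ud_path: str) -> Any:
    """Raise a clear error only when the optional merge is requested."""
    if up2_merge_tool is None:
        raise ImportError(
            "Merging UP 2.0 with UD needs an additional library; install it to use this path."
        )
    return up2_merge_tool.merge(up_path, ud_path)  # hypothetical API
```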

@fhudi
Collaborator

fhudi commented May 2, 2024

Thanks @holylovenia for the reply.

  1. I mean the set of semantic roles (column 11 of UP).

  2. Yes, I wrote that parser, and I believe common_parser.py suffices.
    I highlighted this because using it means not following the steps of the original UP dataset reconstruction.
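
Point 1 asks about a token-to-label view. The sketch below is a minimal version under stated assumptions. Following the comment above, it assumes the role label sits in column 11 (index 10 when counting from 0) and that "_" there means "no role". It turns merged rows into parallel token and label sequences. The function name and the "O" outside tag are illustrative choices, not part of SEACrowd or UP. If a sentence has several predicates, each with its own label column, each column would yield its own sequence; this sketch reads only a single column.

```python
from typing import List, Tuple


def to_token_labels(
    rows: List[List[str]], label_col: int = 10, outside: str = "O"
) -> Tuple[List[str], List[str]]:
    """Build parallel token/label lists from merged CoNLL-U-style rows.

    label_col=10 is the 0-based index of column 11 (assumed to hold the role);
    "_" or a missing column is mapped to the `outside` tag.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for fields in rows:
        tokens.append(fields[1])  # FORM, assumed already filled in from UD
        value = fields[label_col] if len(fields) > label_col else "_"
        labels.append(outside if value == "_" else value)
    return tokens, labels


# Toy, made-up rows: 10 CoNLL-U columns plus one label column.
rows = [
    ["1", "Saya", "saya", "PRON", "_", "_", "2", "nsubj", "_", "_", "ARG0"],
    ["2", "makan", "makan", "VERB", "_", "_", "0", "root", "_", "_", "V"],
    ["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_", "_"],
]
print(to_token_labels(rows))  # (['Saya', 'makan', '.'], ['ARG0', 'V', 'O'])
```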

Tag: @ijindal
Could you please clarify these 2 points?

@ijindal
Contributor

ijindal commented May 29, 2024

@fhudi
The whole set of roles in UP means all English PropBank rolesets. Be careful, though: all the rolesets are consistent with v3.0 of English PropBank, so the current labeling schema may not work with the latest English PropBank release, v3.4.

  • Yes, but the steps mentioned are specific to the UP 2.0 datasets, not to semantic role labeling in general.
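
The full inventory is the set of English PropBank v3.0 rolesets. Enumerating those would require the v3.0 frame files, which this sketch does not attempt. A pragmatic fallback is to collect the labels that actually occur in the UP files, assuming the same column-11 layout as discussed above. The result is only a subset of the full inventory. The function name and toy data are illustrative.

```python
from collections import Counter
from typing import Iterable, List


def observed_labels(sentences: Iterable[List[List[str]]], label_col: int = 10) -> Counter:
    """Count label values seen in one column across sentences, ignoring "_" (no label)."""
    counts: Counter = Counter()
    for rows in sentences:
        for fields in rows:
            if len(fields) > label_col and fields[label_col] != "_":
                counts[fields[label_col]] += 1
    return counts


# Toy, made-up corpus of two sentences; only the label column matters here.
corpus = [
    [["1"] + ["_"] * 9 + ["ARG0"], ["2"] + ["_"] * 9 + ["V"]],
    [["1"] + ["_"] * 9 + ["ARG1"], ["2"] + ["_"] * 9 + ["V"], ["3"] + ["_"] * 9 + ["_"]],
]
print(sorted(observed_labels(corpus).items()))  # [('ARG0', 1), ('ARG1', 1), ('V', 2)]
```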

sabilmakbar pushed a commit that referenced this issue May 31, 2024
* Create dataset loader for UP2.0 (#571)

* Update seacrowd/sea_datasets/up2/up2.py

Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>

* Update seacrowd/sea_datasets/up2/up2.py

Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>

* Update seacrowd/sea_datasets/up2/up2.py

Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>

* Update up2.py

* Update up2.py, reformat from makefile.

* Update common_parser.py for a safer IO process.

---------

Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>
Projects
Status: Done

4 participants