Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #24 | Add dataset loader for UIT-ViSD4SA #72

Merged
merged 10 commits into from
Dec 19, 2023

Conversation

jensan-1
Copy link
Collaborator

@jensan-1 jensan-1 commented Nov 19, 2023

Please name your PR after the issue it closes. You can use the following line: "Closes #24 " where you replace the ISSUE-NUMBER with the one corresponding to your dataset.

Checkbox

  • Confirm that this PR is linked to the dataset issue.
  • Create the dataloader script seacrowd/sea_datasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
  • Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • Make sure that the BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
  • Confirm dataloader script works with datasets.load_dataset function.
  • Confirm that your dataloader script passes the test suite run with python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py.
  • If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.

@jamesjaya jamesjaya linked an issue Nov 19, 2023 that may be closed by this pull request
@holylovenia holylovenia requested review from yongzx and removed request for SamuelCahyawijaya, afaji and fajri91 November 20, 2023 08:42
@jensan-1 jensan-1 marked this pull request as draft November 20, 2023 08:59
@jensan-1
Copy link
Collaborator Author

jensan-1 commented Nov 20, 2023

To reviewers:
I revised some of the implementations regarding the dataset loader for seacrowd.
This task is a span detection for aspect-based sentiment analysis.
Currently, the closest task is supposed to be Tasks.ASPECT_BASED_SENTIMENT_ANALYSIS.
However, since there are aspects with contradictory labels in the same data and to preserve the span, I moved the task to Tasks.COREFERENCE_RESOLUTION and contain the information in schema seacrowd_kb instead of the schema seacrowd_text_multi.

Please let me know of the possible adjustments or any suggestions regarding the implementation of the dataloader.

@jensan-1 jensan-1 marked this pull request as ready for review November 20, 2023 09:41
@holylovenia
Copy link
Contributor

Hi @jen-santoso, thanks for your contribution!! 🥳 Before the review, please let me throw a quick question to the other maintainers: I'll make a new task Tasks.SPAN_BASED_ABSA for this dataloader if you both agree @sabilmakbar @SamuelCahyawijaya.

@sabilmakbar
Copy link
Collaborator

Sounds perfect to me, @holylovenia! Thanks!

@SamuelCahyawijaya
Copy link
Collaborator

@holylovenia @jen-santoso : Does the task can simply be framed as sequence labelling using the IOB tag?

Copy link
Contributor

@holylovenia holylovenia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SamuelCahyawijaya, thanks for the brilliant suggestion. @jen-santoso and I discussed the possibility of using the schema seq_label for this Tasks.SPAN_BASED_ABSA and agreed with your suggestion.

I have changed the schema-task mapping accordingly in the master branch and @jen-santoso will change the kb schema to seq_label for this dataloader implementation.

@jensan-1
Copy link
Collaborator Author

@holylovenia @SamuelCahyawijaya done changing the schema to seq_label. Please check!

Copy link
Contributor

@holylovenia holylovenia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jen-santoso, thanks for the dataloader! It works well. I have a minor request though, could you please provide the _LANGUAGES constant?

@jensan-1
Copy link
Collaborator Author

@holylovenia updated

@holylovenia
Copy link
Contributor

@holylovenia updated

Ah sorry I wasn't clear before, array form please. 🙏

@jensan-1
Copy link
Collaborator Author

jensan-1 commented Dec 19, 2023

@holylovenia updated

Ah sorry I wasn't clear before, array form please. 🙏

oops, ok now in array form >.<

Copy link
Contributor

@holylovenia holylovenia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your help, @jen-santoso!! Looks good to me~ Let's wait for @SamuelCahyawijaya's review before we proceed to merging.

Copy link
Collaborator

@SamuelCahyawijaya SamuelCahyawijaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jen-santoso : thank you for updating the schema. LGTM!.

Based on our internal discussion, we decided to give +2 bonus points for this contribution.

@SamuelCahyawijaya SamuelCahyawijaya merged commit ff6a3bc into SEACrowd:master Dec 19, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create dataset loader for UIT-ViSD4SA
5 participants