Closes #25 | Create dataset loader for Typhoon Yolanda Tweets #56

IvanHalimP · 2023-11-14T14:48:24Z

Closes #25

Checkbox

Confirm that this PR is linked to the dataset issue.
Create the dataloader script seacrowd/sea_datasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
Implement _info(), _split_generators() and _generate_examples() in dataloader script.
Make sure that the BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
Confirm dataloader script works with datasets.load_dataset function.
Confirm that your dataloader script passes the test suite run with python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py.
If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.

jamesjaya

just some nits

jamesjaya · 2023-11-18T14:36:42Z

seacrowd/sea_datasets/typhoon_yolanda_tweets/typhoon_yolanda_tweets.py

+# TODO: Name the dataset class to match the script name using CamelCase instead of snake_case
+class TyphoonYolandaTweets(datasets.GeneratorBasedBuilder):


remove TODO since it has been done

Suggested change

-# TODO: Name the dataset class to match the script name using CamelCase instead of snake_case
-class TyphoonYolandaTweets(datasets.GeneratorBasedBuilder):
+class TyphoonYolandaTweets(datasets.GeneratorBasedBuilder):

IvanHalimP · 2023-11-19

Removed in the new commit. Thanks.

jamesjaya · 2023-11-18T14:37:48Z

seacrowd/sea_datasets/typhoon_yolanda_tweets/typhoon_yolanda_tweets.py

+        emos = [-1, 0, 1]
+        # TODO: KEEP if your dataset is LOCAL; remove if NOT
+        if self.config.name == "typhoon_yolanda_tweets_source" or self.config.name == "typhoon_yolanda_tweets_seacrowd_text":
+            train_path = dl_manager.download_and_extract({emo: _URLS["train"][emo] for emo in emos})
+
+            test_path = dl_manager.download_and_extract({emo: _URLS["test"][emo] for emo in emos})


remove TODO, remove extra new line

Suggested change

-        emos = [-1, 0, 1]
-        # TODO: KEEP if your dataset is LOCAL; remove if NOT
-        if self.config.name == "typhoon_yolanda_tweets_source" or self.config.name == "typhoon_yolanda_tweets_seacrowd_text":
-            train_path = dl_manager.download_and_extract({emo: _URLS["train"][emo] for emo in emos})
-
-            test_path = dl_manager.download_and_extract({emo: _URLS["test"][emo] for emo in emos})
+        emos = [-1, 0, 1]
+        if self.config.name == "typhoon_yolanda_tweets_source" or self.config.name == "typhoon_yolanda_tweets_seacrowd_text":
+            train_path = dl_manager.download_and_extract({emo: _URLS["train"][emo] for emo in emos})
+            test_path = dl_manager.download_and_extract({emo: _URLS["test"][emo] for emo in emos})
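For readers unfamiliar with the pattern in this hunk, here is a minimal sketch of the per-label URL mapping that gets handed to `dl_manager.download_and_extract`, which accepts a dict and returns local paths in the same shape. The `_URLS` contents are placeholders (the real URLs are not shown in this thread), and `urls_for_split`, `is_supported` and `_SUPPORTED_CONFIGS` are hypothetical names used only for illustration.

```python
# Placeholder stand-in for the script's _URLS; the real values are not in this thread.
_URLS = {
    "train": {emo: f"https://example.com/train/{emo}.txt" for emo in (-1, 0, 1)},
    "test": {emo: f"https://example.com/test/{emo}.txt" for emo in (-1, 0, 1)},
}

emos = [-1, 0, 1]  # sentiment labels used as dictionary keys

# Hypothetical constant: a set membership test is one way to shorten the chained `or`.
_SUPPORTED_CONFIGS = {
    "typhoon_yolanda_tweets_source",
    "typhoon_yolanda_tweets_seacrowd_text",
}


def urls_for_split(split: str) -> dict:
    """Build the {label: url} mapping for one split, as in the reviewed code."""
    return {emo: _URLS[split][emo] for emo in emos}


def is_supported(config_name: str) -> bool:
    """Check the config name against the two configs handled by the loader."""
    return config_name in _SUPPORTED_CONFIGS
```

Keeping the labels as keys means the downloaded paths stay associated with their sentiment label when they are later passed to `_generate_examples()`.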

jamesjaya · 2023-11-18T14:38:33Z

seacrowd/sea_datasets/typhoon_yolanda_tweets/typhoon_yolanda_tweets.py

+        for row in df.itertuples():
+            print(row)


remove print

Suggested change

-        for row in df.itertuples():
-            print(row)
+        for row in df.itertuples():

jamesjaya · 2023-11-18T14:39:48Z

seacrowd/sea_datasets/typhoon_yolanda_tweets/typhoon_yolanda_tweets.py

+from seacrowd.utils.configs import SEACrowdConfig
+from seacrowd.utils.constants import Licenses, Tasks
+
+_SUPPORTED_TASKS = [Tasks.NAMED_ENTITY_RECOGNITION, Tasks.DEPENDENCY_PARSING]


_SUPPORTED_TASKS is declared twice, please consolidate

IvanHalimP · 2023-11-19

Deleted this one in the new commit.

jamesjaya · 2023-11-18T14:49:35Z

seacrowd/sea_datasets/typhoon_yolanda_tweets/typhoon_yolanda_tweets.py

+            for emo, file in filepath.items():
+                with open(file) as f:
+                    t = f.readlines()
+                    l = [str(emo) for i in range(len(t))]


Suggested change

-                    l = [str(emo) for i in range(len(t))]
+                    l = [str(emo)] * len(t)
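A quick sketch of why the suggested form is equivalent to the original comprehension. The sample tweets and the `label_lines` helper are made up for illustration; only the two expressions come from the reviewed code.

```python
def label_lines(emo: int, lines: list) -> list:
    """Return one string label per line using list repetition."""
    return [str(emo)] * len(lines)


# Made-up sample standing in for f.readlines() output.
tweets = ["first tweet\n", "second tweet\n", "third tweet\n"]

labels_repeat = label_lines(-1, tweets)
labels_comprehension = [str(-1) for i in range(len(tweets))]

# Both forms produce the same list of labels.
assert labels_repeat == labels_comprehension
```

List repetition reuses references to the same object, which is safe here because strings are immutable. The same shortcut would be risky with a mutable element such as a nested list.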

jamesjaya

tested, lgtm

gentaiscool

lgtm

sabilmakbar · 2023-11-22T16:48:19Z

Shall we merge this @jamesjaya @gentaiscool since both of you have approved?

…SEACrowd#56)

* Typhoon Yolanda Tweets dataloader
* Create __init__.py
* Update seacrowd/sea_datasets/typhoon_yolanda_tweets/typhoon_yolanda_tweets.py
  Co-authored-by: James Jaya <2089265+jamesjaya@users.noreply.github.com>
* Update typhoon_yolanda_tweets.py
  Updated according to comments. Please tell me if there are something else that I miss.
* Update typhoon_yolanda_tweets.py
  removed "TODO" and extra newlines

---------

Co-authored-by: James Jaya <2089265+jamesjaya@users.noreply.github.com>

Typhoon Yolanda Tweets dataloader

abd5de4

IvanHalimP requested review from holylovenia, SamuelCahyawijaya, fajri91 and afaji as code owners November 14, 2023 14:48

Create __init__.py

a211e47

holylovenia requested review from gentaiscool and jamesjaya and removed request for SamuelCahyawijaya, afaji, fajri91 and holylovenia November 16, 2023 03:43

jamesjaya requested changes Nov 18, 2023

IvanHalimP and others added 3 commits November 19, 2023 16:33

Update seacrowd/sea_datasets/typhoon_yolanda_tweets/typhoon_yolanda_t…

640d46a

…weets.py Co-authored-by: James Jaya <2089265+jamesjaya@users.noreply.github.com>

Update typhoon_yolanda_tweets.py

62850e7

Updated according to comments. Please tell me if there are something else that I miss.

Update typhoon_yolanda_tweets.py

6f6fb23

removed "TODO" and extra newlines

jamesjaya approved these changes Nov 19, 2023

gentaiscool approved these changes Nov 22, 2023

sabilmakbar merged commit 4922146 into SEACrowd:master Nov 23, 2023
1 check passed

jamesjaya self-assigned this Jan 12, 2024
