Create dataset loader for Alorese Collection #448

SamuelCahyawijaya · 2024-02-18T12:08:31Z

Dataloader name: alorese/alorese.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?alorese

Dataset	alorese
Description	The Alorese Collection (or Alorese Corpus) is a collection of language data in two varieties of Alorese (Alor and Pantar Alorese). The collection is available in video, audio, and text formats, with genres covering Experiment or task, Stimuli, Discourse, and Written materials.
Subsets	-
Languages	aol, ind
Tasks	Language Modeling, Automatic Speech Recognition, Machine Translation
License	Unknown (unknown)
Homepage	https://hdl.handle.net/1839/e10d7de5-0a6d-4926-967b-0a8cc6d21fb1
HF URL	-
Paper URL	https://scholarlypublications.universiteitleiden.nl/handle/1887/70891

patrickamadeus · 2024-03-16T05:21:07Z

#self-assign

* feat: dataloader for text2text MT
* nitpick: block sp2t to pass tc for t2t task
* nitpick join
* feat: support sptext, sptext_translated
* feat: final alorese_source code
* chore: scrape entire URLs
* nitpick
* nitpick: config builder naming
* fix: nitpick naming a bit
* nitpick PR: formatting, abs import, invalid schema handler
* docs: add docstring scraping approach
* fix: add URL scrape timestamp, revise code formatting, citation
* nitpick year
* nitpick review
* fix: revise schema and remove subset
* nitpick formatting
* Update seacrowd/sea_datasets/alorese/alorese.py
  Co-authored-by: Salsabil Maulana Akbar <maulana.1998@yahoo.co.id>
* Update alorese.py: fix formatting on `yield` of `_generate_examples`

github-actions bot assigned patrickamadeus Mar 16, 2024

patrickamadeus mentioned this issue Mar 19, 2024

Closes #448 | Add/Update Dataloader alorese #541

Merged

8 tasks

sabilmakbar added the pr-ready label (A PR that closes this issue is Ready to be reviewed) Mar 20, 2024

sabilmakbar added the bonus +3 label Apr 12, 2024

sabilmakbar closed this as completed in #541 Apr 29, 2024
