Create dataset loader for ProSub #683

Open
SamuelCahyawijaya opened this issue May 27, 2024 · 0 comments


Dataloader name: prosub/prosub.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?prosub

Dataset: prosub
Description: ProSub is a collection of datasets and corpus annotations dealing with pronoun substitutes and related linguistic categories (personal pronouns, honorific titles, address terms). Pronoun substitutes are non-pronominal expressions (e.g. 'mother', 'aunt', 'teacher') used to refer to the speaker and the addressee, thus functioning like 1st and 2nd person personal pronouns. Pronoun substitutes are very common in languages in SEA, Japan and Korea, but extremely limited elsewhere. The Common subset is based on a common questionnaire. It provides information about whether a given concept (e.g. 'child') can be used as 1st person, 2nd person, title and address term. If the use exists, example sentences are also given. The Annotations subset contains annotation of 1st and 2nd person expressions, including both personal pronouns and pronoun substitutes, and address terms. The corpora used differ from language to language. However, the annotation scheme is the same across languages.
Subsets: Common, Annotations
Languages: zsm, ind, jav, tha, vie, mya
Tasks: Word Sense Disambiguation, Word lists, Semantic Role Labeling, Machine Translation
License: Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage: https://github.com/matbahasa/ProSub
HF URL: -
Paper URL: https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/P9-4.pdf
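
As a starting point for the loader, one way to picture a Common-subset record is sketched below, based only on the description above (a concept, whether it can be used as 1st person, 2nd person, title and address term, and example sentences where the use exists). The class name `CommonEntry`, the field names, and the helper `attested_uses` are all hypothetical, the values in `example` are placeholders rather than data from ProSub, and the actual file layout in the ProSub repository may well differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical names for the four uses mentioned in the dataset description.
USES = ("first_person", "second_person", "title", "address_term")


@dataclass
class CommonEntry:
    """Assumed shape of one questionnaire row; not the real ProSub schema."""
    concept: str                      # e.g. "child"
    language: str                     # e.g. "ind", "zsm", "tha"
    uses: Dict[str, bool]             # use name -> whether the use exists
    examples: Dict[str, List[str]] = field(default_factory=dict)

    def is_pronoun_substitute(self) -> bool:
        # A concept acts as a pronoun substitute if it can refer to
        # the speaker (1st person) or the addressee (2nd person).
        return self.uses.get("first_person", False) or self.uses.get(
            "second_person", False
        )


def attested_uses(entry: CommonEntry) -> List[str]:
    """Return the uses marked as existing, in the fixed USES order."""
    return [use for use in USES if entry.uses.get(use, False)]


# Placeholder values for illustration only; not taken from the dataset.
example = CommonEntry(
    concept="child",
    language="ind",
    uses={
        "first_person": True,
        "second_person": False,
        "title": False,
        "address_term": True,
    },
    examples={"first_person": ["<example sentence>"]},
)
```

Keeping the four uses as a dictionary keyed by name, rather than four separate boolean fields, would make it easy to iterate over them when flattening rows into whatever schema the loader ends up using.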