Create dataset loader for LEXiTRON #614

Closed
SamuelCahyawijaya opened this issue Apr 8, 2024 · 1 comment · Fixed by #646
Labels: pr-ready (a PR that closes this issue is ready to be reviewed)

Comments

@SamuelCahyawijaya
Collaborator

Dataloader name: lexitron/lexitron.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?lexitron

Dataset lexitron
Description Corpus-based dictionary of Thai and English. The dataset contains frequently used words drawn from trusted publications such as novels, academic documents, and newspapers. The dataset link provides both Thai-English and English-Thai lexicons. Each Thai-English entry consists of the word, its part of speech, translation, synonyms, and sample sentences. The Thai -> English list has 53,000 words and the English -> Thai list has 83,000 words. See more details at http://lexitron.nectec.or.th.
Subsets version 2.0
Languages tha, eng
Tasks Word-level Translation, Machine Translation
License Custom NECTEC license
Homepage https://opend-portal.nectec.or.th/dataset/lexitron-2-0
HF URL -
Paper URL -
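
To make the fields in the description concrete, here is a minimal sketch of how one lexicon entry could be represented and mapped to a text-to-text translation pair. The class name, field names, and the output keys (`text_1`, `text_2`, and so on) are assumptions for illustration only; they are not taken from the LEXiTRON file format or from the merged dataloader, and the sample entry is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """Hypothetical in-memory record for one dictionary entry."""
    headword: str                      # source-language word
    pos: str                           # part of speech tag
    translation: str                   # target-language translation
    synonyms: List[str] = field(default_factory=list)
    sample: str = ""                   # sample sentence, if any


def to_translation_pair(entry_id: str, entry: LexiconEntry,
                        src_lang: str, tgt_lang: str) -> Dict[str, str]:
    """Flatten an entry into a word-level translation example.

    The output keys are an assumed text-to-text layout, not a confirmed schema.
    """
    return {
        "id": entry_id,
        "text_1": entry.headword,
        "text_2": entry.translation,
        "text_1_name": src_lang,
        "text_2_name": tgt_lang,
    }


# Illustrative entry only; not copied from the dataset.
example = LexiconEntry(headword="cat", pos="N", translation="แมว")
pair = to_translation_pair("0", example, src_lang="eng", tgt_lang="tha")
```

The part of speech, synonyms, and sample sentence are dropped in the flattened pair; a loader that needs them would keep them in a separate source-style schema.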
@muhammadravi251001
Collaborator

#self-assign

@holylovenia added the pr-ready label and removed the staled-issue label on May 2, 2024
ljvmiranda921 added a commit that referenced this issue May 20, 2024
* finishing lexitron dataloader

* update citation

Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>

* do formatter with make check_file

---------

Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>
Projects
Status: Done

3 participants