Create initial xri_etl script #473
This was referenced Aug 3, 2024
ddaspit added the enhancement (New feature or request) and pipeline 2: extract (Issue related to extracting parallel corpora) labels on Aug 9, 2024
github-project-automation (bot) moved this from 🏗 In progress to ✅ Done in SIL-NLP Research on Aug 15, 2024
I accidentally closed this. A new one is here: #491
This was linked to pull requests on Aug 22, 2024
Another PR: #493
This was linked to pull requests on Sep 5, 2024
Overview
Create the initial script defined by the parent issue: #472
The parent issue has most of the details on how the script will work, so I won't repeat them all here.
The scope for this issue is just a basic POC script that produces the `*.all.txt` and `*.(train/val/test).txt` files. More complex things like data transformation will be left for later tickets.
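
For illustration only, here is a minimal sketch of what producing those four files could look like. It is not the actual xri_etl script: the function name `write_splits`, the `prefix` argument, the 10%/10% validation and test fractions, and the seeded shuffle are all assumptions, and each input line is treated as one opaque record (the real script's handling of parallel data is defined in the parent issue, not here).

```python
import random
from pathlib import Path


def write_splits(lines, out_dir, prefix, val_frac=0.1, test_frac=0.1, seed=0):
    """Write <prefix>.all.txt plus train/val/test files; return line counts.

    Hypothetical helper: names, fractions, and seed are illustrative only.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    def dump(name, part):
        # One record per line, UTF-8 encoded.
        text = "".join(f"{line}\n" for line in part)
        (out / f"{prefix}.{name}.txt").write_text(text, encoding="utf-8")

    # The full corpus, in its original order.
    dump("all", lines)

    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)

    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    splits = {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }
    for name, part in splits.items():
        dump(name, part)

    return {name: len(part) for name, part in splits.items()}
```

Calling `write_splits(records, "out", "corpus")` would write `corpus.all.txt`, `corpus.train.txt`, `corpus.val.txt`, and `corpus.test.txt` under `out/`. Keeping the split in one function leaves room for the later data-transformation tickets to preprocess `records` before this step.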