Research methods of adding tags during training and using them during translation. #423

Open
davidbaines opened this issue Jun 19, 2024 · 1 comment
Labels
research Research topics

Comments

@davidbaines
Collaborator

We would like to tag Scripture texts by genre during training and include the same tags with the source text during inference. The hope is that this will improve the drafts produced by the model.

We would need a flexible method of tagging verses and including the tags as tokens.
Here are a few ideas we could test. They illustrate the kinds of tagging support that might be useful.

Tag each verse with the book it is from.
Tag each verse with the name of the author.
Tag each verse with a genre.
Tag each verse with the book, author, and genre.
Tag verses with a language family or dialect.

Tags should be optional: training and inference should continue to work for untagged verses.
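
As a rough illustration of the ideas above, here is a minimal sketch of prefixing each source verse with optional tag tokens. The `<kind:value>` tag format, the tag order, and the function names are all assumptions made for this example; none of them come from the existing codebase.

```python
from typing import Iterable, List, Mapping, Optional

# Hypothetical fixed order, so the same tags always appear in the same sequence.
TAG_ORDER = ("book", "author", "genre", "lang")


def make_tag(kind: str, value: str) -> str:
    """Build one tag token, e.g. make_tag("genre", "narrative") gives "<genre:narrative>"."""
    return f"<{kind}:{value.strip().replace(' ', '_')}>"


def tag_source(text: str, tags: Optional[Mapping[str, str]] = None) -> str:
    """Prefix a verse with whichever tags are available.

    Untagged verses (tags is None, empty, or has only empty values)
    are returned unchanged, so tagging stays optional.
    """
    if not tags:
        return text
    tokens = [make_tag(kind, tags[kind]) for kind in TAG_ORDER if tags.get(kind)]
    return " ".join(tokens + [text]) if tokens else text


def tag_vocabulary(tagged_rows: Iterable[Mapping[str, str]]) -> List[str]:
    """Collect the distinct tag tokens used in a corpus.

    These could then be registered with the tokenizer as whole tokens
    so that they are not split into subword pieces.
    """
    vocab = {
        make_tag(kind, value)
        for row in tagged_rows
        for kind, value in row.items()
        if kind in TAG_ORDER and value
    }
    return sorted(vocab)


# The same function is applied to training sources and to inference inputs.
tagged = tag_source("In the beginning", {"book": "GEN", "genre": "narrative"})
# tagged == "<book:GEN> <genre:narrative> In the beginning"
```

Using one function for both training and inference keeps the tag format identical in both places, and the pass-through for untagged verses means mixed tagged and untagged data can share one pipeline.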

@davidbaines davidbaines added the research Research topics label Jun 19, 2024
@davidbaines
Collaborator Author

When we tag verses as verses, names as names, and key terms as key terms, we can then request translation of verses as verses. This should help to avoid the reduction in scores that we've seen after adding names to the training data.
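
A minimal sketch of that idea, assuming a single data-type tag is prepended to each source segment. The tag strings `<verse>`, `<name>`, and `<term>` and the helper names are hypothetical, and the example strings are placeholders rather than real data.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source text, target text)

# Hypothetical data-type tags; the actual token strings are still to be decided.
DATA_TYPE_TAGS = {"verse": "<verse>", "name": "<name>", "term": "<term>"}


def tag_pairs(pairs: Iterable[Pair], data_type: str) -> List[Pair]:
    """Prepend the data-type tag to the source side of each training pair."""
    tag = DATA_TYPE_TAGS[data_type]
    return [(f"{tag} {src}", trg) for src, trg in pairs]


def build_training_set(
    verses: Iterable[Pair], names: Iterable[Pair], terms: Iterable[Pair]
) -> List[Pair]:
    """Combine verses, names, and key terms, each marked with its own tag."""
    return (
        tag_pairs(verses, "verse")
        + tag_pairs(names, "name")
        + tag_pairs(terms, "term")
    )


def prepare_for_inference(src: str) -> str:
    """At translation time, ask for the input to be translated as a verse."""
    return f"{DATA_TYPE_TAGS['verse']} {src}"


# Placeholder strings stand in for real source and target text.
train = build_training_set(
    verses=[("source verse text", "target verse text")],
    names=[("source name", "target name")],
    terms=[("source key term", "target key term")],
)
```

Because names and key terms carry their own tags during training, the model can in principle learn that short, name-like outputs belong to those tags and not to `<verse>` inputs.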
