Convert "word numbers" to their decimal representations #273
base: main
Hi, this PR is still a work in progress; please do not review. (UPDATE: ready now)
Is this still in a "do not review" stage? The deadline for me to submit reviews is the 18th, I believe.
Thanks @rteehas, still a WIP, will update shortly!
Tagging @kaustubhdhole to see if I can hold off on reviewing this until it is in its final state.
Thanks -- almost done!
Hey @rteehas, done! Apologies for the delay. Please see the updated code, PR title, and PR comments. Let me know if you have any questions! Perhaps surprisingly, this transformation was rather nontrivial. I hope it will add value as an augmentation!
```python
class WordsToNumbers(SentenceOperation):
    tasks = [TaskType.TEXT_CLASSIFICATION, TaskType.TEXT_TO_TEXT_GENERATION, TaskType.TEXT_TAGGING]
```
The task TEXT_TAGGING is not applicable here because the transformation changes the number of words (e.g., "I have two hundred fifty books" -> "I have 250 books").
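A minimal illustration of the reviewer's point, assuming a token-level tagging scheme with hypothetical BIO labels (`B-NUM`, `I-NUM` and `O` are made up for this example): the labels are aligned one-to-one with tokens, so once three tokens collapse into one, the old labels no longer line up.

```python
# Hypothetical BIO tags, one per whitespace token of the original sentence.
original = "I have two hundred fifty books".split()
tags = ["O", "O", "B-NUM", "I-NUM", "I-NUM", "O"]
transformed = "I have 250 books".split()

print(len(original), len(tags), len(transformed))  # 6 6 4
# Reusing the old tags pairs them with the wrong tokens ("books" gets "I-NUM").
print(list(zip(transformed, tags)))
```

Keeping such an example valid would require re-deriving the tags after the transformation, which a sentence-level operation does not do.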
You can add the tasks PARAPHRASE_DETECTION and TEXTUAL_ENTAILMENT.
```python
from text2nums import *
```
Please add a docstring to the class WordsToNumbers.
## Previous Work
Several web pages already perform this conversion (the code is fairly simple), but they have various limitations:
- https://www.browserling.com/tools/words-to-numbers cannot handle capital letters
- https://www.dcode.fr/writing-words-numbers does not provide source code
Our code is very loosely adapted from
https://stackoverflow.com/questions/493174/is-there-a-way-to-convert-number-words-to-integers, though our implementation
is more general: it also handles sentences in which only part of the sentence refers to a number.
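The core accumulator idea behind that StackOverflow answer can be sketched as follows. This is a hypothetical reimplementation for illustration only (`words_to_int`, `SMALL` and `SCALES` are our own names, not code from this PR or from `text2nums`). Words below one hundred are added to a running group, "hundred" multiplies the group, and a larger scale word such as "million" closes the group into the total. Lowercasing the input first sidesteps the capital-letter problem listed above.

```python
UNITS = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

# Number words below one hundred -> value.
SMALL = {w: i for i, w in enumerate(UNITS)}
SMALL.update({w: 10 * (i + 2) for i, w in enumerate(TENS)})
# Scale words that close a group of three digits.
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9, "trillion": 10**12}


def words_to_int(text: str) -> int:
    """Convert a phrase consisting only of number words to an int."""
    current = total = 0  # current: group in progress; total: finished groups
    # Lowercase (so "Twenty-One" works) and treat hyphens/commas as spaces.
    tokens = text.lower().replace("-", " ").replace(",", " ").split()
    for word in tokens:
        if word == "and":  # e.g. "one hundred and five"
            continue
        if word in SMALL:
            current += SMALL[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_int("Three hundred twelve million, five hundred thirty four thousand, six hundred seventy two"))
```

Like the StackOverflow original, this expects the whole input to be a number; finding the number spans inside a longer sentence is the additional step this PR describes.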
## What are the limitations of this transformation?
Please add a robustness evaluation section here like PR #218.
Also, you might want to reference the existing number-to-word transformation in the README and mention in a line or two how your transformation differs: https://github.com/GEM-benchmark/NL-Augmenter/blob/main/transformations/number-to-word/transformation.py
@motiwari would you like to address the above comments?
Thanks for the ping -- I'll address the comments this week.
Hi @kaustubhdhole and @ashish3586, thanks for the comments above -- I've addressed all your comments except adding the robustness evaluation. I just rebased this branch on
@kaustubhdhole @ashish3586 @rteehas @sebastianGehrmann can you take a look at this?
This transformation converts "word numbers" to their decimal representations in sentences, e.g.:

There are three hundred twelve million, five hundred thirty four thousand, six hundred seventy two people in the United States and one in every two is female.
-> There are 312,534,672 people in the United States and 1 in every 2 is female.

This is a rather nontrivial transformation (see code). It is roughly the reverse of the transformations in PR #71 and PR #39.
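The sentence-level behaviour in this example can be approximated by a span-finding pass on top of a word-number accumulator. This is a hypothetical sketch, not the PR's implementation: `convert_sentence` and `to_int` are illustrative names, a span is assumed to start with a word below one hundred, and a comma is only treated as part of a number when it directly follows "thousand", "million" or "billion" (as in the example). Results are formatted with Python's `,` thousands-separator format specifier.

```python
import re

UNITS = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
SMALL = {w: i for i, w in enumerate(UNITS)}
SMALL.update({w: 10 * (i + 2) for i, w in enumerate(TENS)})
SCALES = {"hundred": 100, "thousand": 10**3, "million": 10**6, "billion": 10**9}


def to_int(words):
    """Accumulate a run of lowercase number words into an int."""
    current = total = 0
    for w in words:
        if w in SMALL:
            current += SMALL[w]
        elif w == "hundred":
            current *= 100
        else:  # "thousand", "million", "billion" close the current group
            total += current * SCALES[w]
            current = 0
    return total + current


def convert_sentence(sentence):
    # re.split with a capturing group keeps the separators, so words sit at
    # even indices and the text between them at odd indices.
    parts = re.split(r"(\W+)", sentence)
    out, i = [], 0
    while i < len(parts):
        if i % 2 == 1 or parts[i].lower() not in SMALL:
            out.append(parts[i])  # separator or ordinary word: keep as-is
            i += 1
            continue
        # Grow the span while the next word is a number word and the gap is
        # whitespace or a hyphen, or a comma right after a large scale word.
        j = i
        while j + 2 < len(parts):
            gap, nxt = parts[j + 1].strip(), parts[j + 2].lower()
            comma_ok = gap == "," and parts[j].lower() in ("thousand", "million", "billion")
            if (nxt in SMALL or nxt in SCALES) and (gap in ("", "-") or comma_ok):
                j += 2
                continue
            break
        words = [p.lower() for p in parts[i:j + 1:2]]
        out.append(f"{to_int(words):,}")  # e.g. 312534672 -> "312,534,672"
        i = j + 1
    return "".join(out)


print(convert_sentence("There are three hundred twelve million, five hundred thirty four "
                       "thousand, six hundred seventy two people in the United States "
                       "and one in every two is female."))
```

The heuristics are deliberately minimal: adjacent words such as "one two three" would be merged into 6, and "and" always ends a span ("one hundred and five" becomes "100 and 5"), so a complete implementation needs stricter checks on which number words may follow each other.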