Convert "word numbers" to their decimal representations #273

Open
wants to merge 16 commits into
base: main

@motiwari motiwari commented Sep 1, 2021

This transformation converts "word numbers" to their decimal representations in sentences, e.g.:

There are three hundred twelve million, five hundred thirty four thousand, six hundred seventy two people in the United States and one in every two is female. -> There are 312,534,672 people in the United States and 1 in every 2 is female.

This is a rather nontrivial transformation (see code). It is roughly the inverse of the transformations in PR #71 and PR #39.
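For readers who want a feel for the approach, here is a minimal sketch of this kind of conversion. It is not the code in this PR: the names `words_to_int` and `convert_sentence` are hypothetical, and the sketch assumes space-separated, unhyphenated English number words up to the billions.

```python
import re

# Vocabulary for the sketch (assumption: English cardinals only).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}
NUMBER_WORDS = set(UNITS) | set(TENS) | set(SCALES) | {"hundred"}


def words_to_int(words):
    """Convert lowercase number words, e.g. ['two', 'hundred', 'fifty'], to an int."""
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        elif w in SCALES:
            # Close the current group (e.g. "three hundred twelve" million).
            total += max(current, 1) * SCALES[w]
            current = 0
    return total + current


def convert_sentence(sentence):
    """Replace each maximal run of number words in a sentence with digits."""
    out = []    # output tokens
    run = []    # number words collected for the current run
    tail = ""   # punctuation attached to the last word of the run

    def flush():
        nonlocal run, tail
        if run:
            out.append(f"{words_to_int(run):,}" + tail)
        run, tail = [], ""

    for token in sentence.split():
        m = re.fullmatch(r"([A-Za-z]+)([.,;:!?]*)", token)
        word = m.group(1).lower() if m else None
        if word in NUMBER_WORDS:
            # Punctuation ends a run unless it is a comma after a scale
            # word, as in "twelve million, five hundred".
            if run and tail and not (tail == "," and run[-1] in SCALES):
                flush()
            run.append(word)
            tail = m.group(2)
        else:
            flush()
            out.append(token)
    flush()
    return " ".join(out)


print(convert_sentence("I have Two Hundred fifty books."))
```

Even this small version shows where the difficulty lies: it rewrites every "one" (including the pronoun), sums adjacent words such as "one two", and ignores hyphenated forms such as "thirty-four", ordinals, and "a hundred".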

@motiwari
Author

motiwari commented Sep 7, 2021

Hi, this PR is still a work-in-progress; please do not review (UPDATE: ready now)

@motiwari changed the title from 'Adding DateFormat augmentation' to '[WIP - Do Not Review] Adding DateFormat augmentation' on Sep 7, 2021
@rteehas
Contributor

rteehas commented Sep 16, 2021

> Hi, this PR is still a work-in-progress; please do not review

Is this still in a do-not-review stage? I believe the deadline for me to submit reviews is the 18th.

@motiwari
Author

Thanks @rteehas, still a WIP; will update shortly!

@rteehas
Contributor

rteehas commented Sep 18, 2021

Tagging @kaustubhdhole to see if I can hold off on reviewing this until it is in its final state

@motiwari
Author

> Tagging @kaustubhdhole to see if I can hold off on reviewing this until it is in its final state

Thanks -- almost done!

@motiwari changed the title from '[WIP - Do Not Review] Adding DateFormat augmentation' to 'Convert "word numbers" to their decimal representiations' on Sep 18, 2021
@motiwari
Author

Hey @rteehas, done! Apologies for the delay.

Please see the updated code, PR title, and PR comments. Let me know if you have any questions!

Perhaps surprisingly, this transformation was rather nontrivial. I hope it will add value as an augmentation!



class WordsToNumbers(SentenceOperation):
    tasks = [TaskType.TEXT_CLASSIFICATION, TaskType.TEXT_TO_TEXT_GENERATION, TaskType.TEXT_TAGGING]
Collaborator

Task TEXT_TAGGING is not applicable here because of a change in the number of words (Ex: I have two hundred fifty books --> I have 250 books.)
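To make the mismatch concrete, here is a small illustration using the example above. The tag names (`B-NUM`, `I-NUM`, `O`) are hypothetical and only stand in for whatever per-token labels a tagging dataset carries.

```python
# Per-token labels are aligned one-to-one with the original tokens.
original = "I have two hundred fifty books".split()
tags = ["O", "O", "B-NUM", "I-NUM", "I-NUM", "O"]  # hypothetical labels

# After the transformation, three tokens collapse into one.
transformed = "I have 250 books".split()

print(len(original), len(tags), len(transformed))

# zip() silently truncates to the shorter sequence, so the labels no
# longer line up with the tokens they were written for.
misaligned = list(zip(transformed, tags))
print(misaligned)
```

Here `250` is paired with `B-NUM`, but `books` picks up `I-NUM` and the last two labels are dropped, so the labels would have to be re-derived rather than reused.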

Collaborator

You can add tasks PARAPHRASE_DETECTION, TEXTUAL_ENTAILMENT


from text2nums import *


Collaborator

Please add a docstring to the class WordsToNumbers.

Comment on lines 19 to 31

## Previous Work

Several web pages already do this (as the code is fairly simple), but each has shortcomings:

- https://www.browserling.com/tools/words-to-numbers cannot handle capital letters
- https://www.dcode.fr/writing-words-numbers does not provide source code

Our code is very loosely adapted from
https://stackoverflow.com/questions/493174/is-there-a-way-to-convert-number-words-to-integers, though our implementation
is more general and handles sentences where only part of the sentence refers to a number.

## What are the limitations of this transformation?
Collaborator

Please add a robustness evaluation section here like PR #218.

Our code is very loosely adapted from
https://stackoverflow.com/questions/493174/is-there-a-way-to-convert-number-words-to-integers, though our implementation
is more general and handles sentences where only part of the sentence refers to a number.

Collaborator

Also, you might want to add this to the README and mention a line or two how your transformation is different. https://github.com/GEM-benchmark/NL-Augmenter/blob/main/transformations/number-to-word/transformation.py

@kaustubhdhole
Collaborator

@motiwari would you like to address the above comments?

@motiwari
Author

Thanks for the ping -- I'll address the comments this week

@motiwari changed the title from 'Convert "word numbers" to their decimal representiations' to 'Convert "word numbers" to their decimal representations' on Nov 2, 2021
@motiwari
Author

motiwari commented Nov 2, 2021

Hi @kaustubhdhole and @ashish3586, thanks for the comments above. I've addressed all of them except adding the robustness evaluation.

I just rebased this branch on main and am now getting the following error when I run the robustness evaluation. It doesn't look related to my code; do you know how to fix it?

(base) ➜  NL-Augmenter git:(dateformat) ✗ python evaluate.py -t WordsToNumbers -task "TEXT_CLASSIFICATION" -m "textattack/roberta-base-imdb" -d "imdb" -p 20
Traceback (most recent call last):
  File "evaluate.py", line 3, in <module>
    from evaluation.evaluation_engine import evaluate
  File "/Users/motiwari/Desktop/NL-Augmenter/evaluation/evaluation_engine.py", line 1, in <module>
    from evaluation import (
  File "/Users/motiwari/Desktop/NL-Augmenter/evaluation/evaluate_ner_tagging.py", line 2, in <module>
    from datasets import load_dataset
ModuleNotFoundError: No module named 'datasets'
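
The last line of the traceback says that Python cannot import `datasets`, the Hugging Face package that `evaluate_ner_tagging.py` loads. One likely cause, assuming the rebase pulled in an evaluation dependency that is not yet installed in the active `(base)` environment, is simply a missing package. A quick check along these lines can confirm that before digging further; the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "datasets" is the module named in the traceback above.
missing = missing_modules(["datasets"])
if missing:
    print("Not installed:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All modules found; the problem is likely elsewhere.")
```

If the module is reported as missing, installing it with `pip install datasets` (or reinstalling the repository's requirements) in the same environment that runs `evaluate.py` is the usual remedy.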

@motiwari
Author

@kaustubhdhole @ashish3586 @rteehas @sebastianGehrmann can you take a look at this?

4 participants