Character duplication #184

marco-digio · 2021-07-30T07:33:18Z

No description provided.

kaustubhdhole · 2021-08-02T02:35:00Z

Hi @marco-digio thank you very much for your changes. I would suggest combining character_duplication, underscore_trick into one. Also, please mention in the README about other previous PRs (created as well as merged) which are similar to your PR.

marco-digio · 2021-08-04T08:36:05Z

Hi @kaustubhdhole, thank you. I am sorry, I made a bit of a mess with git: I accidentally included an old commit in the new branch. I have now fixed it by removing underscore_trick from the character_duplication pull request, since I had already opened a separate underscore_trick pull request.
However, if you believe that they should be merged into a single transformation, I will do so, although I think they should remain separate since they affect the text in quite different ways.

uyaseen

Hi @marco-digio, I am a reviewer assigned to this PR. Overall, everything looks good, you just need to include keywords for the transformation as explained here. I will provide the final feedback after you have added the keywords.

A few minor comments:

There is a typo in the last line of README.md ("perfornamce")
If this transformation was proposed in any existing work then please include the relevant citation.

marco-digio · 2021-09-09T14:56:49Z

Thank you @uyaseen for the feedback. I have now added the keywords in 22a16e7 and fixed the README typo in 97866a7.
To the best of my knowledge, there is no existing work proposing this transformation. However, if I have missed something, I will gladly include it as a relevant citation.

uyaseen

@marco-digio thanks for making the changes.

Here's my general review:

Clarity: The README clearly explains the transformation
Correctness: All checks have passed
Interface: The interface seems correct

Adding New Libraries: No new libraries were added
Test Cases: 5 test cases added
Evaluating Robustness: Robustness evaluation is not yet conducted

asnota · 2021-09-28T15:58:59Z

transformations/character_duplication/test.json

+        "sentence": "Andrew finally returned the French book to Chris that I bought last week"
+      },
+      "outputs": [{
+        "sentence": "Anndrew ffinnallly returrned thee  French book too Chhris that I bought last  week"


Triple duplication in the same word doesn't seem like a typical situation. I would suggest adding some rules to limit the generation of such unlikely human input.

Here the triple duplication happens only because one of the two ‘l’ characters in the word “finally” was duplicated, giving the same letter 3 times in total.
I am not sure how likely this is in real data compared with the duplication of a character that appears only once in the word.
However, I believe that trained models should be able to process words like “ffinallly” in a similar way to “finally”, since humans can easily understand the meaning of the word with this kind of typo.
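For reference, the rule suggested by the reviewer could look roughly like the sketch below. It is not the code in this PR: the function name, the `prob` and `seed` parameters, and the choice to skip any character that already sits next to an identical one are all assumptions made for illustration.

```python
import random


def duplicate_without_triples(text, prob=0.1, seed=0):
    """Duplicate characters at random, never creating a run of three.

    Hypothetical sketch: a character is left alone when the character
    before or after it in the original text is identical.
    """
    rng = random.Random(seed)
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        prev_same = i > 0 and text[i - 1] == ch
        next_same = i + 1 < len(text) and text[i + 1] == ch
        if prev_same or next_same:
            # Duplicating here would turn "ll" into "lll".
            continue
        if rng.random() < prob:
            out.append(ch)
    return "".join(out)


# With prob=1.0 every eligible character is doubled, but "ll" is untouched.
print(duplicate_without_triples("finally", prob=1.0))  # ffiinnaallyy
```

Whether such a rule is desirable is exactly what the thread debates; the sketch only shows that it is cheap to add.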

asnota · 2021-09-28T16:01:35Z

transformations/character_duplication/test.json

+        "sentence": "Alice in Wonderland is a 2010 American live-action/animated dark fantasy adventure film"
+      },
+      "outputs": [{
+        "sentence": "Allice inn WWondderland  is a 200110 American livve-aaction/animated dark fanntasy adventure film"


The same applies to the doubled letter at the beginning of a word and to the 6-digit number, which should represent a year. Please consider adding some rules to change that behaviour.

For the same reason as before, I disagree about the double letter at the beginning, but I agree with you about not duplicating digits. I have added a rule to exclude digits from duplication in eb09bbc. Thank you for the suggestion.
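A digit-exclusion rule of the kind described here might be sketched as follows. This is an illustration, not the code committed in eb09bbc; the function name and the `prob` and `seed` parameters are hypothetical.

```python
import random


def duplicate_characters(text, prob=0.1, seed=0):
    """Duplicate each non-digit character with probability prob.

    Hypothetical sketch of a digit-exclusion rule.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        # Digits are never duplicated, so "2010" cannot become "200110".
        if not ch.isdigit() and rng.random() < prob:
            out.append(ch)
    return "".join(out)


print(duplicate_characters("a1b", prob=1.0))  # aa1bb
```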

asnota · 2021-09-28T16:02:23Z

transformations/character_duplication/transformation.py

+from interfaces.SentenceOperation import SentenceOperation
+from tasks.TaskTypes import TaskType
+
+


Please consider adding doc strings, comments and error handling logic.

I added a brief doc string in eb09bbc. I believe that the code is simple enough to understand everything without the need of more comments.

Please add a description of your arguments, using the docstring convention:

```python
def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...
```

as stated in the official doc string convention for Python: https://www.python.org/dev/peps/pep-0257/

Don't forget about error handling logic: what happens if the user enters an illegal value for one of the parameters? Will they receive a human-readable message pointing out what they did wrong, or a generic Python error log when the invalid parameter breaks the code?
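As an illustration of the two requests in this thread (a PEP 257 style docstring and readable errors), a parameter check could be sketched like this. The helper name `check_probability` and the `prob` parameter are assumptions; the PR's actual parameters are not shown in this conversation.

```python
def check_probability(prob):
    """Return prob as a float if it is a valid probability.

    Keyword arguments:
    prob -- chance of duplicating each character, between 0 and 1

    Raises TypeError for non-numeric input and ValueError for
    values outside the range [0, 1].
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(prob, bool) or not isinstance(prob, (int, float)):
        raise TypeError(f"prob must be a number, got {type(prob).__name__}")
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"prob must be between 0 and 1, got {prob}")
    return float(prob)
```

Calling such a check at the start of the transformation's constructor would turn a bad argument into a message that names the parameter, instead of a failure deep inside the sampling loop.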

asnota · 2021-09-28T16:03:30Z

transformations/character_duplication/transformation.py

+    tasks = [
+        TaskType.TEXT_CLASSIFICATION,
+        TaskType.TEXT_TO_TEXT_GENERATION,
+        TaskType.TEXT_TAGGING,


How is TaskType.TEXT_TAGGING relevant to this PR?

Thank you for spotting this. You are completely right, and I have removed it in eb09bbc.

asnota · 2021-09-28T16:09:01Z

Thank you for the contribution. Could you please explain the value of your PR compared to the [butter_fingers_perturbation transformation](https://github.com/GEM-benchmark/NL-Augmenter/tree/main/transformations/butter_fingers_perturbation), which addresses the same typo issue?

marco-digio · 2021-09-28T16:16:35Z

The two transformations are similar in that both add typo-like noise. However, Butter Fingers Perturbation replaces a character with one that is close to it on the keyboard, while this PR (character duplication) duplicates a character.
Example:
Original sentence: "benchmark"
Butter Fingers Perturbation (possible) output: "benchnark" ("m" and "n" are close on the English keyboard)
Character Duplication (possible) output: "benchmmark"

I hope that this clarifies your doubts @asnota
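The difference between the two perturbations can be reproduced on the example word with the deterministic sketch below. Neither function is taken from NL-Augmenter; the names and the one-entry neighbour map are hypothetical and exist only to mirror the "benchmark" example.

```python
# Hypothetical, tiny neighbour map; a real keyboard layout has many more entries.
NEIGHBOURS = {"m": "n"}


def substitute_neighbour(word, index):
    """Replace the character at index with a keyboard neighbour, if known."""
    ch = word[index]
    return word[:index] + NEIGHBOURS.get(ch, ch) + word[index + 1:]


def duplicate_character(word, index):
    """Repeat the character at index once."""
    return word[:index + 1] + word[index] + word[index + 1:]


word = "benchmark"
i = word.index("m")
print(substitute_neighbour(word, i))  # benchnark
print(duplicate_character(word, i))   # benchmmark
```

Substitution keeps the word length and changes a character, while duplication keeps every original character and lengthens the word by one.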

Marco Di Giovanni added 2 commits July 29, 2021 16:51

Added underscore_trick

03b31e2

Added character_duplication

4cdd971

Removed wrong files (different branch)

92da719

KennethEnevoldsen mentioned this pull request Aug 6, 2021

List of potentially new augmenters KennethEnevoldsen/augmenty#24

Closed

37 tasks

uyaseen suggested changes Sep 9, 2021

marco-digio added 2 commits September 9, 2021 16:49

Fix typo

97866a7

Add keywords

22a16e7

uyaseen approved these changes Sep 9, 2021

Merge branch 'main' into character_duplication

00542c0

asnota reviewed Sep 28, 2021

Fix tasks, add doc string and remove digits from being duplicated

eb09bbc

asnota suggested changes Sep 29, 2021
