Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Font change #262

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Font change #262

wants to merge 13 commits into from

Conversation

pithysr
Copy link
Contributor

@pithysr pithysr commented Sep 1, 2021

Inspired by social media posts, this transformation add noise to an input sentence by randomly changing the font of words
in a sentence.

@marco-digio
Copy link
Contributor

Hi @pithysr
I was assigned as one of your reviewers.
Thanks for this contribution!
I really like this idea and I believe it will be really helpful.

I will leave some comments in the code.

Correctness:All checks have failed, please fix this (maybe fetch upstream is enough)
Interface: I think you have chosen the correct interface(s).
Applicable Tasks & Keywords: I think you have chosen the relevant task but why not TaskType.TEXT_TO_TEXT_GENERATION, TaskType.TEXT_TAGGING ? Please also insert keywords as explained here
Specificity: The change is very general.
Novelty: This transformation is not yet implemented in NL-Augmenter.
Adding New Libraries: It requires nltk==3.6.2 as explained in requirements.txt
Description: The README is clear and explains what the transformation is aiming to do.
Data and code source: Data and code source missing. Please add references in the README
Test Cases: 6 test cases have been added.
Evaluating Robustness: No robustness evaluation has been performed.
Languages other than English: I believe that the transformation works also for other languages even if the author specify just english.


class FontChange(SentenceOperation):
tasks = [TaskType.TEXT_CLASSIFICATION]
languages = ["en"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why just English? I believe that this transformation works well for many other languages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can work with other languages in its current form but the dictionary I used (reference in the code and in the README file) does not cover the accented letters or characters outside the English alphabet. If such a resource is added, we can use this for other languages without skipping those characters.

for ttc in tokens_to_change:
while True:
font = random.sample(list(fonts.keys()), 1)[0]
if font != "normal":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just removing "normal" from the dictionary in the JSON file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was by choice. I did not want any discrepancy between my dictionary and the original resource. Of course, it can be removed.

@pithysr
Copy link
Contributor Author

pithysr commented Sep 15, 2021

Hello @marco-digio
Thank you very much for your review.
I agree that we can add the other two tasks. Should I add them and commit again?

from tasks.TaskTypes import TaskType

nltkdl("stopwords")
nltkdl("punkt")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be moved inside the constructor I think..

nltkdl("maxent_ne_chunker")
nltkdl("punkt")
nltkdl("averaged_perceptron_tagger")
nltkdl("stopwords")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even this could go inside the constructor..


def __init__(self, seed=666, max_outputs=1):
super().__init__(seed, max_outputs=max_outputs)
self.nlp = spacy.load("en_core_web_sm")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of loading new spacy, please use global spacy like below:

self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")

return perturbed_texts


class HashtagifyTransformation(SentenceOperation):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring for all the classes and functions with a proper description of params and return type.

return perturbed_texts


class FontChange(SentenceOperation):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, add docstrings for classes and functions.

TaskType.TEXT_CLASSIFICATION,
TaskType.TEXT_TAGGING,
]
languages = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the list is very long, better to create a separate file for this and import it here.

@pithysr
Copy link
Contributor Author

pithysr commented Oct 4, 2021

The conflicts are with my other submission that is already approved. Should I open a new PR?

@AbinayaM02
Copy link
Collaborator

The conflicts are with my other submission that is already approved. Should I open a new PR?

Hi @pithysr: You can continue fixing the conflict in this same PR. Thanks!

## Target Tasks

This transformation can be used for data augmentation in text classification tasks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a Data And Code Provenance section to point out the correct source of all the files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaustubhdhole Hi, I had this information in the README file but I added a separate section. However, now I cannot commit due to an error (No module named 'torchtext'). I did not have this error previously when submitting the code. Can I submit with git commit --no-verify?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants