Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

French Adjectives Transformation #249

Conversation

Louanes1
Copy link

No description provided.

@Louanes1
Copy link
Author

Louanes1 commented Sep 2, 2021

The pytest succeed when I try to run it locally, it shouldn't fail in the build :
image

Copy link
Contributor

@gxywang gxywang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Louanes1, I'm one of the assigned reviewers for your transformation. Below is my general review, see comments for more specific feedback.

Correctness: Failed the build with an AssertionError. I saw the comment regarding passing the unit test locally. Maybe see if your local version is up-to-date? Otherwise, I'm also unsure why it failed.
Interface: Looks good.
Applicable Tasks & Keywords: Task types have been added, missing keywords.
Specificity: The task is specific to text classification, generation, and tagging.
Novelty: This transformation is not implemented in NL-Augmenter already.
Adding New Libraries: All added to requirements, only nltk's version has not been specified.
Description: Readme looks good.
Data and code source: None. Should add a sentence stating it was fully created by author (or cite relevant sources).
Paraphrasers and Data Augmenters: N/A
Test Cases: 3 examples added, 5 recommended.
Evaluating Robustness: Evaluation has not been provided.
Languages other than English: French.
Decent Programming Practise: No docstrings, consider adding some.

If anything needs clarification, please let me know. Otherwise, thanks for the contribution!

This perturbation would benefit all tasks which have a sentence/paragraph/document as input like text classification, text generation, etc. that requires synthetic data augmentation / diversification.

## What are the limitation of this transformation ?
This tool does not take the general context into account, sometimes, the ouput will not match the general sense of te sentence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misspelling: *the

If possible, the results of the evaluate.py test should be added to the Readme as well.


This transformation change some words with synonyms according to if their POS tag is a ADJ for simple french sentences. It requires Spacy_lefff (an extention of spacy for french POS and lemmatizing) and nltk package with the open multilingual wordnet dictionary.

Authors : Lisa Barthe and Louanes Hamla from Fablab by Inetum in Paris
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emails should be added to the authors, please.


}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to include at least 5 examples here.

from spacy_lefff import LefffLemmatizer, POSTagger
from spacy.language import Language
from nltk.corpus import wordnet
import nltk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nltk is imported twice.

TaskType.TEXT_TAGGING,
]
languages = ["fr"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relavent keywords should also be added.

fr-core-news-md @ https://github.com/explosion/spacy-models/releases/download/fr_core_news_md-3.0.0/fr_core_news_md-3.0.0-py3-none-any.whl
spacy-lefff==0.4.0
textblob_fr==0.2.0
nltk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should nltk's version be specified as well?

perturbed_texts = synonym_transformation(
sentence
)
print("perturbed text inside of class",perturbed_texts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to delete this print line.

@mille-s
Copy link
Contributor

mille-s commented Sep 21, 2021

Should there be just one transformation for Verbs #250, Nouns #247 and Adjectives, since the method seems to be the same? The transformation could be called "Lexical susbstitutions for French".

@kaustubhdhole
Copy link
Collaborator

Agree with @mille-s above. By default, you can return all (nouns,adjectives and adverbs) and still have it parametrized. Since we are tabulating the PRs, I am closing this. Please try to address all the suggested changes of the reviews in all the 3 PRs to #250

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants