French Adjectives Transformation #249

Closed
20 changes: 20 additions & 0 deletions transformations/french_synonym_adjectives_transformation/README.md
@@ -0,0 +1,20 @@
# Adjective Synonym Substitution 🦎 + ⌨️ → 🐍


This transformation replaces words with synonyms when their POS tag is ADJ, for simple French sentences. It requires spacy-lefff (a spaCy extension for French POS tagging and lemmatization) and the nltk package with the Open Multilingual Wordnet dictionary.

Authors : Lisa Barthe and Louanes Hamla from Fablab by Inetum in Paris

Review comment (Contributor):
Emails should be added to the authors, please.


## What type of transformation is it?
This transformation creates French paraphrases in which one word differs. The general meaning of the sentence is preserved, and the sentence can be rendered as different paraphrases, each varying one adjective.

## Supported Task

This perturbation can be used for any French task.

## What does it intend to benefit?

This perturbation would benefit all tasks that take a sentence, paragraph, or document as input (text classification, text generation, etc.) and require synthetic data augmentation or diversification.

## What are the limitations of this transformation?
This tool does not take the general context into account; sometimes the output will not match the general sense of te sentence.

Review comment (Contributor):
misspelling: *the

If possible, the results of the evaluate.py test should be added to the Readme as well.
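
Beyond the context limitation listed in the README, `synonym_transformation` in this PR substitutes with `str.replace`, which also rewrites the adjective where it occurs inside a longer word. A minimal sketch of a whole-word replacement follows; the helper name `replace_word` is hypothetical and not part of the PR.

```python
import re


def replace_word(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only where `old` stands as a whole word.

    Hypothetical helper, not part of the PR. The lookarounds use \\w, which
    is Unicode-aware for str patterns, so accented French letters count as
    word characters.
    """
    pattern = r"(?<!\w)" + re.escape(old) + r"(?!\w)"
    # A function replacement avoids backslash interpretation in `new`.
    return re.sub(pattern, lambda _match: new, text)


sentence = "Un grand homme grandement aimé"
naive = sentence.replace("grand", "important")          # also changes "grandement"
whole_word = replace_word(sentence, "grand", "important")  # leaves "grandement" alone
```

Whether this is preferable depends on the intended behaviour; it only prevents accidental edits inside other words and does nothing about agreement in gender or number.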

@@ -0,0 +1,2 @@
from .transformation import *

@@ -0,0 +1,5 @@
#fr core news md model :
fr-core-news-md @ https://github.com/explosion/spacy-models/releases/download/fr_core_news_md-3.0.0/fr_core_news_md-3.0.0-py3-none-any.whl
spacy-lefff==0.4.0
textblob_fr==0.2.0
nltk

Review comment (Contributor):
Should nltk's version be specified as well?

40 changes: 40 additions & 0 deletions transformations/french_synonym_adjectives_transformation/test.json
@@ -0,0 +1,40 @@
{
  "type": "french_synonym_adjectives_transformation",
  "test_cases": [
    {
      "class": "FrenchAdjectivesSynonymTransformation",
      "inputs": {
        "sentence": "Le sanglier a des dents pointues et un pelage sombre"
      },
      "outputs": [{
        "sentence": "Le sanglier a des dents pointues et un pelage noir"
      }]
    },
    {
      "class": "FrenchAdjectivesSynonymTransformation",
      "inputs": {
        "sentence": "Ce fut un impressionnant festival de mode, les mannequins sont tous jolis."
      },
      "outputs": [{
        "sentence": "Ce fut un imposant festival de mode, les mannequins sont tous jolis."
      }]
    },
    {
      "class": "FrenchAdjectivesSynonymTransformation",
      "inputs": {
        "sentence": "Nous sommes venu en grand nombre."
      },
      "outputs": [{
        "sentence": "Nous sommes venu en important nombre."
      }]
    }

Review comment (Contributor):
It is recommended to include at least 5 examples here.

  ]
}
@@ -0,0 +1,85 @@
from textblob import TextBlob, Blobber, Word
import re
from textblob_fr import PatternTagger, PatternAnalyzer
import nltk
nltk.download('wordnet')
from textblob.wordnet import NOUN, VERB, ADV, ADJ
import spacy
from spacy_lefff import LefffLemmatizer, POSTagger
from spacy.language import Language
from nltk.corpus import wordnet
import nltk

Review comment (Contributor):
nltk is imported twice.

nltk.download('omw')

from interfaces.SentenceOperation import SentenceOperation
from tasks.TaskTypes import TaskType

@Language.factory('french_lemmatizer')
def create_french_lemmatizer(nlp, name):
    return LefffLemmatizer()

@Language.factory('POSTagger')
def create_POSTagger(nlp, name):
    return POSTagger()


nlp = spacy.load('fr_core_news_md')

nlp.add_pipe('POSTagger', name ='pos')
nlp.add_pipe('french_lemmatizer', name='lefff', after='pos')



def synonym_transformation(text):
    doc = nlp(text)
    adjectives = [d.text for d in doc if d.pos_ == "ADJ"]

    synonyms_adjective_list = []
    for i in adjectives:
        dict_adjective_synonyms = {}
        dict_adjective_synonyms['adjective'] = i
        dict_adjective_synonyms['synonyms'] = list(set([l.name() for syn in wordnet.synsets(i, lang = 'fra', pos = ADJ) for l in syn.lemmas('fra')]))
        if len(dict_adjective_synonyms['synonyms']) > 0:
            synonyms_adjective_list.append(dict_adjective_synonyms)

    valid_adjective_list = []
    for j in synonyms_adjective_list:
        for k in j['synonyms']:
            valid_adjective_dict = {}
            valid_adjective_dict['adjective'] = j['adjective']
            valid_adjective_dict['syn'] = k
            if nlp(j['adjective']).similarity(nlp(k)) > .50 and not nlp(j['adjective']).similarity(nlp(k)) >= .999:
                valid_adjective_list.append(valid_adjective_dict)
    text_adjective_generated = []
    for l in valid_adjective_list:
        text_adjective_generated.append(text.replace(l['adjective'], l['syn']))
    pertu=[]
    text_adjective_generated.sort()

    for i in text_adjective_generated :
        if nlp(text).similarity(nlp(i)) > .90 and not nlp(text).similarity(nlp(i)) >= .999:
            pertu.append(i)
            break

    return pertu


class FrenchAdjectivesSynonymTransformation(SentenceOperation):
    tasks = [
        TaskType.TEXT_CLASSIFICATION,
        TaskType.TEXT_TO_TEXT_GENERATION,
        TaskType.TEXT_TAGGING,
    ]
    languages = ["fr"]

Review comment (Contributor):
Relevant keywords should also be added.

    def __init__(self, seed=0, max_outputs=1):
        super().__init__(seed, max_outputs=max_outputs)

    def generate(self, sentence : str):
        perturbed_texts = synonym_transformation(
            sentence
        )
        print("perturbed text inside of class",perturbed_texts)

Review comment (Contributor):
May want to delete this print line.

        return perturbed_texts
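
The two-stage filter in `synonym_transformation` (word-level similarity strictly between 0.50 and 0.999, then sentence-level similarity strictly between 0.90 and 0.999, keeping the first candidate in sorted order) can be sketched without the spaCy and WordNet dependencies. Everything below is illustrative: `pick_paraphrase`, `toy_similarity`, and the scores are hypothetical stand-ins, with the similarity function and synonym table passed in rather than computed by `nlp(...)` and `wordnet.synsets(...)`.

```python
from typing import Callable, Dict, List


def pick_paraphrase(
    text: str,
    adjectives: List[str],
    synonyms: Dict[str, List[str]],
    similarity: Callable[[str, str], float],
    word_floor: float = 0.50,
    sentence_floor: float = 0.90,
    ceiling: float = 0.999,
) -> List[str]:
    """Return at most one paraphrase, following the PR's two-stage filter."""
    candidates = []
    for adjective in adjectives:
        for synonym in set(synonyms.get(adjective, [])):
            # Stage 1: keep synonyms close to, but not identical with, the adjective.
            if word_floor < similarity(adjective, synonym) < ceiling:
                candidates.append(text.replace(adjective, synonym))
    # Sorting makes the selection deterministic, as in the PR.
    for candidate in sorted(candidates):
        # Stage 2: the whole sentence must stay close to the original.
        if sentence_floor < similarity(text, candidate) < ceiling:
            return [candidate]
    return []


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for spaCy's Doc.similarity, with made-up scores."""
    if a == b:
        return 1.0
    word_scores = {("sombre", "noir"): 0.8, ("sombre", "obscur"): 0.4}
    # Any pair not in the table is treated as a sentence pair with a fixed score.
    return word_scores.get((a, b), 0.95)


result = pick_paraphrase(
    "un pelage sombre",
    adjectives=["sombre"],
    synonyms={"sombre": ["noir", "obscur", "sombre"]},
    similarity=toy_similarity,
)
```

With these made-up scores, "obscur" falls below the word-level floor and "sombre" itself is rejected by the ceiling, so only the "noir" substitution reaches the sentence-level check.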