add gender randomizer #229

Open
wants to merge 6 commits into
base: main

tk-sugumar

No description provided.

Author name: Tabitha Sugumar
Author email: __
Author Affiliation: __

Collaborator

Thanks for your changes, @tk-sugumar. Please add your email and affiliation.

Author

Added!


## Examples of this transformation

Because this is a randomized transformation, in both the selection of gender and selection of name, test examples are impossible -- the output for a single sentence is expected to be different in each successive run. Instead I've provided some example sentences and outputs for reference.

Contributor

I believe you can add a default seed as an argument to the `__init__` of your GenderRandomizer transformation. That makes the results consistent across runs, so you can include test cases in your test.json.

Quite a few of the PRs use this approach for test cases.

See for example:
https://github.com/GEM-benchmark/NL-Augmenter/pull/164/files
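
A minimal sketch of the seeding approach suggested here, using only the standard library. The class name `SeededNamePicker`, the `seed` default, and the name lists are hypothetical stand-ins, not the code in this PR.

```python
import random

# Hypothetical name pools; the real transformation would use its own lists.
FEMALE_NAMES = ["Alice", "Maria", "Priya"]
MALE_NAMES = ["David", "Omar", "Kenji"]


class SeededNamePicker:
    """Illustrative stand-in for a transformation with a seeded initializer."""

    def __init__(self, seed=0):
        # A private generator keeps results reproducible without touching
        # the global random state.
        self.rng = random.Random(seed)

    def pick(self):
        # Choose a gender pool with equal probability, then a name from it.
        pool = self.rng.choice([FEMALE_NAMES, MALE_NAMES])
        return self.rng.choice(pool)


# Two instances built with the same seed yield the same sequence of names,
# which is what makes fixed expected outputs in a test file possible.
a = SeededNamePicker(seed=42)
b = SeededNamePicker(seed=42)
seq_a = [a.pick() for _ in range(5)]
seq_b = [b.pick() for _ in range(5)]
```

Here `seq_a` and `seq_b` are equal, so the outputs could be recorded once and compared on later runs.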

Author
@tk-sugumar, Sep 12, 2021

Thanks Timothy! When I tried this, the same name was predicted for each sentence, so for use as intended I think the user would have to modify the code after downloading. Should I still go ahead and do this?

Author

Hi Timothy, I added the seed to the initializer. The same name does get predicted each time, though; I hope that's OK! Test cases are also added to the test.json.
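
One possible explanation for the same name being chosen every time, offered as an assumption since the relevant code is not shown in this thread: if the generator is re-seeded on every call, each call restarts the same pseudo-random sequence. Seeding once gives a sequence that is reproducible across runs but can still vary from sentence to sentence. The function names and the name list below are hypothetical.

```python
import random

# Hypothetical name list, for illustration only.
NAMES = ["Alice", "David", "Maria", "Omar", "Priya", "Kenji"]


def pick_reseeding(seed, n):
    """Re-create the generator for every pick: every pick is identical."""
    picks = []
    for _ in range(n):
        rng = random.Random(seed)  # sequence restarts from the beginning
        picks.append(rng.choice(NAMES))
    return picks


def pick_seed_once(seed, n):
    """Create the generator once: picks vary but the run is reproducible."""
    rng = random.Random(seed)
    return [rng.choice(NAMES) for _ in range(n)]


reseeded = pick_reseeding(0, 20)
seeded_once = pick_seed_once(0, 20)
```

With this layout, `reseeded` repeats a single name, while `seeded_once` is the same list on every run and so still works as a fixed test expectation.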

Author Affiliation: Elsevier

## What type of a transformation is this?
This transformation changes names in English texts, randomizing selection so there's an even chance of male and female names. It modifies pronouns to match the selected name.
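
A rough, standard-library sketch of the idea described above, not the PR's implementation. It assumes the original name is known to be male and that every masculine pronoun in the sentence refers to that name; the PR relies on coreference resolution to establish that. The name pools, the pronoun table, and `randomize_male_subject` are all hypothetical.

```python
import random
import re

# Hypothetical pools and pronoun table, for illustration only.
NAMES = {"female": ["Alice", "Maria"], "male": ["David", "Omar"]}
# Masculine to feminine only. The reverse direction is harder because
# "her" can correspond to either "him" or "his".
MALE_TO_FEMALE = {"he": "she", "him": "her", "his": "her", "himself": "herself"}
PRONOUN_PATTERN = r"\b(" + "|".join(MALE_TO_FEMALE) + r")\b"


def _swap_pronoun(match):
    word = match.group(0)
    replacement = MALE_TO_FEMALE[word.lower()]
    # Preserve sentence-initial capitalization.
    return replacement.capitalize() if word[0].isupper() else replacement


def randomize_male_subject(text, old_name, rng):
    """Swap old_name for a random name; adjust pronouns if it is female."""
    gender = rng.choice(["female", "male"])  # even chance of either
    new_name = rng.choice(NAMES[gender])
    text = re.sub(rf"\b{re.escape(old_name)}\b", new_name, text)
    if gender == "female":
        text = re.sub(PRONOUN_PATTERN, _swap_pronoun, text, flags=re.IGNORECASE)
    return text, gender


result, gender = randomize_male_subject(
    "John said he lost his keys.", "John", random.Random(0)
)
```

Passing the generator in as an argument keeps the random choice under the caller's control, which fits the seeding discussion elsewhere in this review.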

Collaborator

Please add an acknowledgement that names are not deterministic identifiers of someone's pronouns or gender :)

Author

Added!

Randomizes names in text for a 50/50 gender breakdown. Handles pronouns.
"""
nlp = spacy.load("en_core_web_sm", disable=["lemmatizer"])
nlp.add_pipe("coreferee")

Contributor

You might want to use spacy like this.

Author

Modified as given in example

class GenderRandomizer(SentenceOperation):
    tasks = [TaskType.TEXT_TO_TEXT_GENERATION]
    languages = ["en"]

Contributor

Please add some keywords here.

Author

Added

@mille-s self-requested a review September 30, 2021 10:58

## What tasks does it intend to benefit?
This is intended to avoid gender bias in natural language processing models. Run this transformation on text data prior to using it to train a model.

## Previous Work

Collaborator

Importantly, please add a Data and Code Provenance section to your transformation. Also, it seems you've added about 109 files, which are hard to evaluate. I would suggest moving these into a separate pip project and then adding it to the requirements.txt.

Author

Thanks! I've expanded on the data and code provenance, and put the description in a Data and Code Provenance section in the Readme.

On the 109 files: most of them come from the coreferee directory. This already exists as a library installable by pip, but when I was working on this it was only installable in Python 3.8, and the current version requires Python 3.9. Since these transformations are required to be compatible with Python 3.7, I downloaded it here to make it installable in Python 3.7.

@kaustubhdhole
Collaborator

Hi @tk-sugumar, it won't be a good idea to merge all of these into the repository. It would be better to make a pip library out of it in a separate repository and call only the relevant parts here. @AbinayaM02, thoughts?

@AbinayaM02
Collaborator

> Hi @tk-sugumar, it won't be a good idea to merge all of these in the repository. It would be better to make a pip library out of it in a separate repository and call only the relevant parts here. @AbinayaM02 thoughts

Agreed. As @kaustubhdhole mentioned, you should install the library (specify it in the requirements.txt) and use it for your transformation, @tk-sugumar. You can check whether the library works with Python 3.7.
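
A small sketch of one way to run that check from inside the target environment, using only the standard library. The helper name `can_use` and its `min_python` parameter are hypothetical. It only confirms that the interpreter meets a minimum version and that the module can be located, not that every feature works.

```python
import importlib.util
import sys


def can_use(module_name, min_python=(3, 7)):
    """Return True if the interpreter is new enough and the module is found.

    module_name should be a top-level module name such as "coreferee".
    """
    if sys.version_info < min_python:
        return False
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


print("coreferee usable:", can_use("coreferee"))
```

Running this under a Python 3.7 interpreter after installing from requirements.txt would show whether the pip-installed library can be found there.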

7 participants