GenFair

A Genetic Fairness-Enhancing Data Generation Framework

Read the paper (and the additional materials)

Do you want to use our algorithm? It's pretty easy: just follow the tutorial!

Background

Many datasets used to train machine learning models are biased against a particular group defined by a sensitive attribute, such as Women (for Gender), Black people (for Race), or people below a certain Age threshold. This human bias propagates to the trained model, and therefore to its decisions. Various fairness-enhancing algorithms have been proposed, operating on the training set, on the model, or on the model’s output.

The Preferential Sampling algorithm attempts to “balance” the training dataset, mitigating its bias. The algorithm identifies four groups of instances: Discriminated instances with a Positive label (DP; e.g., Women, Hired), Privileged instances with a Negative label (PN; e.g., Men, Not-Hired), Discriminated instances with a Negative label (DN), and Privileged instances with a Positive label (PP). DN and PP instances near a model’s decision boundary are removed, whereas DP and PN instances are duplicated. It has been shown that a model trained on the balanced dataset makes fairer predictions.
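To make the four groups concrete, here is a minimal Python sketch that partitions a toy dataset into DP, DN, PP, and PN and compares each group's observed size with a target size. It assumes the target is the size the group would have if the sensitive attribute and the label were statistically independent, which is the usual formulation of Preferential Sampling. The function names and the toy data are hypothetical and are not part of GenFair. The ranking of instances by their distance from the decision boundary is omitted.

```python
from collections import Counter

def group_of(is_discriminated, is_positive):
    """Return the two-letter group code: DP, DN, PP, or PN."""
    return ("D" if is_discriminated else "P") + ("P" if is_positive else "N")

def group_sizes(records):
    """records: list of (is_discriminated, is_positive) pairs.

    Returns {group: (observed_size, target_size)}. The target size is an
    assumption: |sensitive value| * |label value| / N, i.e. the size under
    independence of sensitive attribute and label.
    """
    n = len(records)
    observed = Counter(group_of(d, y) for d, y in records)
    n_disc = sum(1 for d, _ in records if d)
    n_pos = sum(1 for _, y in records if y)
    sizes = {}
    for g in ("DP", "DN", "PP", "PN"):
        by_attr = n_disc if g[0] == "D" else n - n_disc
        by_label = n_pos if g[1] == "P" else n - n_pos
        sizes[g] = (observed[g], by_attr * by_label / n)
    return sizes

# Toy data: 4 discriminated instances (1 positive), 6 privileged (4 positive).
records = (
    [(True, True)] * 1 + [(True, False)] * 3
    + [(False, True)] * 4 + [(False, False)] * 2
)
sizes = group_sizes(records)
for g, (obs, target) in sizes.items():
    action = "duplicate" if obs < target else "remove" if obs > target else "keep"
    print(g, obs, target, action)
```

In this toy dataset, DP and PN fall below their targets and DN and PP exceed them, which matches the duplicate/remove rule described above.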

Our proposal

Preferential Sampling has four key limitations: it works with only one sensitive attribute, it requires that attribute to be binary (e.g., Race must be binarized), it only duplicates existing data rather than creating new instances, and the user must know the discriminated value beforehand.

GenFair addresses all these points. The Discrimination Test identifies which values of a (potentially non-binary) sensitive attribute are discriminated, without the user’s input. In the Instance Removal step, the instances closest to a model’s decision boundary are then removed, as in Preferential Sampling. The Combination Test finds the best combinations of target values and values from multiple sensitive attributes in order to balance the dataset (e.g., Black Woman, Hired). These combinations are called constraints. Finally, a genetic algorithm called GenSyn (Genetic Synthesiser) creates new instances that satisfy the constraints, resulting in a fair dataset to be used with machine learning models.
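This README does not specify the statistic used by the Discrimination Test. As a purely illustrative stand-in, the sketch below flags every value of a non-binary sensitive attribute whose positive-label rate is lower than the overall positive rate. The function name, the column names, and the rule itself are assumptions made for this example. They are not GenFair's API or its actual test, which is described in the paper and the tutorial.

```python
from collections import defaultdict

def discriminated_values(rows, attribute, label, positive=1):
    """Hypothetical helper: return the values of `attribute` whose
    positive-label rate is strictly below the overall positive rate.

    rows: list of dicts, one per instance.
    """
    overall = sum(1 for r in rows if r[label] == positive) / len(rows)
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in rows:
        totals[r[attribute]] += 1
        if r[label] == positive:
            positives[r[attribute]] += 1
    return sorted(v for v in totals if positives[v] / totals[v] < overall)

# Toy data with a three-valued sensitive attribute (values are placeholders).
rows = (
    [{"race": "A", "hired": 1}] * 3 + [{"race": "A", "hired": 0}] * 1
    + [{"race": "B", "hired": 1}] * 1 + [{"race": "B", "hired": 0}] * 3
    + [{"race": "C", "hired": 1}] * 1 + [{"race": "C", "hired": 0}] * 1
)
flagged = discriminated_values(rows, attribute="race", label="hired")
print(flagged)
```

Here value "B" has a positive rate of 0.25 against an overall rate of 0.5, so it is the only one flagged. The point of the example is that the user never has to name the discriminated value in advance.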

