This repository contains the implementation of 'Automatic Location of Disparities' (ALD) for conducting algorithmic audits.
# install.packages("remotes")
remotes::install_github("https://github.com/moritzvz/ald")
ALD is dependent on several other packages for handling data, modeling, and generating reports: partykit, assertthat, magrittr, tidyselect, tibble, dplyr, tidyr, readr, rmarkdown, flextable, stringr, ggplot2, ggparty, cowplot, scales, hms
The ALD audit:
- is performed on a dataset of your choice that must be provided as a .csv file
- requires notion of fairness to be set to 'statistical parity' or 'equalized odds'
- in case of 'statistical parity' you must set the outcome_variable argument to the name of the outcome variable in your dataset
- for 'equalized odds' you must set the prediction_variable and ground_truth_variable arguments to the names of the prediction and ground truth variables in your dataset
- by default all other variables (not outcome, prediction, ground truth) in you dataset will be used as sensitive attributes in the audit. You can use the sensitive_attributes argument to specifically set the sensitive attributes to a subset of your dataset varaiables
- requires a ranking mechanism which must be 'confidence' or 'magnitude'
- requires a maximum number of groups in the report (n_grp)
- requires a number of trees to model in partykit::cforest (ntree)
- requires a alpha argument passed to partykit::cforest (alpha)
- optionally takes a p-value adjustment method to pass to stats::p.adjust (adjust_method), either "BH" (Benjamini-Hochberg, by default) or "bonferroni" (Bonferroni correction).
- optionally takes a random seed number that can be used for reproducibility of results
- writes a report to the directory that you set with the dir argument, with data_name argument used in the name
# for example
ald_audit(
file = "my_data.csv",
prediction_variable = "prediction",
ground_truth_variable = "ground_truth",
notion_of_fairness = "equalized odds",
ranking_mechanism = "confidence",
data_name = "data_title",
dir = here::here(""),
n_grp = 3,
ntree = 25,
alpha = 0.1)
ald_audit(
file = "my_data.csv",
outcome_variable = "outcome",
notion_of_fairness = "statistical parity",
ranking_mechanism = "confidence",
data_name = "data_title",
dir = here::here(""),
n_grp = 3,
ntree = 25,
alpha = 0.1)
Please consider citing us if you find this helpful for your work:
@inproceedings{vonZahn.2023,
title={Locating disparities in machine learning},
author={von Zahn, Moritz and Hinz, Oliver and Feuerriegel, Stefan},
booktitle={2023 IEEE International Conference on Big Data (BigData)},
pages={1883--1894},
year={2023},
organization={IEEE}
}