This is a Python project for calculating bias metrics of deep learning systems. It consists of two parts:
- Calculating a confusion matrix from the raw prediction results of a deep learning system
- Calculating bias metrics based on the confusion matrix
```
metric_calculation/
|-- README.md
|-- calculate.sh
|-- calculate_ba.py
|-- calculate_di.py
|-- calculate_eo.py
|-- calculate_fp.py
|-- calculate_md.py
|-- calculate_sp.py
|-- confusion_matrix.py
`-- utils.py
```
- `calculate.sh`: A bash script to run metric calculation for all models
- `calculate_ba.py`: A Python file to calculate Bias Amplification (BA)
- `calculate_di.py`: A Python file to calculate Disparate Impact (DI)
- `calculate_eo.py`: A Python file to calculate Equality of Opportunity (EO)
- `calculate_fp.py`: A Python file to calculate False Positive Subgroup Fairness (FPSF)
- `calculate_md.py`: A Python file to calculate Demographic Parity (DP)
- `calculate_sp.py`: A Python file to calculate Statistical Parity Subgroup Fairness (SPSF)
- `confusion_matrix.py`: A Python file to create a confusion matrix for each classification result
- `utils.py`: A Python file that contains utility functions used in metric calculation
## Requirements

- python >= 3.7.4
- numpy >= 1.20.2
- pandas >= 0.25.1
- The prediction results of each model should be in CSV format.
- They should be placed in `metric_calculation/prediction/`.
- A raw CSV file containing classification results should be formatted as follows:
- Multi-class classification model

  | idx | ground_truth | prediction_result | protected_label |
  |---|---|---|---|
  | 0 | 3 | 2 | 0 |
  | 1 | 4 | 4 | 1 |
  | 2 | 3 | 2 | 0 |

- Multi-label classification model

  | idx | ground_truth | prediction_result | protected_label |
  |---|---|---|---|
  | 0 | [0,1,0] | [1,1,0] | 0 |
  | 1 | [0,1,1] | [1,0,1] | 1 |
  | 2 | [1,1,0] | [1,1,0] | 0 |
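For reference, here is a minimal sketch of loading one of these prediction CSVs with pandas. The file path and the `ast.literal_eval` parsing of multi-label columns are illustrative assumptions, not part of the project code:

```python
import ast

import pandas as pd

# Load one trial's raw prediction results (the path is an illustrative example).
df = pd.read_csv("metric_calculation/prediction/Model_Set_1/Model_1_1/try_00.csv")

# The expected columns, as described above.
expected = ["idx", "ground_truth", "prediction_result", "protected_label"]
assert list(df.columns) == expected, f"unexpected columns: {list(df.columns)}"

# For multi-label models, ground_truth/prediction_result are stored as strings
# like "[0,1,0]"; parse them back into Python lists.
if df["ground_truth"].dtype == object:
    df["ground_truth"] = df["ground_truth"].apply(ast.literal_eval)
    df["prediction_result"] = df["prediction_result"].apply(ast.literal_eval)

print(df.head())
```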
- `Model_Set`: An example main directory that contains multiple classification models
- `Model_n_n`: An example directory that contains multiple trials of a single model
```
metric_calculation/prediction/
|-- Model_Set_1
|   |-- Model_1_1
|   |   |-- try_00.csv
|   |   `-- try_01.csv
|   `-- Model_1_2
|       |-- Model_1_2_1
|       |   |-- try_00.csv
|       |   `-- try_01.csv
|       `-- Model_1_2_2
|           |-- try_00.csv
|           `-- try_01.csv
`-- Model_Set_2
    |-- Model_2_1
    |   |-- try_00.csv
    |   `-- try_01.csv
    `-- Model_2_2
        |-- try_00.csv
        `-- try_01.csv
```
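How the scripts locate these files is internal to the project; as a sketch, trial CSVs at any nesting depth could be discovered with a recursive glob:

```python
from pathlib import Path

prediction_root = Path("metric_calculation/prediction")

# try_*.csv files may sit at different depths (e.g. Model_1_1 vs Model_1_2_1),
# so search recursively and group trials by their parent model directory.
trials_by_model = {}
for csv_path in sorted(prediction_root.rglob("try_*.csv")):
    model_dir = csv_path.parent.relative_to(prediction_root)
    trials_by_model.setdefault(str(model_dir), []).append(csv_path)

for model, trials in trials_by_model.items():
    print(f"{model}: {len(trials)} trial(s)")
```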
To calculate the confusion matrices, run the following from `metric_calculation/`:

```
python confusion_matrix.py
```
- The `cm` directory contains confusion matrices in JSON format.
- The `count` directory contains a summary of each prediction case in JSON format.
- Only the `cm` directories are used to calculate metrics.
```
metric_calculation/
|-- Model_Set_1_output
|   |-- cm
|   |   |-- Model_1_1
|   |   |   |-- try_00.json
|   |   |   `-- try_01.json
|   |   `-- Model_1_2
|   |       |-- Model_1_2_1
|   |       |   |-- try_00.json
|   |       |   `-- try_01.json
|   |       `-- Model_1_2_2
|   |           |-- try_00.json
|   |           `-- try_01.json
|   `-- count
|       |-- Model_1_1
|       |   |-- try_00.json
|       |   `-- try_01.json
|       `-- Model_1_2
|           |-- Model_1_2_1
|           |   |-- try_00.json
|           |   `-- try_01.json
|           `-- Model_1_2_2
|               |-- try_00.json
|               `-- try_01.json
`-- Model_Set_2_output
    |-- cm
    |   |-- Model_2_1
    |   |   |-- try_00.json
    |   |   `-- try_01.json
    |   `-- Model_2_2
    |       |-- try_00.json
    |       `-- try_01.json
    `-- count
        |-- Model_2_1
        |   |-- try_00.json
        |   `-- try_01.json
        `-- Model_2_2
            |-- try_00.json
            `-- try_01.json
```
Each confusion matrix JSON file is structured as:

- 1st level: Protected Group
- 2nd level: Ground Truth (Class)
- 3rd level: Confusion Matrix on Classification Result
```json
{
    "0": {
        "0": {"TP": 3, "FP": 2, "TN": 5, "FN": 1},
        "1": {"TP": 5, "FP": 1, "TN": 3, "FN": 2}
    },
    "1": {
        "0": {"TP": 5, "FP": 1, "TN": 2, "FN": 1},
        "1": {"TP": 2, "FP": 1, "TN": 5, "FN": 1}
    }
}
```
The JSON above corresponds to the confusion matrix example below:
| Protected group = 0 | Ground Truth (Class) = 0 | Ground Truth (Class) = 1 |
|---|---|---|
| Prediction = 0 | 3 | 2 |
| Prediction = 1 | 1 | 5 |

| Protected group = 1 | Ground Truth (Class) = 0 | Ground Truth (Class) = 1 |
|---|---|---|
| Prediction = 0 | 5 | 1 |
| Prediction = 1 | 1 | 2 |
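`confusion_matrix.py` implements the actual computation; for intuition only, a one-vs-rest tally that produces the nested structure above could look like this (a sketch, assuming a multi-class prediction table as described earlier):

```python
import json

import pandas as pd

def confusion_by_group(df: pd.DataFrame) -> dict:
    """Nested dict: protected group -> class -> {TP, FP, TN, FN} (one-vs-rest)."""
    cm = {}
    classes = sorted(df["ground_truth"].unique())
    for group, sub in df.groupby("protected_label"):
        cm[str(group)] = {}
        for cls in classes:
            actual = sub["ground_truth"] == cls           # ground truth is this class
            predicted = sub["prediction_result"] == cls   # prediction is this class
            cm[str(group)][str(cls)] = {
                "TP": int((actual & predicted).sum()),
                "FP": int((~actual & predicted).sum()),
                "TN": int((~actual & ~predicted).sum()),
                "FN": int((actual & ~predicted).sum()),
            }
    return cm

df = pd.read_csv("metric_calculation/prediction/Model_Set_1/Model_1_1/try_00.csv")
print(json.dumps(confusion_by_group(df), indent=2))
```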
- The confusion matrices must be ready, i.e., `confusion_matrix.py` must have been run first.
- Run the script from `metric_calculation/` to calculate all metrics:

```
./calculate.sh
```
- All calculated values are stored in the `result/` directory of each model set's output directory (`metric_calculation/Model_Set_output/result/`).
  - JSON files: Raw values of each metric calculation
  - CSV files: Aggregated values of each metric calculation
```
metric_calculation/
|-- Model_Set_1_output
|   |-- cm
|   |-- count
|   `-- result
|       |-- bias_amplification.csv
|       |-- bias_amplification_raw.json
|       |-- disparate_impact_factor.csv
|       |-- disparate_impact_factor_raw.json
|       |-- equality_of_odds_false_positive.csv
|       |-- equality_of_odds_false_positive_raw.json
|       |-- equality_of_odds_true_positive.csv
|       |-- equality_of_odds_true_positive_raw.json
|       |-- false_positive_subgroup_fairness.csv
|       |-- false_positive_subgroup_fairness_raw.json
|       |-- mean_difference_score.csv
|       |-- mean_difference_score_raw.json
|       |-- statistical_parity.csv
|       `-- statistical_parity_raw.json
`-- Model_Set_2_output
    |-- cm
    |-- count
    `-- result
        |-- bias_amplification.csv
        |-- bias_amplification_raw.json
        |-- disparate_impact_factor.csv
        |-- disparate_impact_factor_raw.json
        |-- equality_of_odds_false_positive.csv
        |-- equality_of_odds_false_positive_raw.json
        |-- equality_of_odds_true_positive.csv
        |-- equality_of_odds_true_positive_raw.json
        |-- false_positive_subgroup_fairness.csv
        |-- false_positive_subgroup_fairness_raw.json
        |-- mean_difference_score.csv
        |-- mean_difference_score_raw.json
        |-- statistical_parity.csv
        `-- statistical_parity_raw.json
```
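The per-metric formulas live in the corresponding `calculate_*.py` files. As an illustration only, a disparate-impact-style ratio for one class can be derived from a `cm` JSON; the positive-prediction-rate ratio below is the textbook definition and may differ in detail from this project's implementation:

```python
import json

with open("metric_calculation/Model_Set_1_output/cm/Model_1_1/try_00.json") as f:
    cm = json.load(f)

def positive_rate(counts: dict) -> float:
    """Fraction of a group's samples that the model labeled positive for this class."""
    predicted_positive = counts["TP"] + counts["FP"]
    total = counts["TP"] + counts["FP"] + counts["TN"] + counts["FN"]
    return predicted_positive / total

# Ratio of positive prediction rates between the two protected groups for class "1".
rate_g0 = positive_rate(cm["0"]["1"])
rate_g1 = positive_rate(cm["1"]["1"])
print("disparate impact (class 1):", rate_g1 / rate_g0)
```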
- Each metric is calculated for individual model sets.
- `Model_Set_1` is an example set of models for a binary classification task with `class 0` and `class 1`.
- `Model_Set_1` has two models: `Model_1_1` and `Model_1_2`.
- For each model, there are 16 trial results.
Each `*_raw.json` file is structured as:

- 1st level: Model name
- 2nd level: A list of calculated metric values for each class (`class 0`, `class 1`)
  - The last element is the overall value among all classes
```
{
    "Model_1_1/": [
        [0.34759806, 0.47476809, 0.41118307],  % Trial 1
        [0.34759806, 0.47476809, 0.41118307],  % Trial 2
        ...
    ],
    "Model_1_2/": [
        [0.02062884, 0.02176104, 0.02119494],  % Trial 1
        [0.01833622, 0.02196790, 0.02012520],  % Trial 2
        ...
    ]
}
```
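Since the raw files are plain JSON, they are easy to post-process; for example, averaging each model's per-class values across trials (a sketch; the file name is taken from the result tree above):

```python
import json

import numpy as np

with open("metric_calculation/Model_Set_1_output/result/statistical_parity_raw.json") as f:
    raw = json.load(f)

for model, trials in raw.items():
    values = np.array(trials)  # shape: (n_trials, n_classes + 1 overall column)
    print(model, "mean per column:", values.mean(axis=0).round(4))
```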
- The first column is the aggregated path of a model, i.e., the path where the raw prediction results are stored.
- For each class, six values are calculated to aggregate the 16 trials of each model (see the sketch after the table below):
  - `max_diff`: maximum absolute difference value
  - `max`: maximum value
  - `min`: minimum value
  - `std_dev`: standard deviation value
  - `mean`: mean value
  - `rel_maxdiff(%)`: relative max difference percentage
- All values are represented with 4 significant digits.
- The `overall` row indicates the overall value for all classes.
|  |  | max_diff | max | min | std_dev | mean | rel_maxdiff(%) |
|---|---|---|---|---|---|---|---|
| ./Model_1/Model_1_1 | class 0 | 0.06010 | 0.1144 | 0.05432 | 0.01537 | 0.06919 | 86.86 |
| ./Model_1/Model_1_1 | class 1 | 0.03493 | 0.05119 | 0.01627 | 0.01135 | 0.02943 | 118.7 |
| ./Model_1/Model_1_1 | overall | 0.04126 | 0.07752 | 0.03626 | 0.01196 | 0.04931 | 83.68 |
| ./Model_1/Model_1_2 | class 0 | 0.1586 | 0.1636 | 0.005035 | 0.04217 | 0.06067 | 261.3 |
| ./Model_1/Model_1_2 | class 1 | 0.08165 | 0.1140 | 0.03236 | 0.02139 | 0.05808 | 140.6 |
| ./Model_1/Model_1_2 | overall | 0.1163 | 0.1388 | 0.02245 | 0.03109 | 0.05938 | 195.9 |
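For reference, the six aggregates can be reproduced with numpy as below. This sketch assumes `max_diff` is `max - min` and `rel_maxdiff(%)` is `max_diff / mean * 100`, which matches the example rows above but may not be the project's exact definition:

```python
import numpy as np

def aggregate(trial_values: np.ndarray) -> dict:
    """Six summary statistics over one column of per-trial metric values."""
    max_diff = trial_values.max() - trial_values.min()
    mean = trial_values.mean()
    return {
        "max_diff": max_diff,
        "max": trial_values.max(),
        "min": trial_values.min(),
        "std_dev": trial_values.std(),
        "mean": mean,
        "rel_maxdiff(%)": 100.0 * max_diff / mean,
    }

# Example: 16 per-trial values for one class of one model (random stand-ins).
rng = np.random.default_rng(0)
print({k: round(v, 4) for k, v in aggregate(rng.uniform(0.02, 0.12, 16)).items()})
```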