This manuscript outlines a viable approach for training and evaluating machine learning (ML) systems for high-stakes, human-centered, or regulated applications using common Python programming tools. The accuracy and intrinsic interpretability of two types of constrained models are assessed on simulated data and publicly available mortgage data: monotonic gradient boosting machines (MGBMs) and explainable neural networks (XNNs), a deep learning architecture well-suited for structured data. For maximum transparency and the potential generation of personalized adverse action notices, the constrained models are analyzed using post-hoc explanation techniques, including partial dependence (PD) and individual conditional expectation (ICE) plots and global and local Shapley feature importance. The constrained model predictions are also tested for disparate impact (DI) and other types of discrimination using measures with long-standing legal precedents (adverse impact ratio (AIR), marginal effect (ME), and standardized mean difference (SMD)) along with straightforward group fairness measures. By combining interpretable models, post-hoc explanations, and discrimination testing with accessible software tools, this text aims to provide a template workflow for important ML applications that require high accuracy and interpretability and that mitigate risks of discrimination.
See article-information-2019.pdf.
$ pip install virtualenv
$ cd notebooks
$ virtualenv -p python3.6 env
$ source env/bin/activate
$ pip install -r ../requirements.txt
$ ipython kernel install --user --name=information-article # Set up Jupyter kernel based on virtualenv
$ jupyter notebook
- For the raw dataset, see hmda_lar_2018_orig_mtg_sample.csv
- For the lending training set before preprocessing, see hmda_train.csv
- For the lending test set before preprocessing, see hmda_test.csv
- For the lending training set after preprocessing, see hmda_train_processed.csv
- For the lending test set after preprocessing, see hmda_test_processed.csv
- For the simulated training set before preprocessing, see simu_train.csv
- For the simulated test set before preprocessing, see simu_test.csv
- For the simulated training set after preprocessing, see train_simulated_processed.csv
- For the simulated test set after preprocessing, see test_simulated_processed.csv
- For lending data preprocessing, see hmda_preprocessing.ipynb
- For simulated data preprocessing, see simulated_preprocessing.ipynb
- For GBM and MGBM model training on the lending dataset, see mgbm_hmda.ipynb
- For GBM and MGBM model training on the simulated dataset, see mgbm_simulated.ipynb
- For XNN model training on the lending dataset, see xnn_notebook_hmda.ipynb
- For XNN model training on the simulated dataset, see xnn_notebook_simulated_data.ipynb
- For ANN model training on the lending dataset, see hmda_ann.ipynb
- For ANN model training on the simulated dataset, see simulation_ann.ipynb
- For GBM and MGBM performance evaluation and interpretation on the lending dataset (Table 2, Figures 2-4), see perf_pdp_ice_shap_mgbm_hmda.ipynb
- For GBM and MGBM performance evaluation and interpretation on the simulated dataset (Table 1, Figures A2-A4), see perf_pdp_ice_shap_mgbm_sim.ipynb
- For XNN performance evaluation and interpretation on the lending dataset (Table 2, Figures 5-7), see xnn_analysis_hmda_from_files.ipynb
- For XNN performance evaluation and interpretation on the simulated dataset (Table 1, Figure 1, Figures A5 and A6), see xnn_analysis_simulation_from_files.ipynb
- For ANN performance evaluation and interpretation on the lending dataset (Table 2), see ann_analysis_hmda_from_files.ipynb
- For ANN performance evaluation and interpretation on the simulated dataset (Table 1), see ann_analysis_simulation_from_files.ipynb
- For discrimination testing analysis, see disparity_measurement.py
- For discrimination testing results (Tables 3 and A1), see Disparity Tables for Paper.xlsx
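
For readers who want a quick feel for the three discrimination measures named in the abstract (AIR, ME, and SMD), here is a minimal sketch in plain Python. It is not the code in disparity_measurement.py. The function names are hypothetical, the toy inputs are invented for illustration, and the formulas follow one common formulation: AIR as the ratio of favorable-outcome rates (protected over control), ME as the percentage-point gap in those rates (control minus protected), and SMD as the difference in mean model scores divided by the standard deviation of all scores. The use of the population standard deviation over the pooled scores is an assumption.

```python
from statistics import mean, pstdev


def favorable_rate(decisions):
    """Share of favorable outcomes in a sequence of 0/1 decisions (1 = favorable)."""
    return sum(decisions) / len(decisions)


def adverse_impact_ratio(protected, control):
    """AIR: favorable rate of the protected group divided by that of the control group."""
    return favorable_rate(protected) / favorable_rate(control)


def marginal_effect(protected, control):
    """ME: control-group favorable rate minus protected-group rate, in percentage points."""
    return 100.0 * (favorable_rate(control) - favorable_rate(protected))


def standardized_mean_difference(protected_scores, control_scores):
    """SMD: difference in mean scores (protected minus control), scaled by the
    standard deviation of all scores. Assumption: population standard deviation
    over the pooled scores of both groups."""
    pooled = list(protected_scores) + list(control_scores)
    return (mean(protected_scores) - mean(control_scores)) / pstdev(pooled)


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    protected_decisions = [1, 0, 0, 1, 0]
    control_decisions = [1, 1, 0, 1, 1]
    protected_scores = [0.2, 0.4]
    control_scores = [0.6, 0.8]

    print("AIR:", adverse_impact_ratio(protected_decisions, control_decisions))
    print("ME :", marginal_effect(protected_decisions, control_decisions))
    print("SMD:", standardized_mean_difference(protected_scores, control_scores))
```

An AIR below roughly 0.8 is often read against the four-fifths rule of thumb as a sign of potential adverse impact. Any threshold applied to ME or SMD depends on the application and should be taken from the manuscript or relevant guidance rather than from this sketch.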