Add causal-predictive metrics dataset for others to analyze
bradyneal committed Mar 27, 2021
1 parent cdeb575 commit eec6005
Showing 3 changed files with 1,589 additions and 0 deletions.
19 changes: 19 additions & 0 deletions README.md
@@ -27,6 +27,25 @@ If step 1 above fails, try the following instead: `conda env create -f environme

```
└── utils.py - various utility functions
```

## Do your own analysis on our causal-predictive metric dataset

We trained a total of 1568 different estimators.
We recorded all of the predictive metrics that sklearn provides (e.g. RMSE, MAE, precision, recall) and many different causal metrics that RealCause provides (e.g. ATE bias, ATE RMSE, PEHE).
Together with the estimator specification (meta-estimator, outcome model, and propensity score model), these metrics yield a total of 77 columns.
A cell is "nan" where its column does not apply to that row's estimator (e.g. the propensity score model for a standardization estimator, a regression metric for an IPW estimator, or a classification metric for a standardization estimator).

We provide this dataset in [causal-predictive-analysis.csv](https://github.com/bradyneal/causal-benchmark/blob/master/causal-predictive-analysis.csv).
We ran one analysis on this dataset in Section 6 of our paper; the code is in [experiments/uai_analysis.py](https://github.com/bradyneal/causal-benchmark/blob/master/experiments/uai_analysis.py).
However, many more analyses can be run on it.
For example, one might want to fit machine learning models to predict causal metrics from predictive metrics and use something like [SHAP](https://github.com/slundberg/shap) to interpret the associations these models find.
To get started, load the dataset from [causal-predictive-analysis.csv](https://github.com/bradyneal/causal-benchmark/blob/master/causal-predictive-analysis.csv) with pandas:

```
import pandas as pd
df = pd.read_csv('causal-predictive-analysis.csv')
```
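
As a possible first look at the data, the sketch below reports the fraction of "nan" cells per column and computes a Spearman rank correlation between one predictive metric and one causal metric. The function names are hypothetical helpers, not part of RealCause, and the column names you pass in are placeholders: check `df.columns` for the actual names in the CSV.

```python
import pandas as pd


def missing_fraction(df: pd.DataFrame) -> pd.Series:
    """Fraction of "nan" cells in each column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def rank_correlation(df: pd.DataFrame, predictive_col: str, causal_col: str) -> float:
    """Spearman correlation over the rows where both metrics are defined.

    Both column names are supplied by the caller; none are assumed here.
    """
    # Keep only estimators for which both metrics apply (neither cell is nan).
    pair = df[[predictive_col, causal_col]].dropna()
    # Spearman correlation is the Pearson correlation of the ranks.
    ranks = pair.rank()
    return ranks[predictive_col].corr(ranks[causal_col])
```

With `df` loaded as above, `missing_fraction(df).head()` shows which columns are most often not applicable, and `rank_correlation(df, predictive_col, causal_col)` can be called with any pair of metric columns taken from `df.columns`. Dropping rows per pair, rather than across the whole table, matters here because almost every row has a "nan" in some column.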

## Loading RealCause pre-computed datasets
