Commit 609d666

Use HistGradientBoostingRegressor/Classifier

1 parent d46e025 commit 609d666

1 file changed: 7 additions, 19 deletions

docs/source/example_notebooks/gcm_cps2015_dist_change_robust.ipynb
@@ -91,7 +91,7 @@
     "\n",
     "The multiply-robust causal change attribution method is based on a combination of *regression* and *re-weighting* approaches. In the regression approach, we learn the dependence between a node and its parents in one sample, and then use the data from the other sample to shift the distribution of that node. In the re-weighting approach, we average the data giving more weight to those observations that closely resemble the target distribution. \n",
     "\n",
-    "By default, ```dowhy.gcm.distribution_change_robust``` uses linear and logistic regression to learn the regression function and the weights. Here, since our dataset is quite large, we will use the more flexible algorithms ```LGBMRegressor``` and ```LGBMClassifier``` instead.\n",
+    "By default, ```dowhy.gcm.distribution_change_robust``` uses linear and logistic regression to learn the regression function and the weights. Here, since our dataset is quite large, we will use the more flexible algorithms ```HistGradientBoostingRegressor``` and ```HistGradientBoostingClassifier``` instead.\n",
     "\n",
     "We also use ```IsotonicRegression``` to calibrate the probabilities that make up the weights for the re-weighting approach on a leave-out calibration sample. This is optional, but it has been shown to improve the performance of the method in simulations.\n",
     "\n",
@@ -107,38 +107,26 @@
    },
    "outputs": [],
    "source": [
-    "from lightgbm import LGBMClassifier, LGBMRegressor\n",
+    "from sklearn.ensemble import HistGradientBoostingRegressor, HistGradientBoostingClassifier\n",
     "from sklearn.isotonic import IsotonicRegression\n",
     "from dowhy.gcm.ml.classification import SklearnClassificationModelWeighted\n",
     "from dowhy.gcm.ml.regression import SklearnRegressionModelWeighted\n",
     "from dowhy.gcm.util.general import auto_apply_encoders, auto_fit_encoders, shape_into_2d\n",
     "\n",
     "def make_custom_regressor():\n",
-    "    return SklearnRegressionModelWeighted(LGBMRegressor(random_state = 0, n_jobs = -1, verbose = -100))\n",
+    "    return SklearnRegressionModelWeighted(HistGradientBoostingRegressor(random_state = 0))\n",
     "\n",
     "def make_custom_classifier():\n",
-    "    return SklearnClassificationModelWeighted(LGBMClassifier(random_state = 0, n_jobs = -1, verbose = -100, is_unbalance = True))\n",
+    "    return SklearnClassificationModelWeighted(HistGradientBoostingClassifier(random_state = 0))\n",
     "\n",
     "def make_custom_calibrator():\n",
     "    return SklearnRegressionModelWeighted(IsotonicRegression(out_of_bounds = 'clip'))\n",
     "\n",
-    "dist_change_fun_kwargs = {\n",
-    "    'regressor' : LGBMRegressor, \n",
-    "    'regressor_kwargs' : {'random_state' : 0, 'n_jobs' : -1, 'verbose' : -100},\n",
-    "    'classifier' : LGBMClassifier,\n",
-    "    'classifier_kwargs' : {'random_state' : 0, 'n_jobs' : -1, 'is_unbalance' : True, 'verbose' : -100},\n",
-    "    'calibrator' : IsotonicRegression,\n",
-    "    'calibrator_kwargs' : {'out_of_bounds' : 'clip'},\n",
-    "    'calib_size' : 0.2,\n",
-    "    'sample_weight' : 'weight',\n",
-    "    'xfit' : False\n",
-    "}\n",
-    "\n",
     "gcm.distribution_change_robust(causal_model, data_old, data_new, 'wage', sample_weight = 'weight',\n",
-    "                               xfit = False, \n",
+    "                               xfit = False, calib_size = 0.2,\n",
     "                               regressor = make_custom_regressor,\n",
     "                               classifier = make_custom_classifier,\n",
-    "                               calibrator = make_custom_calibrator, calib_size = 0.2)"
+    "                               calibrator = make_custom_calibrator)"
     ]
    },
    {
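For readability, here is the new version of this cell, extracted from the notebook JSON above into plain Python (gcm, causal_model, data_old and data_new are defined in earlier cells of the notebook):

from sklearn.ensemble import HistGradientBoostingRegressor, HistGradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from dowhy.gcm.ml.classification import SklearnClassificationModelWeighted
from dowhy.gcm.ml.regression import SklearnRegressionModelWeighted
from dowhy.gcm.util.general import auto_apply_encoders, auto_fit_encoders, shape_into_2d

# Factories wrapping the scikit-learn estimators in dowhy's weighted model classes.
def make_custom_regressor():
    return SklearnRegressionModelWeighted(HistGradientBoostingRegressor(random_state=0))

def make_custom_classifier():
    return SklearnClassificationModelWeighted(HistGradientBoostingClassifier(random_state=0))

def make_custom_calibrator():
    return SklearnRegressionModelWeighted(IsotonicRegression(out_of_bounds='clip'))

gcm.distribution_change_robust(causal_model, data_old, data_new, 'wage', sample_weight='weight',
                               xfit=False, calib_size=0.2,
                               regressor=make_custom_regressor,
                               classifier=make_custom_classifier,
                               calibrator=make_custom_calibrator)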
@@ -366,7 +354,7 @@
     "\n",
     "First, notice that the Shapley values for $P(\\mathtt{educ})$, $P(\\mathtt{occup} \\mid \\mathtt{educ})$ and $P(\\mathtt{wage} \\mid \\mathtt{occup}, \\mathtt{educ})$ add up to the total effect.\n",
     "\n",
-    "Second, the Shapley value for $P(\\mathtt{educ})$ is positive and statistically significant. One way to interpret this measure is that, if men and women differed only in their $P(\\mathtt{educ})$ (but their other causal mechanisms were the same), women would earn \\\\$1.12/hour more than men on average. Conversely, the Shapley value for $P(\\mathtt{occup} \\mid \\mathtt{educ})$ is negative, statistically significant and of slightly larger magnitude than the first Shapley value, hence cancelling out the effect of differences in education. These effects measure two things:\n",
+    "Second, the Shapley value for $P(\\mathtt{educ})$ is positive and statistically significant. One way to interpret this measure is that, if men and women differed only in their $P(\\mathtt{educ})$ (but their other causal mechanisms were the same), women would earn \\\\$1.13/hour more than men on average. Conversely, the Shapley value for $P(\\mathtt{occup} \\mid \\mathtt{educ})$ is negative, statistically significant and of larger magnitude than the first Shapley value, hence cancelling out the effect of differences in education. These effects measure two things:\n",
     "1. How different is a causal mechanism between males and females?\n",
     "2. How important is a causal mechanism for the outcome?\n",
     "\n",
