How to include unobserved confounders in the model? #616
@AlxndrMlk DoWhy does support unobserved confounders. You are adding them correctly, by not including those variables in the dataframe. The default behavior is that the identification algorithm considers the full graph and outputs estimands only in terms of the observed variables. Can you share an example where you are obtaining an incorrect output?
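To make "considers the full graph" concrete, here is a small hand-rolled sketch (not DoWhy's internal code) that checks the back-door criterion on the graph used in the example below. It assumes the edge list X → Z → Y, U → X, U → Y and treats U as latent because it is absent from the dataframe; the helper names (`satisfies_backdoor`, `is_blocked`, and so on) are hypothetical. Under these assumptions, the empty adjustment set leaves X ← U → Y open, {U} would close it but cannot be used, and no set of observed variables works, so a plain back-door estimand with no adjustment would be wrong here.

```python
"""Minimal back-door check for the graph in this issue.

A hand-rolled sketch, not DoWhy's implementation: it enumerates simple paths
in a tiny DAG and applies the usual d-separation blocking rules.
"""
from itertools import combinations

EDGES = {("X", "Z"), ("Z", "Y"), ("U", "X"), ("U", "Y")}
OBSERVED = {"X", "Z", "Y"}  # U is in the graph but not in the dataframe


def children(node):
    return {b for a, b in EDGES if a == node}


def parents(node):
    return {a for a, b in EDGES if b == node}


def descendants(node):
    found, stack = set(), [node]
    while stack:
        for child in children(stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(path, target):
    """Yield every simple path (ignoring edge direction) extending `path` to `target`."""
    if path[-1] == target:
        yield path
        return
    for nxt in children(path[-1]) | parents(path[-1]):
        if nxt not in path:
            yield from simple_paths(path + [nxt], target)


def is_blocked(path, given):
    """True if conditioning on `given` blocks `path` (d-separation rules)."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider and mid not in given and not (descendants(mid) & given):
            return True
        if not collider and mid in given:
            return True
    return False


def satisfies_backdoor(adjust, treatment="X", outcome="Y"):
    # Rule 1: no adjustment variable may be a descendant of the treatment
    if adjust & descendants(treatment):
        return False
    # Rule 2: every path that starts with an arrow into the treatment is blocked
    backdoor_paths = [
        p for p in simple_paths([treatment], outcome) if (p[1], treatment) in EDGES
    ]
    return all(is_blocked(p, adjust) for p in backdoor_paths)


candidates = sorted(OBSERVED - {"X", "Y"})
observed_sets = [
    set(c) for r in range(len(candidates) + 1) for c in combinations(candidates, r)
]
print("empty set valid:", satisfies_backdoor(set()))  # False: X <- U -> Y is open
print("{U} valid:", satisfies_backdoor({"U"}))  # True, but U is unobserved
print("any observed set valid:", any(satisfies_backdoor(s) for s in observed_sets))  # False
```

Because no observed adjustment set passes, a correct identification step has to fall back on another strategy (here, the front-door criterion through Z).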
Hi @amit-sharma, thank you for the quick reply. I created a self-contained example that demonstrates the behavior I described:
import numpy as np
import pandas as pd
from dowhy.causal_model import CausalModel
from sklearn.linear_model import LinearRegression
# Create the graph describing the causal structure
graph = """
graph [
directed 1
node [
id "X"
label "X"
]
node [
id "Z"
label "Z"
]
node [
id "Y"
label "Y"
]
node [
id "U"
label "U"
]
edge [
source "X"
target "Z"
]
edge [
source "Z"
target "Y"
]
edge [
source "U"
target "Y"
]
edge [
source "U"
target "X"
]
]
""".replace('\n', '')
N_SAMPLES = 10000
# Generate the data
U = np.random.randn(N_SAMPLES)
X = np.random.randn(N_SAMPLES) + 0.3*U
Z = 0.7*X + 0.3*np.random.randn(N_SAMPLES)
Y = 0.65*Z + 0.2*U
# Data to df
df = pd.DataFrame(np.vstack([X, Z, Y]).T, columns=['X', 'Z', 'Y'])
# Create a model
model = CausalModel(
data=df,
treatment='X',
outcome='Y',
graph=graph
)
# Get the estimand
estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(estimand)
# Estimand type: nonparametric-ate
# ### Estimand : 1
# Estimand name: backdoor
# Estimand expression:
# d
# ────(E[Y])
# d[X]
# Estimand assumption 1, Unconfoundedness: If U→{X} and U→Y then P(Y|X,,U) = P(Y|X,)
# ### Estimand : 2
# Estimand name: iv
# No such variable found!
# ### Estimand : 3
# Estimand name: frontdoor
# Estimand expression:
# ⎡ d d ⎤
# E⎢────(Y)⋅────([Z])⎥
# ⎣d[Z] d[X] ⎦
# Estimand assumption 1, Full-mediation: Z intercepts (blocks) all directed paths from X to Y.
# Estimand assumption 2, First-stage-unconfoundedness: If U→{X} and U→{Z} then P(Z|X,U) = P(Z|X)
# Estimand assumption 3, Second-stage-unconfoundedness: If U→{Z} and U→Y then P(Y|Z, X, U) = P(Y|Z, X)
# Estimate the effect with front-door
estimate = model.estimate_effect(
identified_estimand=estimand,
method_name='frontdoor.linear_regression'
)
estimate.value
# Out[12] 0.511009176530488
# Compute expected output
# Model P(Z|X)
lr_zx = LinearRegression()
lr_zx.fit(
X=df['X'].values.reshape(-1, 1),
y=df['Z']
)
# Model E[Y | Z, X]; in the linear case its Z coefficient is the inner term of the front-door formula
lr_yz = LinearRegression()
lr_yz.fit(
X=df[['Z', 'X']],
y=df['Y']
)
# Compute the expected causal effect
lr_zx.coef_ * lr_yz.coef_[0]
# Out[13] array([0.45161212])
# Sanity check -> compute naive estimate
lr_naive = LinearRegression()
lr_naive.fit(
X=df['X'].values.reshape(-1, 1),
y=df['Y']
)
lr_naive.coef_
# Out[14] array([0.51100918])
I might be missing something, but there are a couple of things that drew my attention:
Am I using the correct model for this graph? What are your thoughts?
Environment details: Windows 11
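For comparison, the same calculation can be sketched with plain NumPy least squares instead of scikit-learn. It assumes the data-generating process from the example above and a fixed seed (the original used `np.random` without one), and `ols` is a hypothetical helper. Note that the front-door value DoWhy returned above (0.511009…) coincides with the naive regression coefficient (0.51100918), whereas the manual two-stage product gives about 0.452, close to the true 0.7 × 0.65 = 0.455.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption for reproducibility)
n = 10_000

# Same data-generating process as in the issue; U is generated but treated as unobserved
u = rng.standard_normal(n)
x = rng.standard_normal(n) + 0.3 * u
z = 0.7 * x + 0.3 * rng.standard_normal(n)
y = 0.65 * z + 0.2 * u


def ols(target, *regressors):
    """OLS slopes of `target` on `regressors` (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones_like(target), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


first_stage = ols(z, x)[0]  # effect of X on Z (X -> Z is unconfounded)
second_stage = ols(y, z, x)[0]  # effect of Z on Y, adjusting for X to block Z <- X <- U -> Y
frontdoor = first_stage * second_stage
naive = ols(y, x)[0]  # biased by the open path X <- U -> Y

print(f"front-door: {frontdoor:.3f}  (true value 0.7 * 0.65 = 0.455)")
print(f"naive:      {naive:.3f}")
```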
Ah, you are using an older version of dowhy. Can you update to the latest version v0.8?
Thank you @amit-sharma, I updated to
@amit-sharma points out that in DoWhy version 0.8.0, the unobserved confounder (a variable included in the graph but not in the dataset) is properly taken into account when performing … However, when I ran the same example proposed by @AlxndrMlk with the corrected method, …
Therefore, contrary to @amit-sharma's explanation, it seems there is no easy way to include the unobserved confounder in the model. The following output seems to confirm this as well.
Don't the second and third unconfoundedness assumptions imply that the influence of the unobserved confounder, U, has been ignored? The same result is reproduced in DoWhy version 0.10.0.
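One way to read those assumptions for this particular graph: they are conditional statements ("If U→{Z} and ..."), and in the example U points into X and Y but not into Z. The rough sketch below, assuming the issue's data-generating process and a fixed seed, checks the two properties the front-door estimator relies on: Z and U are (nearly) uncorrelated once X is partialled out, even though they are correlated marginally, and the Z coefficient in a regression of Y on Z and X recovers 0.65 with U left out. It is a linear-Gaussian sanity check, not a general test of the assumptions; `residualize` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (an assumption for reproducibility)
n = 10_000
u = rng.standard_normal(n)
x = rng.standard_normal(n) + 0.3 * u
z = 0.7 * x + 0.3 * rng.standard_normal(n)
y = 0.65 * z + 0.2 * u


def residualize(v, on):
    """Residuals of v after an OLS fit on `on` plus an intercept."""
    design = np.column_stack([np.ones_like(on), on])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


# Marginally, Z and U are correlated through Z <- X <- U
marginal_corr_zu = np.corrcoef(z, u)[0, 1]

# First stage: U does not point into Z, so corr(Z, U | X) should be close to 0
partial_corr_zu = np.corrcoef(residualize(z, x), residualize(u, x))[0, 1]

# Second stage: regressing Y on (Z, X) without U still gives ~0.65 for Z,
# because conditioning on X blocks the back-door path Z <- X <- U -> Y
design = np.column_stack([np.ones(n), z, x])
coef_z = np.linalg.lstsq(design, y, rcond=None)[0][1]

print(f"marginal corr(Z, U):    {marginal_corr_zu:+.3f}")
print(f"partial corr(Z, U | X): {partial_corr_zu:+.3f}")
print(f"coefficient on Z:       {coef_z:.3f}  (true 0.65)")
```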
Also, the output of the following statement is not the same as in @amit-sharma's:
Estimand : 1
Estimand name: frontdoor
Realized estimand
Estimate
Mean value: 0.5004912411156469
One of the readers of "Causal Inference & Discovery in Python" reported a similar issue with the front-door estimator in DoWhy 0.10.0: "Hello Aleksander. …" It worked correctly for them in 0.8, though: "Good morning Aleksander. With version 0.8 it works properly." Have you tried to replicate the issue over multiple runs (and datasets), @AnselmJeong?
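Replicating over multiple runs can be sketched without DoWhy: regenerate the dataset from the example's data-generating process with different seeds and look at the spread of the manual two-stage front-door estimate. The 20 seeds and the helper name `simulate_frontdoor` are illustrative assumptions. If the spread comes out small relative to the gap between 0.455 and the values around 0.50 to 0.51 reported above, those values are unlikely to be sampling noise.

```python
import numpy as np


def simulate_frontdoor(seed, n=10_000):
    """Manual front-door estimate on one fresh dataset from the issue's DGP."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    x = rng.standard_normal(n) + 0.3 * u
    z = 0.7 * x + 0.3 * rng.standard_normal(n)
    y = 0.65 * z + 0.2 * u
    ones = np.ones(n)
    # First stage: slope of Z on X
    b_zx = np.linalg.lstsq(np.column_stack([ones, x]), z, rcond=None)[0][1]
    # Second stage: slope of Y on Z, adjusting for X
    b_yz = np.linalg.lstsq(np.column_stack([ones, z, x]), y, rcond=None)[0][1]
    return b_zx * b_yz


estimates = np.array([simulate_frontdoor(seed) for seed in range(20)])
print(
    f"mean {estimates.mean():.4f}, std {estimates.std():.4f}, "
    f"range [{estimates.min():.3f}, {estimates.max():.3f}]"
)
```

The same loop could wrap the DoWhy call instead of the manual estimator to see whether its output is consistently off rather than off for one dataset.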
Thank you @AlxndrMlk and @AnselmJeong for resurfacing this issue. Unfortunately, the error crept back in with v0.10, while it works fine in v0.8. I have now added a fix through PR #1060. I have also included @AlxndrMlk's example as a test in the library so we never see this bug again in future versions of DoWhy.
Hi, thank you for a great package!
I was wondering if there's a way to include unobserved confounders in CausalModel. Under certain conditions, the back-door and front-door criteria can provide correct causal estimands even in the face of unobserved confounding (e.g. Pearl, Glymour & Jewell, 2016).
I tried including unobserved confounders by adding these variables to the graph but not including them in the data.
It seems that the model does not return correct estimates in such a case, although the effect should theoretically be identifiable.
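For reference, here is the front-door adjustment formula (as given in Pearl, Glymour & Jewell, 2016) and what it reduces to under the linear data-generating process used in the example in this thread: X = ε_X + 0.3U, Z = 0.7X + 0.3ε_Z, Y = 0.65Z + 0.2U, with independent unit-variance U, ε_X, ε_Z. The arithmetic shows why the correct answer is 0.455 while a regression that ignores U gives about 0.510.

```latex
P(y \mid do(x)) = \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')

% Linear case: the effect is the product of the two stage coefficients
\frac{\partial\, E[Y \mid do(x)]}{\partial x}
  = \beta_{Z \leftarrow X} \cdot \beta_{Y \leftarrow Z \mid X}
  = 0.7 \times 0.65 = 0.455

% Naive regression slope, biased by the open path X <- U -> Y
\frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  = \frac{0.7 \cdot 0.65 \cdot 1.09 + 0.2 \cdot 0.3}{1.09} \approx 0.510,
\qquad \operatorname{Var}(X) = 1 + 0.3^2 = 1.09
```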