Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, SpaCE provides realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. The datasets cover real treatments and covariates from diverse domains, including climate, health, and social sciences. Realistic semi-synthetic outcomes and counterfactuals are generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. SpaCE facilitates an automated end-to-end machine learning pipeline, simplifying data loading, experimental setup, and model evaluation.
Install the PyPI version:
pip install "spacebench[all]"
The option [all] installs all dependencies necessary for the spatial confounding algorithms and the examples. If you only want to use the SpaceDatasets, use pip install spacebench instead.
You can also install the latest 🔥 features from the development version:
pip install "git+https://github.com/NSAPH-Projects/space@dev#egg=spacebench[all]"
Python 3.10 or higher is required. See the docs and requirements.txt for more information.
To obtain a benchmark dataset for spatial confounding, you need to 1) create a SpaceEnv, which contains real treatment and confounder data and a realistic semi-synthetic outcome, and 2) create a SpaceDataset, which masks a spatially varying confounder and facilitates the data loading pipeline for causal inference.
from spacebench import SpaceEnv
env = SpaceEnv('healthd_dmgrcs_mortality_disc')
dataset = env.make()
print(dataset)
SpaceDataset with a missing spatial confounder:
treatment: (3109,) (binary)
confounders: (3109, 30)
outcome: (3109,)
counterfactuals: (3109, 2)
confounding score of missing: 0.02
spatial smoothness score of missing: 0.11
graph edge list: (9237, 2)
graph node coordinates: (3109, 2)
parent SpaceEnv: healthd_dmgrcs_mortality_disc
WARNING ⚠️ : this dataset contains a (realistic) synthetic outcome!
By using it, you agree to understand its limitations. The variable
names have been masked to emphasize that no inferences can be made
about the source data.
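Because each dataset ships with true counterfactuals, an estimate can be scored against the true effect. The following is a minimal sketch of that idea in plain Python. It does not call spacebench: the arrays are synthetic stand-ins shaped like the output above, the helper names are hypothetical, and it assumes column 0 of the counterfactuals holds the outcome under control and column 1 the outcome under treatment.

```python
import random


def true_ate(counterfactuals):
    """Average of Y(1) - Y(0) over all units, using both potential outcomes."""
    return sum(y1 - y0 for y0, y1 in counterfactuals) / len(counterfactuals)


def naive_difference_in_means(treatment, outcome):
    """Mean observed outcome of treated units minus that of control units."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Synthetic stand-in data (NOT spacebench data): one unobserved confounder
# drives both the binary treatment and the outcome.
rng = random.Random(0)
n = 3109
confounder = [rng.gauss(0.0, 1.0) for _ in range(n)]
treatment = [1 if u + rng.gauss(0.0, 1.0) > 0 else 0 for u in confounder]

# Assumed layout: (Y(0), Y(1)) per unit; the true effect is 1.0 by construction.
counterfactuals = [
    (2.0 * u + rng.gauss(0.0, 0.1), 2.0 * u + 1.0 + rng.gauss(0.0, 0.1))
    for u in confounder
]

# The observed outcome is the potential outcome matching the received treatment.
outcome = [cf[t] for t, cf in zip(treatment, counterfactuals)]

ate = true_ate(counterfactuals)
naive = naive_difference_in_means(treatment, outcome)
print(f"true ATE: {ate:.2f}, naive estimate: {naive:.2f}")
```

In this toy setup the naive estimate overshoots the true effect because the confounder is ignored, which is the kind of bias the benchmark is designed to expose.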
The list of available environments can be found in the documentation or in an interactive session with:
from spacebench import DataMaster
dm = DataMaster()
dm.master.head()
environments | treatment_type | collection |
---|---|---|
healthd_dmgrcs_mortality_disc | binary | Air Pollution and Mortality |
cdcsvi_limteng_hburdic_cont | continuous | Social Vulnerability and Welfare |
climate_relhum_wfsmoke_cont | continuous | Heat Exposure and Wildfires |
climate_wfsmoke_minrty_disc | binary | Heat Exposure and Wildfires |
healthd_hhinco_mortality_cont | continuous | Air Pollution and Mortality |
healthd_pollutn_mortality_cont | continuous | Air Pollution and Mortality |
county_educatn_election_cont | continuous | Welfare and Elections |
county_phyactiv_lifexpcy_cont | continuous | Welfare and Elections |
county_dmgrcs_election_disc | binary | Welfare and Elections |
cdcsvi_nohsdp_poverty_cont | continuous | Social Vulnerability and Welfare |
cdcsvi_nohsdp_poverty_disc | binary | Social Vulnerability and Welfare |
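To pick environments by treatment type, the table can be filtered on its treatment_type column. The sketch below copies the rows above into a plain Python list rather than calling DataMaster, so it runs without spacebench installed. The helper name is hypothetical, and the observation that binary environments end in _disc and continuous ones in _cont is read off this table, not a documented guarantee.

```python
# Rows copied from the table above: (environment, treatment_type, collection).
ENVIRONMENTS = [
    ("healthd_dmgrcs_mortality_disc", "binary", "Air Pollution and Mortality"),
    ("cdcsvi_limteng_hburdic_cont", "continuous", "Social Vulnerability and Welfare"),
    ("climate_relhum_wfsmoke_cont", "continuous", "Heat Exposure and Wildfires"),
    ("climate_wfsmoke_minrty_disc", "binary", "Heat Exposure and Wildfires"),
    ("healthd_hhinco_mortality_cont", "continuous", "Air Pollution and Mortality"),
    ("healthd_pollutn_mortality_cont", "continuous", "Air Pollution and Mortality"),
    ("county_educatn_election_cont", "continuous", "Welfare and Elections"),
    ("county_phyactiv_lifexpcy_cont", "continuous", "Welfare and Elections"),
    ("county_dmgrcs_election_disc", "binary", "Welfare and Elections"),
    ("cdcsvi_nohsdp_poverty_cont", "continuous", "Social Vulnerability and Welfare"),
    ("cdcsvi_nohsdp_poverty_disc", "binary", "Social Vulnerability and Welfare"),
]


def by_treatment_type(rows, treatment_type):
    """Return the environment names whose treatment_type matches."""
    return [name for name, ttype, _ in rows if ttype == treatment_type]


binary_envs = by_treatment_type(ENVIRONMENTS, "binary")
continuous_envs = by_treatment_type(ENVIRONMENTS, "continuous")

for name in binary_envs:
    print(name)
```

Any of the returned names can be passed to SpaceEnv as in the example earlier in this README.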
To learn more about the data collections and the environments, see the docs. The data collections and environments are hosted at the Harvard Dataverse. Data "nutrition labels" for the collections can be found here. The environments are produced using the space-data repository from a data collection with a configuration file. Don't forget to read our paper.
Please note that the SpaCE project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
We welcome contributions and feedback about spacebench. If you have any suggestions or ideas, please open an issue or submit a pull request.
The documentation is hosted at https://nsaph-projects.github.io/space/.