Contact: Jason Bryer (jason@bryer.org)
Bookdown Site: https://psa.bryer.org
The use of propensity score methods (Rosenbaum & Rubin, 1983) for estimating causal effects in observational studies or certain kinds of quasi-experiments has been increasing over the last two decades. Propensity score analysis (PSA) attempts to adjust for the selection bias that occurs due to the lack of randomization. Analysis is typically conducted in three phases. In phase I, the probability of placement in the treatment is estimated to identify matched pairs, clusters, or probability weights. In phase II, comparisons on the dependent variable can be made between matched pairs, within clusters, or using inverse probability weights in regression models. In phase III, sensitivity analysis is conducted to estimate how robust the effect sizes estimated in phase II are to unobserved confounders. R (R Core Team, 2012) is ideal for conducting PSA given the wide availability of the most current statistical methods through add-on packages as well as its superior graphics capabilities. This talk will provide participants with a theoretical overview of propensity score methods with an emphasis on graphics. A survey of R packages for conducting PSA with multilevel data, non-binary treatments, and bootstrapping will also be provided. Lastly, a Shiny application to assist with all three phases of PSA will be demonstrated.
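Phases I and II described above can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the workflow implemented in the psa package: the data frame `sim`, the covariates `x1` and `x2`, and the true treatment effect of 2 are all hypothetical, and inverse probability weighting is only one of the phase II options mentioned.

```r
# Hypothetical simulated data; not the lalonde data used elsewhere in this README.
set.seed(2112)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
# Treatment assignment depends on the covariates, which creates selection bias.
treat <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x1 + 0.5 * x2))
# Outcome simulated with a true treatment effect of 2.
y <- 1 + 2 * treat + 1.5 * x1 + 1.0 * x2 + rnorm(n)
sim <- data.frame(y, treat, x1, x2)

# Phase I: estimate propensity scores with logistic regression.
lr_sim <- glm(treat ~ x1 + x2, data = sim, family = binomial(link = 'logit'))
sim$ps <- fitted(lr_sim)

# Inverse probability weights for the average treatment effect (ATE).
sim$w_ate <- ifelse(sim$treat == 1, 1 / sim$ps, 1 / (1 - sim$ps))

# Phase II: compare the unadjusted and the weighted estimates.
naive_est <- unname(coef(lm(y ~ treat, data = sim))['treat'])
ipw_est <- unname(coef(lm(y ~ treat, data = sim, weights = w_ate))['treat'])
```

Comparing `naive_est` with `ipw_est` shows how much of the unadjusted difference is attributable to the observed covariates. Phase III (sensitivity to unobserved confounders) is not covered by this sketch.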
The latest version of the slides introducing propensity score analysis is available as PDF or HTML.
You can install the psa package using the remotes package. I recommend setting dependencies = 'Enhances', as this will install all the packages that are used in the examples.
remotes::install_github('jbryer/psa', build_vignettes = TRUE, dependencies = 'Enhances')
Run the PSA Shiny App:
psa::psa_shiny()
To explore the PSA visualizations in this package through a simulation, run this Shiny application:
psa::psa_simulation_shiny()
data(lalonde, package='Matching')
formu.lalonde <- treat ~ age + I(age^2) + educ + I(educ^2) + hisp + married + nodegr +
    re74 + I(re74^2) + re75 + I(re75^2) + u74 + u75
mb0.lalonde <- psa::MatchBalance(df = lalonde, formu=formu.lalonde)
# summary(mb0.lalonde) # Excluded to save space
plot(mb0.lalonde)
data(lalonde, package = 'Matching')
lr_out <- glm(treat ~ age + I(age^2) + educ + I(educ^2) + black +
                  hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
                  u74 + u75,
              data = lalonde,
              family = binomial(link = 'logit'))
lalonde$ps <- fitted(lr_out)
psa::loess_plot(ps = lalonde$ps,
                outcome = log(lalonde$re78 + 1),
                treatment = as.logical(lalonde$treat))
#> `geom_smooth()` using method = 'loess'
psa::weighting_plot(ps = lalonde$ps,
                    treatment = lalonde$treat,
                    outcome = lalonde$re78)
psa::stratification_plot(ps = lalonde$ps,
                         treatment = lalonde$treat,
                         outcome = lalonde$re78)
match_out <- Matching::Match(Y = lalonde$re78,
                             Tr = lalonde$treat,
                             X = lalonde$ps,
                             caliper = 0.1,
                             replace = FALSE,
                             estimand = 'ATE')
#> Warning in Matching::Match(Y = lalonde$re78, Tr = lalonde$treat, X =
#> lalonde$ps, : replace==FALSE, but there are more (weighted) control obs than
#> treated obs. Some control obs will not be matched. You may want to estimate
#> ATT instead.
psa::matching_plot(ps = lalonde$ps,
                   treatment = lalonde$treat,
                   outcome = log(lalonde$re78 + 1),
                   index_treated = match_out$index.treated,
                   index_control = match_out$index.control)
The merge.mids function is a convenience for merging the multiple imputation results from the mice::mice() function with the full data frame used for imputation. In the context of PSA, imputation is conducted without including the outcome variable. This function will merge the outcome, along with any other variables not used in the imputation procedure, with one of the imputed datasets. Additionally, by setting the shadow.matrix parameter to TRUE, the resulting data frame will contain additional logical columns with the suffix _missing, with a value of TRUE if the variable was originally missing and therefore was imputed.
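The shape of the merged result described above can be illustrated with base R alone. This is a hand-rolled sketch and not the implementation of merge.mids: the data frame `dat` and its columns are hypothetical, and simple mean imputation stands in for one of the imputed datasets that mice::mice() would produce.

```r
# Hypothetical data frame with missing covariate values (illustration only).
dat <- data.frame(
    age = c(23, NA, 31, 45),
    educ = c(12, 10, NA, NA),
    outcome = c(1.5, 2.0, 0.7, 3.1)
)
# The outcome is deliberately left out of the imputation step.
covariates <- c('age', 'educ')

# Shadow matrix: TRUE where a value was originally missing.
shadow <- as.data.frame(is.na(dat[, covariates]))
names(shadow) <- paste0(names(shadow), '_missing')

# Stand-in for one imputed dataset: mean imputation, NOT what mice does.
imputed <- dat[, covariates]
for (v in covariates) {
    imputed[is.na(imputed[[v]]), v] <- mean(dat[[v]], na.rm = TRUE)
}

# Merge the imputed covariates, the _missing indicators, and the outcome.
merged <- cbind(imputed, shadow, outcome = dat$outcome)
```

The `_missing` columns make it possible to check later whether missingness itself is related to treatment or outcome.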
The following R scripts will outline how to conduct propensity score analysis.
- Setup.R - Install R packages. This script generally needs to be run once per R installation.
- IntroPSA.R - Conducts propensity score analysis and matching, summarizes results, and evaluates balance using the National Supported Work Demonstration and Current Population Survey (aka lalonde data).
- IntroPSA-Tutoring.R - Conducts propensity score analysis and matching, summarizes results, and evaluates balance using data from a study examining student use of tutoring services in an online introductory writing class (from the TriMatch package).
- Sensitivity.R - Conduct a sensitivity analysis.
- Missingness.R - How to evaluate whether data is missing at random.
- BootstrappingPSA.R - Bootstrapping PSA.
- NonBinaryPSA.R - Analysis of three groups (two treatments and one control)
- MultilevelPSA.R - Multilevel propensity score analysis.
There are a number of R packages available for conducting propensity score analysis. These are the packages this workshop will make use of:
- MatchIt (Ho, Imai, King, & Stuart, 2011) Nonparametric Preprocessing for Parametric Causal Inference
- Matching (Sekhon, 2011) Multivariate and Propensity Score Matching Software for Causal Inference
- multilevelPSA (Bryer & Pruzek, 2011) Multilevel Propensity Score Analysis
- party (Hothorn, Hornik, & Zeileis, 2006) A Laboratory for Recursive Partytioning
- PSAboot (Bryer, 2013) Bootstrapping for Propensity Score Analysis
- PSAgraphics (Helmreich & Pruzek, 2009) An R Package to Support Propensity Score Analysis
- rbounds (Keele, 2010) An Overview of rbounds: An R Package for Rosenbaum bounds sensitivity analysis with matched data
- rpart (Therneau, Atkinson, & Ripley, 2012) Recursive Partitioning
- TriMatch (Bryer, 2013) Propensity Score Matching for Non-Binary Treatments
Please note that the psa project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.