This data analysis pipeline was created by Lucas Hertzog in 2023. It uses the targets
package in R to define and manage data processing and analysis tasks.
The pipeline is divided into the following steps:
- Load the required data.
- Create subsets of data for girls and boys.
- Declare survey design.
- Select variables for the analysis.
- Perform logistic regression and calculate marginal effects.
- Combine the results.
- Generate tables for correlations, summary statistics, regressions, and marginal effects.
- Create relevant plots.
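The survey-design and regression steps above can be sketched on a public dataset. This is only an illustration, not the pipeline's own code: it uses the `api` example data shipped with the survey package, and the variables (`sch.wide`, `ell`, `meals`, `stype`) are stand-ins for the project's outcome, predictors, and grouping variable. The sketch subsets the declared design rather than the raw data, which is a choice made here and may differ from how the pipeline creates its girls and boys subsets.

```r
library(survey)

# Example data shipped with the survey package (NOT the pipeline's data)
data(api, package = "survey")

# Declare the survey design: clusters, sampling weights, finite population correction
design <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Subgroup analysis: subset the design object so the design information is kept
design_elem <- subset(design, stype == "E")

# Survey-weighted logistic regression; quasibinomial() avoids warnings
# about non-integer counts caused by the weights
fit_all  <- svyglm(sch.wide ~ ell + meals, design = design,
                   family = quasibinomial())
fit_elem <- svyglm(sch.wide ~ ell + meals, design = design_elem,
                   family = quasibinomial())

summary(fit_all)
```

Marginal effects are not shown here; in the pipeline they are computed in a later step from fitted models like these.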
This pipeline requires the following R packages:
- targets
- yaml
- data.table
- dplyr
- ggeffects
- survey
- surveybootstrap
- tidyr
- purrr
- ggplot2
- flextable
- officer
- gtsummary
- gt
- Hmisc
- haven
- apaTables
First, ensure all the required R packages mentioned above are installed.
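One way to check this is sketched below. It only reports which of the packages listed above are missing; the install call is left commented out and assumes that all packages are available from your configured repository.

```r
# Packages listed in this README
required <- c(
  "targets", "yaml", "data.table", "dplyr", "ggeffects", "survey",
  "surveybootstrap", "tidyr", "purrr", "ggplot2", "flextable", "officer",
  "gtsummary", "gt", "Hmisc", "haven", "apaTables"
)

# TRUE for each package whose namespace can be loaded from the current library paths
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install (assumes the packages are in your configured repository)
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```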
Then, source the required functions by running:
lapply(list.files("R", full.names = TRUE), source)
Load the configurations for the pipeline:
config <- yaml::read_yaml("config.yaml")
Finally, run the pipeline:
targets::tar_make()
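Before or after a run, it can help to inspect what the pipeline contains and which targets are out of date. The sketch below assumes the pipeline script uses the default name `_targets.R` in the project root; it does nothing if that file is absent.

```r
library(targets)

pipeline_found <- file.exists("_targets.R")

if (pipeline_found) {
  # One row per target, with the command that builds it
  print(tar_manifest())
  # Names of targets that the next run would build or rebuild
  print(tar_outdated())
}
```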
The outputs of the pipeline include:
- Logistic regression results and marginal effects for overall data, girls, and boys.
- Combined results of the marginal effects.
- Tables summarizing correlations, summary statistics, regressions, and marginal effects.
- Plots, such as a forest plot and a summary plot.
The outputs are generated as targets in the pipeline and can be accessed using the tar_read() function.
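As a minimal sketch of retrieving results, the code below lists the targets that have stored values and reads the first one. It assumes the pipeline has already been run in the current project directory and uses the default data store location (`_targets`). Target names are not listed in this README, so the example looks them up instead of hard-coding one.

```r
library(targets)

available <- character(0)

if (dir.exists("_targets")) {
  # Names of targets whose values are saved in the local data store
  available <- tar_objects()
  print(available)
}

if (length(available) > 0) {
  # tar_read_raw() takes the target name as a character string
  result <- tar_read_raw(available[1])
  str(result, max.level = 1)
}
```

When a target name is known, `tar_read(name)` with the unquoted name does the same thing interactively.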