Adding prereq boxes (#62)

* Add Prereq boxes to all chapters
* Adjusting spacing.
tidy-survey-r · Aug 7, 2023 · 17c1098
1 parent 6e27e75
commit 17c1098

Showing 9 changed files with 227 additions and 165 deletions.
diff --git a/03-specifying-sample-designs.Rmd b/03-specifying-sample-designs.Rmd
@@ -1,7 +1,7 @@
 # Specifying sample designs and replicate weights in {srvyr} {#c03-specifying-sample-designs}
 
 ::: {.prereqbox-header}
-`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq3}'`
+`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
 :::
 
 ::: {.prereqbox data-latex="{Prerequisites}"}

diff --git a/04-understanding-survey-data-documentation.Rmd b/04-understanding-survey-data-documentation.Rmd
@@ -22,7 +22,7 @@ source("helper-fun/helper-functions.R")
 We will be using data from ANES. Here is the code to read in the data.
 ```{r}
 #| label: understand-anes-c04
-anes_in <- read_osf("anes_2020.rds")
+anes_raw <- read_osf("anes_2020.rds")
 ```
 :::
 
@@ -245,14 +245,14 @@ The target population in 2020 is `r scales::comma(targetpop)`. This information
 
 ```{r}
 #| label: understand-read-anes
-anes_popweights <- anes_in %>%
+anes_in <- anes_raw %>%
   mutate(Weight = V200010b / sum(V200010b) * targetpop) 
 ```
 
 Once we have the weights adjusted to the population, we can then create the survey design using our new weight variable in the `weights` argument and use the strata and cluster variables identified in the users manual.
 ```{r}
 #| label: understand-anes-des
-anes_des <- anes_popweights %>%
+anes_des <- anes_in %>%
   as_survey_design(
     weights = Weight,
     strata = V200010d,

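The renamed `anes_in` step in this hunk rescales the ANES base weight so it sums to the target population instead of the sample size. Below is a minimal base-R sketch of that arithmetic. It is an illustration, not code from the commit: the three-row data frame and the `targetpop` value are made up.

```r
# Hypothetical toy data: three respondents with sample-scale base weights
# (column name mirrors V200010b from the hunk; the values are made up)
toy <- data.frame(id = 1:3, V200010b = c(0.5, 1.0, 1.5))

# Hypothetical target population count, NOT the real ANES figure
targetpop <- 231000000

# Each weight's share of the weight total, scaled up to the population
toy$Weight <- toy$V200010b / sum(toy$V200010b) * targetpop

# The rescaled weights now sum to the target population
print(sum(toy$Weight))
```

Every weight is multiplied by the same constant, so the relative weights are unchanged. Weighted means and proportions therefore stay the same, and only estimated totals are scaled to the population.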
diff --git a/05-descriptive-analysis.Rmd b/05-descriptive-analysis.Rmd
@@ -13,26 +13,78 @@ tribble(
   knitr::kable(format="pandoc", col.names=NULL, caption="Summary of Chapter 5")
 ```
 
+::: {.prereqbox-header}
+`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
+:::
 
+::: {.prereqbox data-latex="{Prerequisites}"}
+For this chapter, here are the libraries and helper functions we will need:
 ```{r}
-#| label: desc-tidy
-#| include: FALSE
+#| label: desc-setup
+#| error: FALSE
+#| warning: FALSE
+#| message: FALSE
+library(tidyverse)
+library(survey) 
+library(srvyr) 
+library(broom)
+library(osfr)
+source("helper-fun/helper-functions.R")
+
 knitr::opts_chunk$set(tidy = TRUE)
 ```
 
-## Similarities between {dplyr} and {srvyr} functions
-
+To help explain the similarities between {dplyr} functions and {srvyr} functions, this chapter will use the `mtcars` and `iris` datasets that are built into R and the `apistrat` data that comes with the {survey} package:
 ```{r}
-#| label: desc-dstrat
-#| include: false
-library(srvyr)
-library(survey)
+#| label: desc-setup-surveydata
 data(api)
-
 dstrata <- apistrat %>%
   as_survey_design(strata = stype, weights = pw)
 ```
 
+We will be using data from ANES and RECS. Here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-understanding-survey-data-documentation) for more information).
+```{r}
+#| label: desc-anes-des
+#| eval: FALSE
+anes_in <- read_osf("anes_2020.rds") %>%
+  mutate(Weight = Weight / sum(Weight) * targetpop)
+anes_des <- anes_in %>%
+  as_survey_design(
+    weights = Weight,
+    strata = Stratum,
+    ids = VarUnit,
+    nest = TRUE)
+```
+
+For RECS, details are included in the RECS documentation and Chapter \@ref(c03-specifying-sample-designs).
+```{r}
+#| label: desc-recs-des
+#| eval: FALSE
+recs_in <- read_osf("recs_2015.rds")
+recs_des <- recs_in %>%
+  as_survey_rep(weights = NWEIGHT,
+                repweights = starts_with("BRRWT"),
+                type = "Fay",
+                rho = 0.5,
+                mse = TRUE)
+```
+:::
+
+## Introduction
+
+Recall from Chapter \@ref(c03-specifying-sample-designs) the general process for estimation with the {srvyr} package:  
+
+1. Create a `tbl_svy` object using `srvyr::as_survey_design()` or `srvyr::as_survey_rep()`.
+2. Subset the data for subpopulations using `srvyr::filter()`, if needed.
+3. Specify domains of analysis using `srvyr::group_by()`, if needed.
+4. Within `srvyr::summarize()`, specify variables to calculate means, totals, proportions, quantiles, and more.
+
+Filtering should be done after creating the `tbl_svy` object (using `as_survey_design()` or `as_survey_rep()`) because the survey object carries the design information (strata, clusters, and weights) for the full sample, which is needed to estimate variances correctly for subpopulations.
+
+<!-- We need to add more here about why we do descriptive analysis, this chapter was missing an intro-->
+
+## Similarities between {dplyr} and {srvyr} functions
+
 One of the major advantages of using {srvyr} is that it applies {dplyr}-like syntax to the {survey} package. We can use pipes to specify a tbl_svy object, apply a function, and then feed that output into the next function's first argument. Functions follow the 'tidy' convention of snake_case functions names. In the example below, the mean and median are calculated for the variable `mpg` on the `mtcars` dataset.
 
 ```{r}
@@ -119,49 +171,6 @@ dstrata %>%
             api00_median = survey_median(api00))
 ```
 
-## Chapter set up
-
-Recall from Chapter \@ref(c03-specifying-sample-designs) the general process for estimation with the {srvyr} package:  
-
-1. Create a `tbl_svy` object using `srvyr::as_survey_design()` or `srvyr::as_survey_rep()`.
-2. Subset the data for subpopulations using `srvyr::filter()`, if needed.
-3. Specify domains of analysis using `srvyr::group_by()`, if needed.
-4. Within `srvyr::summarize()`, specify variables to calculate means, totals, proportions, quantiles, and more.
-
-Filtering should be done after creating the `tbl_svy` object (using `as_survey_design()` or `as_survey_rep()`) because survey objects incorporate the survey design information into the resulting object. 
-
-<!-- TODO: edit this depending on how much we've talked about it in earlier chapters-->
-The Residential Energy Consumption Survey (RECS) provides energy consumption and expenditures data. It is funded by Energy Information Administration and collects information through energy suppliers through in-person, phone, and web interviews. It has been fielded 14 times between 1950 and 2020. Topics include appliances, electronics, heating, air conditioning (A/C), temperatures, water heating, lighting, energy bills, respondent demographics, and energy assistance.
-
-The survey targets primarily occupied housing units in the US. RECS uses Balanced Repeated Replication (BRR) to estimate the variances. The full sample information is available on the [EIA website](https://www.eia.gov/consumption/residential/index.php). 
-
-To begin analyzing RECS, we create a `tbl_svy` object using `srvyr::as_survey_design()`:
-
-```{r}
-#| label: recs_des
-#| error: FALSE
-#| warning: FALSE
-#| message: FALSE
-#| eval: FALSE
-library(survey) # for survey analysis
-library(srvyr) # for tidy survey analysis
-library(readr)
-library(osfr)
-source("helper-fun/helper-functions.R")
-
-recs_in <-
-  read_osf("recs_2015.rds")
-
-recs_des <- recs_in %>%
-  as_survey_rep(
-    weights = NWEIGHT,
-    repweights = starts_with("BRRWT"),
-    type = "Fay",
-    rho = 0.5,
-    mse = TRUE
-  )
-```
-
 
 ## Deciding on descriptive analyses
 
@@ -236,7 +245,9 @@ We will discuss `vartype` in Section \@ref(Var-types) as this option occurs in a
 
 #### Examples {-}
 
-If we do not specify any variables in `survey_count()`, the function will output the estimated population count (n) and standard error (n_se).  For example, in the RECS data we can obtain the estimated number of households in the U.S. (the target population) by running the following code: 
+For an example, let's use the Residential Energy Consumption Survey (RECS), which provides energy consumption and expenditures data. RECS is funded by the Energy Information Administration and collects information through in-person, phone, and web interviews and through energy suppliers. It has been fielded 14 times between 1950 and 2020 and includes questions about appliances, electronics, heating, air conditioning (A/C), temperatures, water heating, lighting, energy bills, respondent demographics, and energy assistance. The survey primarily targets occupied housing units in the US.
+
+To obtain the estimated number of households in the U.S. (the target population) from the RECS data, we can use `survey_count()`. If we do not specify any variables in `survey_count()`, it outputs the estimated population count (n) and its standard error (n_se).
 
 ```{r}
 #| label: desc-count-overall
@@ -253,7 +264,6 @@ recs_des %>%
   prettyNum(big.mark=",", digits=20)
 ```
 
-
 Thus, the estimated number of households in the U.S. is `r .est_pop`.
 
 To calculate the estimated number of observations for subgroups, such as Region and Division, we can add the variables of interest into the function. In the example below, the estimated number of housing units by region and division is calculated. Additionally, the name of the count variable is changed to "N" from the default ("n").

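The new Introduction lists a four-step {srvyr} process: create the design, filter, group, and summarize. Below is a minimal end-to-end sketch of those steps on the `apistrat` design that this chapter's setup chunk builds. It is illustrative only. The subpopulation (`sch.wide == "Yes"`) and the output name `api00_mean` are choices made for this sketch, and it assumes {dplyr}, {survey}, and {srvyr} are installed.

```r
library(dplyr)    # pipe and dplyr verbs that srvyr extends
library(survey)   # ships the api example data
library(srvyr)    # tidy survey interface

data(api, package = "survey")

# Step 1: create a tbl_svy object (same design as the desc-setup-surveydata chunk)
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# With no variables, survey_count() estimates the population size (n, n_se)
overall <- dstrata %>%
  survey_count()

# Steps 2-4: subset, define domains, then summarize
by_type <- dstrata %>%
  filter(sch.wide == "Yes") %>%               # step 2: illustrative subpopulation
  group_by(stype) %>%                         # step 3: domains of analysis
  summarize(api00_mean = survey_mean(api00))  # step 4: estimate plus standard error

print(overall)
print(by_type)
```

For this design, `overall$n` equals the sum of the `pw` weights. The filter is applied to the design object, not to `apistrat` itself, so the full sample's design information is retained.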
diff --git a/06-statistical-testing.Rmd b/06-statistical-testing.Rmd
@@ -1,6 +1,54 @@
 # Statistical testing {#c06-statistical-testing}
 
-<!-- author review -->
+::: {.prereqbox-header}
+`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
+:::
+
+::: {.prereqbox data-latex="{Prerequisites}"}
+For this chapter, here are the libraries and helper functions we will need:
+```{r}
+#| label: stattest-setup
+#| error: FALSE
+#| warning: FALSE
+#| message: FALSE
+library(tidyverse)
+library(survey) 
+library(srvyr) 
+library(broom)
+library(gt)
+library(osfr)
+source("helper-fun/helper-functions.R")
+```
+
+We will be using data from ANES and RECS. Here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-understanding-survey-data-documentation) for more information).
+```{r}
+#| label: stattest-anes-des
+#| eval: FALSE
+anes_in <- read_osf("anes_2020.rds") %>%
+  mutate(Weight = Weight / sum(Weight) * targetpop)
+anes_des <- anes_in %>%
+  as_survey_design(
+    weights = Weight,
+    strata = Stratum,
+    ids = VarUnit,
+    nest = TRUE)
+```
+
+For RECS, details are included in the RECS documentation and Chapter \@ref(c03-specifying-sample-designs).
+```{r}
+#| label: stattest-recs-des
+#| eval: FALSE
+recs_in <- read_osf("recs_2015.rds")
+recs_des <- recs_in %>%
+  as_survey_rep(weights = NWEIGHT,
+                repweights = starts_with("BRRWT"),
+                type = "Fay",
+                rho = 0.5,
+                mse = TRUE)
+```
+:::
+
+## Introduction
 
 When analyzing results from a survey, the point estimates described in Chapter \@ref(c05-descriptive-analysis) help us understand the data at a high level. Still, researchers and the public often want to make comparisons between different groups. These comparisons are calculated through statistical testing. 
 
@@ -41,60 +89,6 @@ svydata_des %>%
  svyttest(design = ., x ~ y)
 ```
 
-## Chapter Set-Up {#stattest-setup}
-
-For this chapter, we use the same RECS data as in Chapter \@ref(c05-descriptive-analysis) along with the ANES survey data introduced in Chapter \@ref(c04-understanding-survey-data-documentation). As a reminder, we need to create survey design objects to work with. These design objects ensure that the variance estimation is calculated accurately; thus, we can accurately determine statistical significance.
-
-First, make sure to install and load the following packages:
-
-```{r stattest-pkgs}
-#| error: FALSE
-#| warning: FALSE
-#| message: FALSE
-
-library(tidyverse)
-library(survey)
-library(srvyr)
-library(readr)
-library(gt)
-library(osfr)
-source("helper-fun/helper-functions.R")
-```
-
-Second, we need to read in the data and create the design objects.
-
-Here is how to create the design object for the ANES data. As we showed in Chapter \@ref(c04-understanding-survey-data-documentation), we need to adjust the weight so it sums to the population instead of the sample. We do that by multiplying the weights by the target population count (see the ANES methodology documentation for more information).
-
-```{r stattest-anesdes}
-#| eval: FALSE
-anes_in <- read_osf("anes_2020.rds") %>%
- mutate(Weight = Weight / sum(Weight) * targetpop)
-
-anes_des <- anes_in %>%
-  as_survey_design(
-    weights = Weight,
-    strata = Stratum,
-    ids = VarUnit,
-    nest = TRUE
-  )
-```
-
-Here is how to create the design object for the RECS data:
-
-```{r stattest-recsdes}
-#| eval: FALSE
-recs_in <- read_osf("recs_2015.rds")
-
-recs_des <- recs_in %>%
-  as_survey_rep(
-    weights = NWEIGHT,
-    repweights = starts_with("BRRWT"),
-    type = "Fay",
-    rho = 0.5,
-    mse = TRUE
-  )
-```
-
 ## Comparison of Proportions and Means {#stattest-ttest}
 
 We use t-tests to compare two proportions or means. T-tests allow us to determine if one proportion or mean is statistically different from the other. They are commonly used to determine if a single estimate differs from a known value (e.g., 0 or 50%) or to compare two group means (e.g., North versus South). Comparing a single estimate to a known value is called a *one sample t-test*, and we can set up the hypothesis test as follows:  
@@ -145,7 +139,9 @@ In R, `I()` is a special function that isolates its content from R's parsing cod
 
 Additionally, the `na.rm` argument defaults to `FALSE`, which means if any data is missing, the t-test will not compute. Throughout this chapter we will always set `na.rm = TRUE`, but before analyzing the survey data, review the notes provided in Chapter \@ref(c04-understanding-survey-data-documentation) to better understand how to handle missing data. Finally, the `level` argument is $1-\alpha$, or the amount of type 1 error. The default is $0.95$.
 
-Let's walk through a few examples using the ANES and RECS data. See Section \@ref(stattest-setup) above to set up the design objects.
+Let's walk through a few examples using the ANES and RECS data.
+
+### Examples {#stattest-ttest-examples}
 
 #### Example 1: One-sample t-test {.unnumbered #stattest-ttest-ex1}  
 
@@ -347,7 +343,7 @@ Additionally, as with the t-test function, both `svygofchisq()` and `svychisq()`
 
 ### Examples {#stattest-chi-examples}
 
-Let's walk through a few examples using the ANES data. See Section \@ref(stattest-setup) above to set up the design object.
+Let's walk through a few examples using the ANES data.
 
 #### Example 1: Goodness of Fit Test {.unnumbered #stattest-chi-ex1}
 

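The t-test hunk above shows only the generic `svyttest(design = ., x ~ y)` call. Below is a minimal sketch of a one-sample and a two-sample test using `survey::svyttest()`. It runs on the stratified `apistrat` design from the {survey} package, not on ANES or RECS, so no download is needed. The comparison value 650 is arbitrary and used only for illustration.

```r
library(dplyr)
library(survey)   # svyttest() and the api example data
library(srvyr)

data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# One-sample t-test: I() computes api00 - 650 before the formula is parsed,
# and "~ 0" tests whether the mean of that difference is zero.
# The value 650 is arbitrary, chosen only for illustration.
one_sample <- dstrata %>%
  svyttest(design = ., formula = I(api00 - 650) ~ 0)

# Two-sample t-test: compare mean api00 between schools that did ("Yes")
# and did not ("No") meet the schoolwide growth target
two_sample <- dstrata %>%
  svyttest(design = ., formula = api00 ~ sch.wide)

print(one_sample)
print(two_sample)
```

Both calls return standard `htest` objects, so the statistic, degrees of freedom, and p-value can be read the same way as for `stats::t.test()`.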
diff --git a/07-modeling.Rmd b/07-modeling.Rmd
@@ -1,13 +1,13 @@
 # Modeling {#c07-modeling}
 
 ::: {.prereqbox-header}
-`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq7}'`
+`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
 :::
 
 ::: {.prereqbox data-latex="{Prerequisites}"}
 For this chapter, here are the libraries and helper functions we will need:
 ```{r}
-#| label: model-c07-setup
+#| label: model-setup
 #| error: FALSE
 #| warning: FALSE
 #| message: FALSE
@@ -21,11 +21,10 @@ source("helper-fun/helper-functions.R")
 
 We will be using data from ANES and RECS. Here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-understanding-survey-data-documentation) for more information).
 ```{r}
-#| label: model-anes-des-c07
+#| label: model-anes-des
 #| eval: FALSE
 anes_in <- read_osf("anes_2020.rds") %>%
   mutate(Weight = Weight / sum(Weight) * targetpop)
-
 anes_des <- anes_in %>%
   as_survey_design(
     weights = Weight,
@@ -36,7 +35,7 @@ anes_des <- anes_in %>%
 
 For RECS, details are included in the RECS documentation and Chapter \@ref(c03-specifying-sample-designs).
 ```{r}
-#| label: model-recs-des-c07
+#| label: model-recs-des
 #| eval: FALSE
 recs_in <-read_osf("recs_2015.rds")
 recs_des <- recs_in %>%