Adding prereq boxes (#62)
* Add Prereq boxes to all chapters

* Adjusting spacing.
rpowell22 authored Aug 7, 2023
1 parent 6e27e75 commit 17c1098
Showing 9 changed files with 227 additions and 165 deletions.
2 changes: 1 addition & 1 deletion 03-specifying-sample-designs.Rmd
@@ -1,7 +1,7 @@
# Specifying sample designs and replicate weights in {srvyr} {#c03-specifying-sample-designs}

::: {.prereqbox-header}
`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq3}'`
`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
:::

::: {.prereqbox data-latex="{Prerequisites}"}
6 changes: 3 additions & 3 deletions 04-understanding-survey-data-documentation.Rmd
@@ -22,7 +22,7 @@ source("helper-fun/helper-functions.R")
We will be using data from ANES. Here is the code to read in the data.
```{r}
#| label: understand-anes-c04
anes_in <- read_osf("anes_2020.rds")
anes_raw <- read_osf("anes_2020.rds")
```
:::

@@ -245,14 +245,14 @@ The target population in 2020 is `r scales::comma(targetpop)`. This information

```{r}
#| label: understand-read-anes
anes_popweights <- anes_in %>%
anes_in <- anes_raw %>%
mutate(Weight = V200010b / sum(V200010b) * targetpop)
```
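
The rescaling in this chunk can be checked on a toy example. The sketch below is illustrative only: `toy`, `raw_weight`, and `targetpop_toy` are made-up names and values, not the ANES data or the actual target population. It shows that dividing each weight by the sum of the weights and multiplying by the target population yields weights that sum to that population.

```r
library(dplyr)

# Hypothetical raw weights (not ANES values) and a made-up target population
toy <- data.frame(raw_weight = c(0.5, 1.0, 1.5, 2.0))
targetpop_toy <- 1000

# Same pattern as the chunk above: normalize to sum to 1, then scale up
toy_adj <- toy %>%
  mutate(Weight = raw_weight / sum(raw_weight) * targetpop_toy)

# The adjusted weights now sum to the target population
sum(toy_adj$Weight)
```

The relative sizes of the weights are unchanged; only their total is rescaled.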

Once we have the weights adjusted to the population, we can then create the survey design using our new weight variable in the `weights` argument and use the strata and cluster variables identified in the user's manual.
```{r}
#| label: understand-anes-des
anes_des <- anes_popweights %>%
anes_des <- anes_in %>%
as_survey_design(
weights = Weight,
strata = V200010d,
118 changes: 64 additions & 54 deletions 05-descriptive-analysis.Rmd
@@ -13,26 +13,78 @@ tribble(
knitr::kable(format="pandoc", col.names=NULL, caption="Summary of Chapter 5")
```

::: {.prereqbox-header}
`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
:::

::: {.prereqbox data-latex="{Prerequisites}"}
For this chapter, here are the libraries and helper functions we will need:
```{r}
#| label: desc-tidy
#| include: FALSE
#| label: desc-setup
#| error: FALSE
#| warning: FALSE
#| message: FALSE
library(tidyverse)
library(survey)
library(srvyr)
library(broom)
library(osfr)
source("helper-fun/helper-functions.R")
knitr::opts_chunk$set(tidy = TRUE)
```

## Similarities between {dplyr} and {srvyr} functions

To help explain the similarities between {dplyr} functions and {srvyr} functions, this chapter will use the `mtcars` and `iris` datasets that are built into R and the `apistrat` data that comes in the {survey} package:
```{r}
#| label: desc-dstrat
#| include: false
library(srvyr)
library(survey)
#| label: desc-setup-surveydata
data(api)
dstrata <- apistrat %>%
as_survey_design(strata = stype, weights = pw)
```

We will be using data from ANES and RECS. Here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-understanding-survey-data-documentation) for more information).
```{r}
#| label: desc-anes-des
#| eval: FALSE
anes_in <- read_osf("anes_2020.rds") %>%
mutate(Weight = Weight / sum(Weight) * targetpop)
anes_des <- anes_in %>%
as_survey_design(
weights = Weight,
strata = Stratum,
ids = VarUnit,
nest = TRUE)
```

For RECS, details are included in the RECS documentation and Chapter \@ref(c03-specifying-sample-designs).
```{r}
#| label: desc-recs-des
#| eval: FALSE
recs_in <-read_osf("recs_2015.rds")
recs_des <- recs_in %>%
as_survey_rep(weights = NWEIGHT,
repweights = starts_with("BRRWT"),
type = "Fay",
rho = 0.5,
mse = TRUE)
```
:::

## Introduction

Recall from Chapter \@ref(c03-specifying-sample-designs) the general process for estimation with the {srvyr} package:

1. Create a `tbl_svy` object using `srvyr::as_survey_design()` or `srvyr::as_survey_rep()`.
2. Subset the data for subpopulations using `srvyr::filter()`, if needed.
3. Specify domains of analysis using `srvyr::group_by()`, if needed.
4. Within `srvyr::summarize()`, specify variables to calculate means, totals, proportions, quantiles, and more.

Filtering should be done after creating the `tbl_svy` object (using `as_survey_design()` or `as_survey_rep()`) because survey objects incorporate the survey design information into the resulting object.
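
The four steps above can be sketched on the `apistrat` data that ships with {survey}. This is a minimal illustration rather than the chapter's own analysis: the choice to drop high schools (`stype != "H"`) and the output name `api00_mean` are assumptions made for the example.

```r
library(dplyr)
library(survey)
library(srvyr)

data(api)

# Step 1: create the tbl_svy object first, so the design is recorded
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

est <- dstrata %>%
  # Step 2: subset after the design object exists (illustrative filter)
  filter(stype != "H") %>%
  # Step 3: define the domains of analysis
  group_by(stype) %>%
  # Step 4: estimate a mean and its standard error within each domain
  summarize(api00_mean = survey_mean(api00))

est
```

Because the filter is applied to the design object rather than to the raw data frame, the variance estimates still reflect the full sample design.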

<!-- We need to add more here about why we do descriptive analysis, this chapter was missing an intro-->

## Similarities between {dplyr} and {srvyr} functions

One of the major advantages of using {srvyr} is that it applies {dplyr}-like syntax to the {survey} package. We can use pipes to specify a `tbl_svy` object, apply a function, and then feed that output into the next function's first argument. Functions follow the 'tidy' convention of snake_case function names. In the example below, the mean and median are calculated for the variable `mpg` on the `mtcars` dataset.

```{r}
@@ -119,49 +171,6 @@ dstrata %>%
api00_median = survey_median(api00))
```

## Chapter set up

Recall from Chapter \@ref(c03-specifying-sample-designs) the general process for estimation with the {srvyr} package:

1. Create a `tbl_svy` object using `srvyr::as_survey_design()` or `srvyr::as_survey_rep()`.
2. Subset the data for subpopulations using `srvyr::filter()`, if needed.
3. Specify domains of analysis using `srvyr::group_by()`, if needed.
4. Within `srvyr::summarize()`, specify variables to calculate means, totals, proportions, quantiles, and more.

Filtering should be done after creating the `tbl_svy` object (using `as_survey_design()` or `as_survey_rep()`) because survey objects incorporate the survey design information into the resulting object.

<!-- TODO: edit this depending on how much we've talked about it in earlier chapters-->
The Residential Energy Consumption Survey (RECS) provides energy consumption and expenditures data. It is funded by the Energy Information Administration and collects information via energy suppliers as well as in-person, phone, and web interviews. It has been fielded 14 times between 1950 and 2020. Topics include appliances, electronics, heating, air conditioning (A/C), temperatures, water heating, lighting, energy bills, respondent demographics, and energy assistance.

The survey targets primarily occupied housing units in the US. RECS uses Balanced Repeated Replication (BRR) to estimate the variances. The full sample information is available on the [EIA website](https://www.eia.gov/consumption/residential/index.php).

To begin analyzing RECS, we create a `tbl_svy` object using `srvyr::as_survey_rep()`:

```{r}
#| label: recs_des
#| error: FALSE
#| warning: FALSE
#| message: FALSE
#| eval: FALSE
library(survey) # for survey analysis
library(srvyr) # for tidy survey analysis
library(readr)
library(osfr)
source("helper-fun/helper-functions.R")
recs_in <-
read_osf("recs_2015.rds")
recs_des <- recs_in %>%
as_survey_rep(
weights = NWEIGHT,
repweights = starts_with("BRRWT"),
type = "Fay",
rho = 0.5,
mse = TRUE
)
```


## Deciding on descriptive analyses

@@ -236,7 +245,9 @@ We will discuss `vartype` in Section \@ref(Var-types) as this option occurs in a

#### Examples {-}

If we do not specify any variables in `survey_count()`, the function will output the estimated population count (n) and standard error (n_se). For example, in the RECS data we can obtain the estimated number of households in the U.S. (the target population) by running the following code:
For an example, let's use the Residential Energy Consumption Survey (RECS), which provides energy consumption and expenditures data. RECS is funded by the Energy Information Administration and collects information via energy suppliers as well as in-person, phone, and web interviews. It has been fielded 14 times between 1950 and 2020 and includes questions about appliances, electronics, heating, air conditioning (A/C), temperatures, water heating, lighting, energy bills, respondent demographics, and energy assistance. The survey targets primarily occupied housing units in the US.

If we wanted to obtain the estimated number of households in the U.S. (the target population) using the RECS data we could use `survey_count()`. If we do not specify any variables in the `survey_count()` function, it will output the estimated population count (n) and standard error (n_se).

```{r}
#| label: desc-count-overall
@@ -253,7 +264,6 @@ recs_des %>%
prettyNum(big.mark=",", digits=20)
```


Thus, the estimated number of households in the U.S. is `r .est_pop`.

To calculate the estimated number of observations for subgroups, such as Region and Division, we can add the variables of interest into the function. In the example below, the estimated number of housing units by region and division is calculated. Additionally, the name of the count variable is changed to "N" from the default ("n").
110 changes: 53 additions & 57 deletions 06-statistical-testing.Rmd
@@ -1,6 +1,54 @@
# Statistical testing {#c06-statistical-testing}

<!-- author review -->
::: {.prereqbox-header}
`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
:::

::: {.prereqbox data-latex="{Prerequisites}"}
For this chapter, here are the libraries and helper functions we will need:
```{r}
#| label: stattest-setup
#| error: FALSE
#| warning: FALSE
#| message: FALSE
library(tidyverse)
library(survey)
library(srvyr)
library(broom)
library(gt)
library(osfr)
source("helper-fun/helper-functions.R")
```

We will be using data from ANES and RECS. Here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-understanding-survey-data-documentation) for more information).
```{r}
#| label: stattest-anes-des
#| eval: FALSE
anes_in <- read_osf("anes_2020.rds") %>%
mutate(Weight = Weight / sum(Weight) * targetpop)
anes_des <- anes_in %>%
as_survey_design(
weights = Weight,
strata = Stratum,
ids = VarUnit,
nest = TRUE)
```

For RECS, details are included in the RECS documentation and Chapter \@ref(c03-specifying-sample-designs).
```{r}
#| label: stattest-recs-des
#| eval: FALSE
recs_in <-read_osf("recs_2015.rds")
recs_des <- recs_in %>%
as_survey_rep(weights = NWEIGHT,
repweights = starts_with("BRRWT"),
type = "Fay",
rho = 0.5,
mse = TRUE)
```
:::

## Introduction

When analyzing results from a survey, the point estimates described in Chapter \@ref(c05-descriptive-analysis) help us understand the data at a high level. Still, researchers and the public often want to make comparisons between different groups. These comparisons are calculated through statistical testing.

@@ -41,60 +89,6 @@ svydata_des %>%
svyttest(design = ., x ~ y)
```

## Chapter Set-Up {#stattest-setup}

For this chapter, we use the same RECS data as in Chapter \@ref(c05-descriptive-analysis) along with the ANES survey data introduced in Chapter \@ref(c04-understanding-survey-data-documentation). As a reminder, we need to create survey design objects to work with. These design objects ensure that the variance estimation is calculated accurately; thus, we can accurately determine statistical significance.

First, make sure to install and load the following packages:

```{r stattest-pkgs}
#| error: FALSE
#| warning: FALSE
#| message: FALSE
library(tidyverse)
library(survey)
library(srvyr)
library(readr)
library(gt)
library(osfr)
source("helper-fun/helper-functions.R")
```

Second, we need to read in the data and create the design objects.

Here is how to create the design object for the ANES data. As we showed in Chapter \@ref(c04-understanding-survey-data-documentation), we need to adjust the weight so it sums to the population instead of the sample. We do that by multiplying the weights by the target population count (see the ANES methodology documentation for more information).

```{r stattest-anesdes}
#| eval: FALSE
anes_in <- read_osf("anes_2020.rds") %>%
mutate(Weight = Weight / sum(Weight) * targetpop)
anes_des <- anes_in %>%
as_survey_design(
weights = Weight,
strata = Stratum,
ids = VarUnit,
nest = TRUE
)
```

Here is how to create the design object for the RECS data:

```{r stattest-recsdes}
#| eval: FALSE
recs_in <- read_osf("recs_2015.rds")
recs_des <- recs_in %>%
as_survey_rep(
weights = NWEIGHT,
repweights = starts_with("BRRWT"),
type = "Fay",
rho = 0.5,
mse = TRUE
)
```

## Comparison of Proportions and Means {#stattest-ttest}

We use t-tests to compare two proportions or means. T-tests allow us to determine if one proportion or mean is statistically different from the other. They are commonly used to determine if a single estimate differs from a known value (e.g., 0 or 50%) or to compare two group means (e.g., North versus South). Comparing a single estimate to a known value is called a *one sample t-test*, and we can set up the hypothesis test as follows:
@@ -145,7 +139,9 @@ In R, `I()` is a special function that isolates its content from R's parsing cod

Additionally, the `na.rm` argument defaults to `FALSE`, which means if any data is missing, the t-test will not compute. Throughout this chapter we will always set `na.rm = TRUE`, but before analyzing the survey data, review the notes provided in Chapter \@ref(c04-understanding-survey-data-documentation) to better understand how to handle missing data. Finally, the `level` argument is the confidence level, $1-\alpha$, where $\alpha$ is the type 1 error rate. The default is $0.95$.
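
As a small sketch of how `I()` fits into the formula, the code below runs a one-sample t-test on the `apistrat` data from {survey}. The comparison value of 600 is an arbitrary choice for illustration and does not come from the chapter; the `outcome ~ 0` form for a one-sample test follows the {survey} documentation.

```r
library(dplyr)
library(survey)
library(srvyr)

data(api)

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# I() keeps the subtraction as arithmetic inside the formula, so this
# tests whether the mean of api00 differs from 600 (hypothetical value)
tt <- svyttest(I(api00 - 600) ~ 0, design = dstrata)

tt
```

Without `I()`, the minus sign would be read as formula syntax rather than as subtraction.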

Let's walk through a few examples using the ANES and RECS data. See Section \@ref(stattest-setup) above to set up the design objects.
Let's walk through a few examples using the ANES and RECS data.

### Examples {#stattest-ttest-examples}

#### Example 1: One-sample t-test {.unnumbered #stattest-ttest-ex1}

@@ -347,7 +343,7 @@ Additionally, as with the t-test function, both `svygofchisq()` and `svychisq()`

### Examples {#stattest-chi-examples}

Let's walk through a few examples using the ANES data. See Section \@ref(stattest-setup) above to set up the design object.
Let's walk through a few examples using the ANES data.

#### Example 1: Goodness of Fit Test {.unnumbered #stattest-chi-ex1}

9 changes: 4 additions & 5 deletions 07-modeling.Rmd
@@ -1,13 +1,13 @@
# Modeling {#c07-modeling}

::: {.prereqbox-header}
`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq7}'`
`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq}'`
:::

::: {.prereqbox data-latex="{Prerequisites}"}
For this chapter, here are the libraries and helper functions we will need:
```{r}
#| label: model-c07-setup
#| label: model-setup
#| error: FALSE
#| warning: FALSE
#| message: FALSE
@@ -21,11 +21,10 @@ source("helper-fun/helper-functions.R")

We will be using data from ANES and RECS. Here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-understanding-survey-data-documentation) for more information).
```{r}
#| label: model-anes-des-c07
#| label: model-anes-des
#| eval: FALSE
anes_in <- read_osf("anes_2020.rds") %>%
mutate(Weight = Weight / sum(Weight) * targetpop)
anes_des <- anes_in %>%
as_survey_design(
weights = Weight,
@@ -36,7 +35,7 @@ anes_des <- anes_in %>%

For RECS, details are included in the RECS documentation and Chapter \@ref(c03-specifying-sample-designs).
```{r}
#| label: model-recs-des-c07
#| label: model-recs-des
#| eval: FALSE
recs_in <-read_osf("recs_2015.rds")
recs_des <- recs_in %>%