Use prettyunits to make p-values pretty (#95)
szimmer authored Feb 25, 2024
1 parent 1aa432f commit c500658
Showing 3 changed files with 7 additions and 6 deletions.
10 changes: 5 additions & 5 deletions 06-statistical-testing.Rmd
@@ -195,7 +195,7 @@ recs_des %>%
summarize(mu = survey_mean(SummerTempNight, na.rm = TRUE))
```

-The result is the same in both methods, so we see that the average temperature U.S. households set their thermostat to in the summer at night is `r signif(ttest_ex1$estimate + 68,3)`$^\circ$F. Looking at the output from `svyttest()`, the t-statistic is `r signif(ttest_ex1$statistic, 3)`, and the p-value is $`r signif(ttest_ex1[["p.value"]], 3)`$, indicating that the average is statistically different from 68$^\circ$F at an $\alpha$ level of $0.05$.
+The result is the same in both methods, so we see that the average temperature U.S. households set their thermostat to in the summer at night is `r signif(ttest_ex1$estimate + 68,3)`$^\circ$F. Looking at the output from `svyttest()`, the t-statistic is `r signif(ttest_ex1$statistic, 3)`, and the p-value is $`r pretty_p_value(ttest_ex1[["p.value"]])`$, indicating that the average is statistically different from 68$^\circ$F at an $\alpha$ level of $0.05$.

If we want an 80% confidence interval for the test statistic, we can use the function `confint()` to change the confidence level. Below, we print both the original 95% confidence interval and the 80% confidence interval:

@@ -245,7 +245,7 @@ The output from the `svyttest()` function can be a bit hard to read. Using the {
broom::tidy(ttest_ex2)
```

-The estimate differs from Example 1 in that the estimate is not displaying \(\mu - 0.90\) but rather \(\mu\), or the difference between the U.S. households that use AC and the proportion we are comparing to. We can see that there is a difference of `r signif(ttest_ex2$estimate*100,3)` percentage points. Additionally, the t-statistic value in the `statistic` column is `r signif(ttest_ex2$statistic,3)`, and the p-value is `r signif(ttest_ex2$p.value,3)`. These results indicate that the fewer than 90% of U.S. households use AC in their homes.
+The estimate differs from Example 1 in that the estimate is not displaying \(\mu - 0.90\) but rather \(\mu\), or the difference between the U.S. households that use AC and the proportion we are comparing to. We can see that there is a difference of `r signif(ttest_ex2$estimate*100,3)` percentage points. Additionally, the t-statistic value in the `statistic` column is `r signif(ttest_ex2$statistic,3)`, and the p-value is `r pretty_p_value(ttest_ex2$p.value)`. These results indicate that the fewer than 90% of U.S. households use AC in their homes.

<!--Add in callout box about how to use the $ notation to help call out the different values? Maybe indicate how this will be covered more in the reporting chapter? IV: I added a bit up top, not sure if it needs a whole call out box but happy to revisit.-->

@@ -277,7 +277,7 @@ ttest_ex3 <- recs_des %>%
broom::tidy(ttest_ex3)
```

-The results indicate that the difference in electrical bills for those that used AC and those that did not is, on average, \$`r round(ttest_ex3$estimate,2)`. The difference appears to be statistically significant as the t-statistic is `r signif(ttest_ex3$statistic, 3)` and the p-value is $`r signif(ttest_ex3[["p.value"]], 3)`$. Households that used AC spent, on average, $`r round(ttest_ex3[["estimate"]], 2) %>% unname()` more in 2020 on electricity than households without AC.
+The results indicate that the difference in electrical bills for those that used AC and those that did not is, on average, \$`r round(ttest_ex3$estimate,2)`. The difference appears to be statistically significant as the t-statistic is `r signif(ttest_ex3$statistic, 3)` and the p-value is $`r pretty_p_value(ttest_ex3[["p.value"]])`$. Households that used AC spent, on average, $`r round(ttest_ex3[["estimate"]], 2) %>% unname()` more in 2020 on electricity than households without AC.

#### Example 4: Paired two-sample t-test {.unnumbered #stattest-ttest-ex4}

@@ -300,7 +300,7 @@ ttest_ex4 <- recs_des %>%
broom::tidy(ttest_ex4)
```

-U.S. households set their thermostat on average `r signif(ttest_ex4$estimate,2)`$^\circ$F warmer in summer nights than winter nights, which is statistically significant (t = `r signif(ttest_ex4$statistic, 3)`, p-value = $`r signif(ttest_ex4$p.value, 3)`$).
+U.S. households set their thermostat on average `r signif(ttest_ex4$estimate,2)`$^\circ$F warmer in summer nights than winter nights, which is statistically significant (t = `r signif(ttest_ex4$statistic, 3)`, p-value = $`r pretty_p_value(ttest_ex4[["p.value"]])`$).

## Chi-Square Tests {#stattest-chi}

@@ -432,7 +432,7 @@ chi_ex1 <- anes_des_educ %>%
chi_ex1
```

-The output from the `svygofchisq()` indicates that at least one proportion from ANES does not match the ACS data ($\chi^2 =$ `r chi_ex1$statistic`; $p-value =$ `r signif(chi_ex1$p.value,3)`). To get a better idea of the differences, we can use the `expected` output along with `survey_mean()` to create a comparison table:
+The output from the `svygofchisq()` indicates that at least one proportion from ANES does not match the ACS data ($\chi^2 =$ `r chi_ex1$statistic`; $p-value =$ `r pretty_p_value(chi_ex1[["p.value"]])`). To get a better idea of the differences, we can use the `expected` output along with `survey_mean()` to create a comparison table:

```{r}
#| label: stattest-chi-ex1-table
2 changes: 1 addition & 1 deletion 07-modeling.Rmd
@@ -275,7 +275,7 @@ urb_reg_test <- regTermTest(m_electric_multi, ~Urbanicity:Region)
urb_reg_test
```

-This output indicates there is a significant interaction between urbanicity and region (p-value=$`r signif(urb_reg_test[["p"]], 3)`$).
+This output indicates there is a significant interaction between urbanicity and region (p-value=$`r pretty_p_value(urb_reg_test[["p"]])`$).

To examine the predictions, residuals and more from the model, the function `augment()` from {broom} can be used. The `augment()` function will return a tibble with the independent and dependent variables and other fit statistics. The `augment()` function has not been specifically written for objects of class `svyglm`, and as such, a warning will be displayed indicating this at this time. As it was not written exactly for this class of objects, a little tweaking needs to be done after using augment to get the predicted (`.fitted`) and standard error (`.se.fit`) values. To obtain the standard error of the fitted values we need to use the `attr()` function on the `.fitted` values created by `augment()`.

1 change: 1 addition & 0 deletions index.Rmd
@@ -34,6 +34,7 @@ if (knitr:::is_html_output()){
options(width=72)
}
library(formatR)
+library(prettyunits)
book_colors <- c("#0b3954", "#087e8b", "#bfd7ea", "#ff8484", "#8d6b94")
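
A note on what the swapped inline expressions change: `signif(p, 3)` returns a number, so a very small p-value is typically rendered in scientific notation in the knitted text, whereas the replacement calls return a display string. The base R sketch below illustrates that general idea only. `format_p_sketch()` and its `digits` argument are hypothetical names invented for this note, the p-values are made up, and the function is not the {prettyunits} implementation of `pretty_p_value()`, whose exact output format should be checked against the package documentation.

```r
# Hypothetical helper for illustration; NOT the prettyunits implementation.
# Values below 10^-digits are reported as "<threshold"; all other values
# are shown with a fixed number of decimal places.
format_p_sketch <- function(p, digits = 4) {
  stopifnot(is.numeric(p), length(digits) == 1, digits >= 1)
  threshold <- 10^(-digits)
  # Fixed-point formatting, e.g. 0.0312 -> "0.0312" when digits = 4
  out <- formatC(p, format = "f", digits = digits)
  # Replace anything under the threshold with a "<0.0001"-style label
  small <- !is.na(p) & p < threshold
  out[small] <- paste0("<", formatC(threshold, format = "f", digits = digits))
  # Keep missing p-values missing rather than printing the string "NA"
  out[is.na(p)] <- NA_character_
  out
}

p_values <- c(3.2e-12, 0.00049, 0.0312)  # made-up values

signif(p_values, 3)        # numeric; the first value stays in scientific notation
format_p_sketch(p_values)  # "<0.0001" "0.0005"  "0.0312"
```

Returning a character vector is the design choice that matters here: inline R in an Rmd file prints whatever the expression evaluates to, so formatting has to happen before the value reaches the text.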