diff --git a/04-foundations/02-lesson/04-02-lesson.Rmd b/04-foundations/02-lesson/04-02-lesson.Rmd
index ab32568c..90b1f8fb 100644
--- a/04-foundations/02-lesson/04-02-lesson.Rmd
+++ b/04-foundations/02-lesson/04-02-lesson.Rmd
@@ -1144,17 +1144,18 @@ This function has three arguments (inputs):
 2. the observed statistic (`obs_stat`)
 3. the direction of the alternative hypothesis ("greater", "less", or "two-sided")
 
-We can also use the `visualize()` function to visualize where the observed statistic falls in the distribution of permuted statistics, and shade the direction that the p-value was calculated from.
-The `visualize()` function has many inputs (find out more by typing `?visualize` in your console), but the most important ones are the __same__ as the `get_p_value()` function!
+We can also use the `visualize()` and `shade_p_value()` functions to visualize where the observed statistic falls in the distribution of permuted statistics, and shade the direction that the p-value was calculated from.
+The `shade_p_value()` function has many inputs (find out more by typing `?shade_p_value` in your console), but the most important ones are the __same__ as the `get_p_value()` function!
 
-Now, use the `visualize()` and `get_p_value()` functions for the original, small, and big datasets.
-First `visualize()` where the p-value lies on the distibution, and then calculate the p-value.
+Now, use the `visualize()`, `shade_p_value()`, and `get_p_value()` functions for the original, small, and big datasets.
+First use `shade_p_value()` to see where the observed statistic lies on the distribution, and then calculate the p-value.
 
 - You can test out the different methods for calculating the p-value by trying out: `direction = "greater"`, `direction = "two_sided"`, and `direction = "less"`.
 
 ```{r pvalue, exercise=TRUE}
 # Visualize and calculate the p-value for the original dataset
 gender_discrimination_perm |>
+  visualize() +
   ___(obs_stat = ___, direction = "___")
 
 gender_discrimination_perm |>
@@ -1162,6 +1163,7 @@ gender_discrimination_perm |>
 
 # Visualize and calculate the p-value for the small dataset
 ___ |>
+  visualize() +
   ___(___, ___)
 
 ___ |>
@@ -1169,6 +1171,7 @@ ___ |>
 
 # Visualize and calculate the p-value for the big dataset
 ___ |>
+  visualize() +
   ___(___, ___)
 
 ___ |>
@@ -1176,27 +1179,30 @@ ___ |>
 ```
 
 ```{r pvalue-hint}
-  Argument of the both functions should be `obs_stat = diff_orig, direction = "greater"`, but remember to use the correct dataset!
+  The arguments of both `shade_p_value()` and `get_p_value()` should be `obs_stat = diff_orig, direction = "greater"`, but remember to use the correct dataset!
 ```
 
 ```{r pvalue-solution}
 # Visualize and calculate the p-value for the original dataset
 gender_discrimination_perm |>
-  visualize(obs_stat = diff_orig, direction = "greater")
+  visualize() +
+  shade_p_value(obs_stat = diff_orig, direction = "greater")
 
 gender_discrimination_perm |>
   get_p_value(obs_stat = diff_orig, direction = "greater")
 
 # Visualize and calculate the p-value for the small dataset
 gender_discrimination_small_perm |>
-  visualize(obs_stat = diff_orig_small, direction = "greater")
+  visualize() +
+  shade_p_value(obs_stat = diff_orig_small, direction = "greater")
 
 gender_discrimination_small_perm |>
   get_p_value(obs_stat = diff_orig_small, direction = "greater")
 
 # Visualize and calculate the p-value for the big dataset
 gender_discrimination_big_perm |>
-  visualize(obs_stat = diff_orig_big, direction = "greater")
+  visualize() +
+  shade_p_value(obs_stat = diff_orig_big, direction = "greater")
 
 gender_discrimination_big_perm |>
   get_p_value(obs_stat = diff_orig_big, direction = "greater")
diff --git a/04-foundations/03-lesson/04-03-lesson.Rmd b/04-foundations/03-lesson/04-03-lesson.Rmd
index 004ccd3c..e8ef8529 100644
--- a/04-foundations/03-lesson/04-03-lesson.Rmd
+++ b/04-foundations/03-lesson/04-03-lesson.Rmd
@@ -428,9 +428,9 @@ Now that you've created the randomization distribution, you'll use it to assess
 
 The permuted dataset and the original observed statistic are available in your workspace as `opp_perm` and `diff_obs` respectively.
 
-`visualize()` and `get_p_value()` using the built in infer functions. Remember that the null statistics are above the original difference, so the p-value (which represents how often a null value is more *extreme*) is calculated by counting the number of null values which are `less` than the original difference.
+Use the built-in infer functions `visualize()`, `shade_p_value()`, and `get_p_value()`. Remember that the null statistics are above the original difference, so the p-value (which represents how often a null value is more *extreme*) is calculated by counting the number of null values which are `less` than the original difference.
 
-- First `visualize()` the sampling distribution of the permuted statistics indicating the place where `obs_stat = diff_obs`, and coloring in values below with the command `direction = "less"`.
+- First `visualize()` the sampling distribution of the permuted statistics, then use `shade_p_value()` to mark the place where `obs_stat = diff_obs` and shade the values below it with `direction = "less"`.
 - Then `get_p_value()` is calculated as the proportion of permuted statistics which are `direction = "less"` than `obs_stat = diff_obs`.
 - As an alternative way to calculate the p-value, use `summarize()` and `mean()` to find the proportion of times the permuted differences in `opp_perm` (called `stat`) are less than or equal to the observed difference (called `diff_obs`).
 - You can test your knowledge by trying out: `direction = "greater"`, `direction = "two_sided"`, and `direction = "less"` before submitting your answer to both `visualize()` and `get_p_value()`.
@@ -442,6 +442,7 @@ opp_perm <- read_rds("data/opp_perm2.rds")
 ```{r summarizing_opportunity_cost, exercise=TRUE}
 # Visualize the statistic
 opp_perm |>
+  ___() +
   ___(___, ___)
 
 # Calculate the p-value using `get_p_value()`
@@ -455,7 +456,8 @@ opp_perm |>
 
 ```{r summarizing_opportunity_cost-hint-1}
 opp_perm |>
-  visualize(obs_stat = diff_obs, direction = "less")
+  visualize() +
+  shade_p_value(obs_stat = diff_obs, direction = "less")
 ```
 
 ```{r summarizing_opportunity_cost-hint-2}
@@ -466,7 +468,8 @@ opp_perm |>
 ```{r summarizing_opportunity_cost-solution}
 # Visualize the statistic
 opp_perm |>
-  visualize(obs_stat = diff_obs, direction = "less")
+  visualize() +
+  shade_p_value(obs_stat = diff_obs, direction = "less")
 
 # Calculate the p-value using `get_p_value()`
 opp_perm |>
diff --git a/04-foundations/04-lesson/04-04-lesson.Rmd b/04-foundations/04-lesson/04-04-lesson.Rmd
index 6844afe4..4c03ec69 100644
--- a/04-foundations/04-lesson/04-04-lesson.Rmd
+++ b/04-foundations/04-lesson/04-04-lesson.Rmd
@@ -943,7 +943,7 @@ percentile_ci
 
 (3)
 
-- Finally, use the `visualize()` function to plot the distribution of bootstrapped proportions with the middle 95 percent highlighted.
+- Finally, use the `visualize()` function together with `shade_confidence_interval()` to plot the distribution of bootstrapped proportions with the middle 95 percent highlighted.
 - Set the `endpoints` argument to be `percentile_ci`.
 - Set the `direction` of the shading to `"between"`, to highlight in-between those endpoints.
 
@@ -955,11 +955,12 @@ percentile_ci <- one_poll_boot |>
 
 one_poll_boot |>
   # Visualize in-between the endpoints given by percentile_ci
-  ___
+  ___() +
+  ___(endpoints = ___, direction = ___)
 ```
 
 ```{r bootstrap_percentile_3-hint}
-  After the pipe, visualize the interval by calling `visualize()`, setting `endpoints` to `percentile_ci` and `direction` to `"between"`.
+  After the pipe, visualize the distribution by calling `visualize()`, then shade the interval with `shade_confidence_interval()`, setting `endpoints` to `percentile_ci` and `direction` to `"between"`.
 ```
 
 ```{r bootstrap_percentile_3-solution}
@@ -969,7 +970,8 @@ percentile_ci <- one_poll_boot |>
 
 one_poll_boot |>
   # Visualize in-between the endpoints given by percentile_ci
-  visualize(endpoints = percentile_ci, direction = "between")
+  visualize() +
+  shade_confidence_interval(endpoints = percentile_ci, direction = "between")
 ```
 
 Excellent! Again, the same caveat applies: because the two intervals were created using different methods, the intervals are expected to be a bit different as well. In the long run, however, the intervals should provide the same information.
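For reviewers trying the change locally, the pattern this patch introduces can be sketched as follows. This is a minimal sketch, not a runnable lesson chunk: it assumes infer is loaded and that the lesson objects (`opp_perm`, `diff_obs`, `one_poll_boot`, `percentile_ci`) already exist in the workspace as described above. The key change is that shading is no longer an argument to `visualize()`; `visualize()` returns a ggplot, and the shading functions are added to it as layers with `+`.

```r
library(infer)

# Old, deprecated pattern -- shading passed as arguments to visualize():
# opp_perm |>
#   visualize(obs_stat = diff_obs, direction = "less")

# New pattern -- shade_p_value() is a ggplot2 layer added with `+`
# (the |> pipe binds tighter than +, so this parses as
# (opp_perm |> visualize()) + shade_p_value(...)):
opp_perm |>
  visualize() +
  shade_p_value(obs_stat = diff_obs, direction = "less")

# The p-value calculation itself is unchanged:
opp_perm |>
  get_p_value(obs_stat = diff_obs, direction = "less")

# Confidence intervals follow the same layered pattern,
# using shade_confidence_interval():
one_poll_boot |>
  visualize() +
  shade_confidence_interval(endpoints = percentile_ci, direction = "between")
```

Note that `shade_p_value()` and `get_p_value()` take the same `obs_stat` and `direction` arguments, which is what lets the hints in these exercises describe both functions at once.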