suggested edits due to infer package 1.0.0 update #263

Merged Jan 4, 2025 · 1 commit
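
For orientation, the pattern these edits move the lessons to is: `visualize()` draws the null or bootstrap distribution, and shading is added as a separate ggplot2 layer (`shade_p_value()` or `shade_confidence_interval()`). Below is a minimal sketch of the p-value workflow on infer's built-in `gss` data; the `obs_mean` and `null_dist` names and the choice of a point null at `mu = 40` are illustrative assumptions, not taken from this PR:

```r
library(infer)

# Observed statistic: mean hours worked per week in the gss sample
obs_mean <- gss |>
  specify(response = hours) |>
  calculate(stat = "mean")

# Null distribution for a point null (mu = 40), generated by bootstrapping
null_dist <- gss |>
  specify(response = hours) |>
  hypothesize(null = "point", mu = 40) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

# infer 1.0.0 style: visualize(), then add the shading as a layer
null_dist |>
  visualize() +
  shade_p_value(obs_stat = obs_mean, direction = "two-sided")

null_dist |>
  get_p_value(obs_stat = obs_mean, direction = "two-sided")
```

The same `obs_stat` and `direction` arguments drive both the shading layer and `get_p_value()`, which is the point the updated lesson text emphasizes.
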
22 changes: 14 additions & 8 deletions 04-foundations/02-lesson/04-02-lesson.Rmd
@@ -1144,59 +1144,65 @@ This function has three arguments (inputs):
2. the observed statistic (`obs_stat`)
3. the direction of the alternative hypothesis ("greater", "less", or "two-sided")

- We can also use the `visualize()` function to visualize where the observed statistic falls in the distribution of permuted statistics, and shade the direction that the p-value was calculated from.
- The `visualize()` function has many inputs (find out more by typing `?visualize` in your console), but the most important ones are the __same__ as the `get_p_value()` function!
+ We can also use the `visualize()` and `shade_p_value()` functions to visualize where the observed statistic falls in the distribution of permuted statistics, and shade the direction that the p-value was calculated from.
+ The `shade_p_value()` function has many inputs (find out more by typing `?shade_p_value` in your console), but the most important ones are the __same__ as the `get_p_value()` function!

- Now, use the `visualize()` and `get_p_value()` functions for the original, small, and big datasets.
- First `visualize()` where the p-value lies on the distibution, and then calculate the p-value.
+ Now, use the `visualize()`, `shade_p_value()`, and `get_p_value()` functions for the original, small, and big datasets.
+ First use `shade_p_value()` to see where the p-value lies on the distribution, and then calculate the p-value.

- You can test out the different methods for calculating the p-value by trying out: `direction = "greater"`, `direction = "two_sided"`, and `direction = "less"`.
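
For example, assuming the lesson's `gender_discrimination_perm` and `diff_orig` objects are already loaded, only the `direction` string changes across these variants; a sketch of the two-sided case:

```r
# Two-sided shading and p-value (swap in "greater" or "less" to compare)
gender_discrimination_perm |>
  visualize() +
  shade_p_value(obs_stat = diff_orig, direction = "two_sided")

gender_discrimination_perm |>
  get_p_value(obs_stat = diff_orig, direction = "two_sided")
```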

```{r pvalue, exercise=TRUE}
# Visualize and calculate the p-value for the original dataset
gender_discrimination_perm |>
+ visualize() +
___(obs_stat = ___, direction = "___")

gender_discrimination_perm |>
___(___, ___)

# Visualize and calculate the p-value for the small dataset
___ |>
+ visualize() +
___(___, ___)

___ |>
___(___, ___)

# Visualize and calculate the p-value for the big dataset
___ |>
+ visualize() +
___(___, ___)

___ |>
___(___, ___)
```

```{r pvalue-hint}
- Argument of the both functions should be `obs_stat = diff_orig, direction = "greater"`, but remember to use the correct dataset!
+ The arguments of both the `shade_p_value()` and `get_p_value()` functions should be `obs_stat = diff_orig, direction = "greater"`, but remember to use the correct dataset!
```

```{r pvalue-solution}
# Visualize and calculate the p-value for the original dataset
gender_discrimination_perm |>
- visualize(obs_stat = diff_orig, direction = "greater")
+ visualize() +
+ shade_p_value(obs_stat = diff_orig, direction = "greater")

gender_discrimination_perm |>
get_p_value(obs_stat = diff_orig, direction = "greater")

# Visualize and calculate the p-value for the small dataset
gender_discrimination_small_perm |>
- visualize(obs_stat = diff_orig_small, direction = "greater")
+ visualize() +
+ shade_p_value(obs_stat = diff_orig_small, direction = "greater")

gender_discrimination_small_perm |>
get_p_value(obs_stat = diff_orig_small, direction = "greater")

# Visualize and calculate the p-value for the big dataset
gender_discrimination_big_perm |>
- visualize(obs_stat = diff_orig_big, direction = "greater")
+ visualize() +
+ shade_p_value(obs_stat = diff_orig_big, direction = "greater")

gender_discrimination_big_perm |>
get_p_value(obs_stat = diff_orig_big, direction = "greater")
11 changes: 7 additions & 4 deletions 04-foundations/03-lesson/04-03-lesson.Rmd
@@ -428,9 +428,9 @@ Now that you've created the randomization distribution, you'll use it to assess

The permuted dataset and the original observed statistic are available in your workspace as `opp_perm` and `diff_obs` respectively.

- `visualize()` and `get_p_value()` using the built in infer functions. Remember that the null statistics are above the original difference, so the p-value (which represents how often a null value is more *extreme*) is calculated by counting the number of null values which are `less` than the original difference.
+ `visualize()`, `shade_p_value()`, and `get_p_value()` using the built-in infer functions. Remember that the null statistics are above the original difference, so the p-value (which represents how often a null value is more *extreme*) is calculated by counting the number of null values which are `less` than the original difference.

- - First `visualize()` the sampling distribution of the permuted statistics indicating the place where `obs_stat = diff_obs`, and coloring in values below with the command `direction = "less"`.
+ - First `visualize()` the sampling distribution of the permuted statistics, then use `shade_p_value()` to indicate the place where `obs_stat = diff_obs`, coloring in values below with `direction = "less"`.
- Then `get_p_value()` is calculated as the proportion of permuted statistics which are `direction = "less"` than `obs_stat = diff_obs`.
- As an alternative way to calculate the p-value, use `summarize()` and `mean()` to find the proportion of times the permuted differences in `opp_perm` (called `stat`) are less than or equal to the observed difference (called `diff_obs`); a sketch of this approach follows the list.
- You can test your knowledge by trying out: `direction = "greater"`, `direction = "two_sided"`, and `direction = "less"` before submitting your answer to both `visualize()` and `get_p_value()`.
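
A sketch of that `summarize()` alternative; it assumes `opp_perm` has the `stat` column described above and that `diff_obs` is the one-row tibble returned by `calculate()` (the `diff_obs_value` helper name is illustrative):

```r
library(dplyr)

# Pull the observed difference out of the calculate() output as a number
diff_obs_value <- diff_obs |> pull(stat)

# p-value: proportion of permuted statistics at or below the observed difference
opp_perm |>
  summarize(p_value = mean(stat <= diff_obs_value))
```
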
@@ -442,6 +442,7 @@ opp_perm <- read_rds("data/opp_perm2.rds")
```{r summarizing_opportunity_cost, exercise=TRUE}
# Visualize the statistic
opp_perm |>
+ ___() +
___(___, ___)

# Calculate the p-value using `get_p_value()`
@@ -455,7 +456,8 @@ opp_perm |>

```{r summarizing_opportunity_cost-hint-1}
opp_perm |>
- visualize(obs_stat = diff_obs, direction = "less")
+ visualize() +
+ shade_p_value(obs_stat = diff_obs, direction = "less")
```

```{r summarizing_opportunity_cost-hint-2}
@@ -466,7 +468,8 @@ opp_perm |>
```{r summarizing_opportunity_cost-solution}
# Visualize the statistic
opp_perm |>
- visualize(obs_stat = diff_obs, direction = "less")
+ visualize() +
+ shade_p_value(obs_stat = diff_obs, direction = "less")

# Calculate the p-value using `get_p_value()`
opp_perm |>
10 changes: 6 additions & 4 deletions 04-foundations/04-lesson/04-04-lesson.Rmd
@@ -943,7 +943,7 @@ percentile_ci
(3)


- - Finally, use the `visualize()` function to plot the distribution of bootstrapped proportions with the middle 95 percent highlighted.
+ - Finally, use the `visualize()` function together with `shade_confidence_interval()` to plot the distribution of bootstrapped proportions with the middle 95 percent highlighted.
- Set the `endpoints` argument to be `percentile_ci`.
- Set the `direction` of the shading to `"between"`, to highlight in-between those endpoints.

@@ -955,11 +955,12 @@ percentile_ci <- one_poll_boot |>

one_poll_boot |>
# Visualize in-between the endpoints given by percentile_ci
- ___
+ ___() +
+ ___(endpoints = ___, direction = ___)
```

```{r bootstrap_percentile_3-hint}
- After the pipe, visualize the interval by calling `visualize()`, setting `endpoints` to `percentile_ci` and `direction` to `"between"`.
+ After the pipe, visualize the distribution by calling `visualize()`, then shade the interval with `shade_confidence_interval()`, setting `endpoints` to `percentile_ci` and `direction` to `"between"`.
```

```{r bootstrap_percentile_3-solution}
@@ -969,7 +970,8 @@ percentile_ci <- one_poll_boot |>

one_poll_boot |>
# Visualize in-between the endpoints given by percentile_ci
- visualize(endpoints = percentile_ci, direction = "between")
+ visualize() +
+ shade_confidence_interval(endpoints = percentile_ci, direction = "between")
```

Excellent! Again, the same caveat applies: because the two intervals were created using different methods, the intervals are expected to be a bit different as well. In the long run, however, the intervals should provide the same information.
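
The code that builds `percentile_ci` is collapsed out of this hunk. For reference, one common way to produce such endpoints in infer 1.0.0 is `get_confidence_interval()`, sketched here on the package's built-in `gss` data; the `boot_dist` and `ci` names and the choice of the sample mean of `hours` are illustrative assumptions:

```r
library(infer)

# Bootstrap distribution of the sample mean of hours worked
boot_dist <- gss |>
  specify(response = hours) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

# Percentile interval: the middle 95% of the bootstrap statistics
ci <- boot_dist |>
  get_confidence_interval(level = 0.95, type = "percentile")

# Visualize the distribution and shade between the interval endpoints
boot_dist |>
  visualize() +
  shade_confidence_interval(endpoints = ci)
```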