Merge pull request #234 from fhdsl/functions

avahoffman · web-flow · commit 1442a28cc486 · 2024-10-10T10:19:11.000-04:00
Last minute changes to Functions
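
Note on the new Summary bullet about `{{double curly braces}}`: the body of `get_summary()` falls outside the hunks shown here, so the following is only a minimal sketch of the embrace pattern, assuming dplyr is installed. `summarize_col` and `toy` are hypothetical names made up for illustration, not code from the module.

```r
library(dplyr)

# Hypothetical helper: NOT the module's get_summary(), whose body is not in this diff
summarize_col <- function(dataset, col_name) {
  dataset %>%
    summarize(
      mean = mean({{ col_name }}, na.rm = TRUE),   # mean of the chosen column
      n_missing = sum(is.na({{ col_name }}))       # count of NA values in it
    )
}

# Toy data invented for this example
toy <- tibble(visits = c(1, 2, NA, 5))
res <- summarize_col(toy, visits)
res
```

The bare column name (`visits`) can be passed because `{{ }}` forwards it to `summarize()`; without the braces, `summarize()` would not know that `col_name` refers to a column of `dataset`.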
diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd
@@ -11,7 +11,6 @@ library(dplyr)
 library(knitr)
 library(stringr)
 library(tidyr)
-library(emo)
 library(readr)
 opts_chunk$set(comment = "")
 ```
@@ -192,7 +191,7 @@ loud(word = "hooray!")
 <!-- ``` -->
 
 
-## Functions for tibbles - curly braces{.codesmall}
+## Functions for tibbles - curly braces
 
 ```{r}
 # get means and missing for a specific column
@@ -203,23 +202,32 @@ get_summary <- function(dataset, col_name) {
 }
 ```
 
-Examples:
+## Functions for tibbles - example{.codesmall}
 
-```{r}
+```{r message = FALSE}
 er <- read_csv(file = "https://daseh.org/data/CO_ER_heat_visits.csv")
+```
+
+```{r}
 get_summary(er, visits)
+```
+
+```{r message = FALSE}
+yearly_co2 <- 
+  read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
+```
 
-yearly_co2 <- read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
+```{r}
 get_summary(yearly_co2, `2014`)
 ```
 
 ## Summary
 
 - Simple functions take the form:
   - `NEW_FUNCTION <- function(x, y){x + y}`
-  - Can specify defaults like `function(x = 1, y = 2){x + y}`
-  -`return` will provide a value as output
-  - `print` will simply print the value on the screen but not save it
+  - Can specify defaults like `function(x = 1, y = 2){x + y}`  
+  - `return` will provide a value as output
+- Specify a column (from a tibble) inside a function using `{{double curly braces}}`
 
 
 ## Lab Part 1
@@ -245,7 +253,7 @@ sapply(<a vector, list, data frame>, some_function)
 
 Let's apply a function to look at the CO heat-related ER visits dataset.
 
-`r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
+🚨 There are no parentheses on the functions! 🚨
 
 You can also pipe into your function.
 
@@ -357,7 +365,6 @@ er %>%
   ))
 ```
 
-
 ## Applying functions with `across` from `dplyr`
 
 Using different `tidyselect()` options (e.g., `starts_with()`, `ends_with()`, `contains()`)
@@ -368,20 +375,6 @@ er %>%
   summarize(across(contains("cl"), mean, na.rm=T))
 ```
 
-
-<!-- ## Applying functions with `across` from `dplyr`{.codesmall} -->
-
-<!-- `mutate()` across to round across many columns at once! -->
-
-<!-- ```{r} -->
-<!-- calenviroscreen %>% -->
-<!--   mutate(across( -->
-<!--     where(is.numeric),  -->
-<!--     function(x) round(x, digits = 0) -->
-<!--   )) %>% select(7:13) -->
-<!-- ``` -->
-
-
 ## Applying functions with `across` from `dplyr` {.smaller}
 
 Combining with `mutate()` - the `replace_na` function
@@ -401,29 +394,15 @@ yearly_co2 %>%
   ))
 ```
 
+## GUT CHECK! 
 
-<!-- ## Use custom functions within `mutate` and `across` -->
+Why use `across()`?
 
-<!-- If your function needs to span more than one line, better to define it first before using inside `mutate()` and `across()`. -->
+A. Efficiency - faster and less repetitive
 
-<!-- ```{r} -->
-<!-- times1000 <- function(x) x * 1000 -->
-
-<!-- airquality %>% -->
-<!--   mutate(across( -->
-<!--     everything(), -->
-<!--     .fns  = times1000 -->
-<!--   )) %>% -->
-<!--   head(n = 2) -->
-
-<!-- airquality %>% -->
-<!--   mutate(across( -->
-<!--     everything(), -->
-<!--     .fns  = function(x) x * 1000 -->
-<!--   )) %>% -->
-<!--   head(n = 2) -->
-<!-- ``` -->
+B. Calculate the cross product
 
+C. Connect across datasets
 
 ## `purrr` package
 
@@ -433,22 +412,29 @@ While we won't get into `purrr` too much in this class, its a handy package for
 
 # Multiple Data Frames
 
-## Multiple data frames {.smaller}
+## Multiple data frames
 
-Lists help us work with multiple data frames
+Lists help us work with multiple tibbles / data frames
 
 ```{r}
-AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
-str(AQ_list)
+df_list <- list(AQ = airquality, er = er, yearly_co2 = yearly_co2)
 ```
 
+<br>
+
+`select()` the numeric columns from each tibble:
+
+```{r}
+df_list <- 
+  df_list %>% 
+  sapply(function(x) select(x, where(is.numeric)))
+```
 
-## Multiple data frames: `sapply`
+## Multiple data frames: `sapply` {.smaller}
 
 ```{r}
-AQ_list %>% sapply(class)
-AQ_list %>% sapply(nrow)
-AQ_list %>% sapply(colMeans, na.rm = TRUE)
+df_list %>% sapply(nrow)
+df_list %>% sapply(colMeans, na.rm = TRUE)
 ```
 
 
@@ -457,7 +443,7 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
 - Apply your functions with `sapply(<a vector or list>, some_function)`
 - Use `across()` to apply functions across multiple columns of data 
 - Need to use `across` within `summarize()` or `mutate()`
-- Can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously
+- Can use `sapply` (or the `purrr` package) to work with multiple data frames within lists simultaneously
 
 
 ## Lab Part 2
@@ -466,7 +452,20 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
 
 💻 [Lab](https://daseh.org/modules/Functions/lab/Functions_Lab.Rmd)
 
-```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
+📃 [Day 9 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-9.pdf)
+
+📃 [Posit's purrr Cheatsheet](https://rstudio.github.io/cheatsheets/purrr.pdf)
+
+## Research Survey
+
+<br>
+
+https://forms.gle/jVue79CjgoMmbVbg9
+
+<br>
+<br>
+
+```{r, fig.alt="The End", out.width = "30%", echo = FALSE, fig.align='center'}
 knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
 ```
 
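
On the `df_list` change above: `sapply()` tries to simplify its result, so when every selected data frame happens to have the same number of columns it can return a matrix of columns instead of a list of data frames. A minimal sketch of a more predictable alternative, assuming dplyr is installed, uses `lapply()` for the `select()` step. The built-in `mtcars` and `iris` datasets stand in for `er` and `yearly_co2` here so the example runs offline; `numeric_only` is a hypothetical name.

```r
library(dplyr)

# Built-in datasets used as stand-ins for the course data (an assumption)
df_list <- list(AQ = airquality, cars = mtcars, iris = iris)

# lapply() always returns a list with one element per input data frame
numeric_only <- lapply(df_list, function(x) select(x, where(is.numeric)))

# sapply() is still convenient when each result is a single value
row_counts <- sapply(numeric_only, nrow)
col_counts <- sapply(numeric_only, ncol)
row_counts
col_counts
```

Keeping `lapply()` for steps that return data frames and `sapply()` for steps that return one value per data frame makes the shape of each result easier to predict.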
diff --git a/modules/Functions/lab/Functions_Lab_Key.Rmd b/modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -11,29 +11,21 @@ knitr::opts_chunk$set(echo = TRUE)
 
 # Part 1
 
-Load all the libraries we will use in this lab.
+Load the `tidyverse` package.
 
 ```{r message=FALSE}
 library(tidyverse)
 ```
 
 ### 1.1
 
-Create a function that takes one argument, a vector, and returns the sum of the vector and then squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+Create a function that: 
 
-```
-# General format
-NEW_FUNCTION <- function(x, y) x + y
-```
-or
-
-```
-# General format
-NEW_FUNCTION <- function(x, y){
-result <- x + y
-return(result)
-}
-```
+* Takes one argument, a vector.
+* Sums the vector, squares that sum, and returns the result.
+* Call it "sum_squared". 
+* Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+* Format is `NEW_FUNCTION <- function(x, y) x + y`
 
 ```{r 1.1response}
 nums <- c(2, 7, 21, 30, 90)
@@ -50,7 +42,12 @@ sum_squared(x = nums)
 
 ### 1.2
 
-Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
+Create a function that:
+
+* Takes two arguments: (1) a vector and (2) a numeric value.
+* Tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`.
+* Call it `has_n`. 
+* Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
 
 ```{r 1.2response}
 nums <- c(2, 7, 21, 30, 90)
@@ -74,11 +71,24 @@ has_n(x = nums)
 
 ### P.1
 
-Create a new number `b_num` that is not contained with `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
+Create a function for the CalEnviroScreen Data. 
+
+* Read in the data from https://daseh.org/data/CalEnviroScreen_data.csv
+* The function takes a column name as its argument (use `{{col_name}}`).
+* The function creates a ggplot with `{{col_name}}` on the x-axis and `Poverty` on the y-axis.
+* Use `geom_point()`
+* Test the function using the `Lead` and `HousingBurden` columns, or other columns of your choice.
 
 ```{r P.1response}
-b_num <- 11
-has_n(x = nums, n = b_num)
+ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
+
+plot_ces <- function(col_name){
+  ggplot(data = ces, aes(x = {{col_name}}, y = Poverty)) +
+    geom_point()
+}
+
+plot_ces(Lead)
+plot_ces(HousingBurden)
 ```
 
 
@@ -96,7 +106,12 @@ ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
 
 ### 2.2
 
-We want to get some summary statistics on water contamination. Use `across` inside `summarize` to get the sum total variable containing the string "water" AND ending with "Pctl". **Hint**: use `contains()` AND `ends_with()` to select the right columns inside `across`. Remember that `NA` values can influence calculations.
+We want to get some summary statistics on water contamination. 
+
+* Use `across` inside `summarize`.
+* Choose columns about "water". **Hint**: use `contains("water")` inside `across`. 
+* Use `mean` as the function inside of `across`.
+* Remember that `NA` values can influence calculations.
 
 ```
 # General format
@@ -110,19 +125,26 @@ data %>%
 ```{r 2.2response}
 ces %>%
   summarize(across(
-    contains("Water") & ends_with("Pctl"),
-    sum
+    contains("water"),
+    mean
   ))
+
+# Accounting for NA
 ces %>%
   summarize(across(
-    contains("Water") & ends_with("Pctl"),
-    function(x) sum(x, na.rm = T)
+    contains("water"),
+    function(x) mean(x, na.rm = TRUE)
   ))
 ```
 
 ### 2.3
 
-Use `across` and `mutate` to convert all columns containing the word "water" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".
+Convert all columns that are percentiles into proportions.
+
+* Use `across` and `mutate`
+* Choose columns that contain "Pctl" in the name. **Hint**: use `contains("Pctl")` inside `across`.
+* Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`). 
+* Check your work. This is easier if you `select(contains("Pctl"))`.
 
 ```
 # General format
@@ -136,7 +158,7 @@ data %>%
 ```{r 2.3response}
 ces %>%
   mutate(across(
-    contains("water"),
+    contains("Pctl"),
     function(x) x / 100
   )) %>%
   select(contains("Pctl"))
@@ -149,42 +171,50 @@ ces %>%
 
 Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10. 
 
-- **Hint**: use `starts_with()` to select the columns that start with "PM". 
-- Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
-- A logical test with `mutate` will automatically fill a column with TRUE/FALSE.
+* **Hint**: use `starts_with()` to select the columns that start with "PM". 
+* Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
+* A logical test (`x > 10`) inside `mutate` will automatically fill a column with TRUE/FALSE.
 
 ```{r P.2response}
 ces %>%
   mutate(across(
     starts_with("PM"),
     function(x) x > 10
-  ))
+  )) %>% 
+  glimpse() # add glimpse to view the changes
 ```
 
 ### P.3
 
 Take your code from previous question and assign it to the variable `ces_dat`.
 
-- Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign this to `ces_dat`.
-- Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `Asthma`  and (2) the y-axis is `PM2.5`.
-- You change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
+- Create a ggplot where the x-axis is `Asthma` and the y-axis is `PM2.5`.
+- Add a boxplot (`geom_boxplot()`)
+- Change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
 
 ```{r P.3response}
 ces_dat <-
   ces %>%
   mutate(across(
     starts_with("PM"),
     function(x) x > 10
-  )) %>%
-  filter(ApproxLocation != "Oakland")
-
-ces_boxplot <- function(df) {
-  ggplot(df) +
-    geom_boxplot(aes(
-      x = `Asthma`,
-      y = `PM2.5`
-    )) +
+  ))
+
+ggplot(data = ces_dat, aes(x = `Asthma`, y = `PM2.5`)) +
+  geom_boxplot() +
+  labs(x = "ER Visits for Asthma: PM2.5 greater than 10")
+
+# Make everything a function if you like!
+ces_boxplot <- function() {
+  ces %>%
+    mutate(across(
+      starts_with("PM"), 
+      function(x) x > 10
+    )) %>% 
+    ggplot(aes(x = `Asthma`, y = `PM2.5`)) +
+    geom_boxplot() +
     labs(x = "ER Visits for Asthma: PM2.5 greater than 10")
 }
-ces_boxplot(ces_dat)
+
+ces_boxplot()
 ```
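
One possible variation on `plot_ces()` from P.1, not part of this change: passing the data frame as an argument keeps the function from depending on a global `ces` object. This is a sketch assuming ggplot2 is installed; `plot_vs_poverty` and `toy_ces` are hypothetical names, and the toy values are invented for illustration.

```r
library(ggplot2)

# Hypothetical variant of plot_ces(): the data frame is an argument, not a global
plot_vs_poverty <- function(dataset, col_name) {
  ggplot(data = dataset, aes(x = {{ col_name }}, y = Poverty)) +
    geom_point()
}

# Toy data standing in for the CalEnviroScreen columns (made-up values)
toy_ces <- data.frame(
  Lead = c(10, 40, 75),
  Poverty = c(20, 35, 60)
)

p <- plot_vs_poverty(toy_ces, Lead)
p
```

With the real data, the call would look like `plot_vs_poverty(ces, Lead)`, and the same function could then be reused on a filtered or modified copy of the dataset.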