Merge pull request #348 from jhudsl/stats23

Functions
jhudsl · Jan 20, 2023 · 31b7d56
2 parents 809f65d + 837a1b8

Showing 3 changed files with 185 additions and 62 deletions.
diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd
@@ -52,29 +52,54 @@ times_2 <- function(x) {
 times_2(x = 10)
 ```
 
+
+## Writing your own functions
+
+The general syntax for a function is: 
+
+```
+functionName <- function(inputs) {
+ <function body>
+return(value)
+}
+```
+
+
 ## Writing your own functions: `return`
 
 If we want something specific for the function's output, we use `return()`:
 
 ```{r comment=""}
-times_2 <- function(x) {
-  output <- x * 2
+times_2_plus_4 <- function(x) {
+  output_int <- x * 2
+  output <- output_int + 4
   return(output)
 }
-times_2(x = 10)
+times_2_plus_4(x = 10)
 ```
 
-## Writing your own functions
+## Writing your own functions: print intermediate steps
 
-The general syntax for a function is: 
+ - printed results are shown on screen but not saved, so they are useful for showing what a function is doing
+ - returned results can be saved to an object and reused
+ - a function can return only one object but can print many values
 
-```
-functionName <- function(inputs) {
- <function body>
-return(value)
+## Adding print
+
+```{r comment=""}
+times_2_plus_4 <- function(x) {
+  output_int <- x * 2
+  output <- output_int + 4
+  print(paste("times2 result = ", output_int))
+  return(output)
 }
+
+result <- times_2_plus_4(x = 10)
+result
+
 ```
 
+
 ## Writing your own functions: multiple inputs
 
 Functions can take multiple inputs:
@@ -84,6 +109,22 @@ times_2_plus_y <- function(x, y) x * 2 + y
 times_2_plus_y(x = 10, y = 3)
 ```
 
+
+## Writing your own functions: multiple outputs
+
+Functions can return only one object, but that object (for example, a vector) can hold multiple values.
+
+```{r comment=""}
+x_and_y_plus_2 <- function(x, y) {
+  output1 <- x + 2
+  output2 <- y + 2
+
+  return(c(output1, output2))
+}
+result <- x_and_y_plus_2(x = 10, y = 3)
+result
+```
+
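A minimal sketch of the multiple-outputs idea, assuming a hypothetical helper `describe_x` that is not part of the slides. It suggests one option when the outputs have different types: `c()` coerces everything to a common type, while a named list keeps each value as it is.

```r
# Hypothetical helper (illustration only): returns two values of different
# types bundled into one named list
describe_x <- function(x) {
  doubled <- x * 2
  label <- paste("input was", x)
  return(list(doubled = doubled, label = label))
}

result <- describe_x(x = 10)
result$doubled  # numeric value
result$label    # character value

# For comparison: c() coerces both values to character
combined <- c(10 * 2, "input was 10")
class(combined)
```

Each element can then be pulled out by name with `$`, which is harder to do reliably with an unnamed vector.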
 ## Writing your own functions: defaults
 
 Functions can have "default" arguments. This lets us use the function without using an argument later:
@@ -129,18 +170,16 @@ loud(word = "hooray!")
 We can use `filter(row_number()==n)` to extract a row of a tibble:
 
 ```{r message=FALSE}
-cars <- read_kaggle()
-
 get_row <- function(dat, row) dat %>% filter(row_number() == row)
-```
 
-```{r echo=FALSE}
-# So extra columns don't clutter the slide
-cars <- cars %>% select(1:10)
+cars <- read_kaggle()
+cars <- cars %>% select(1:8)
 ```
 
+
 ```{r}
 get_row(dat = cars, row = 10)
+get_row(dat = iris, row = 4)
 ```
 
 ```{r echo=FALSE}
@@ -156,7 +195,7 @@ cars <- read_kaggle()
 get_index <- function(dat, row, col) {
   dat %>%
     filter(row_number() == row) %>%
-    select(col)
+    select(all_of(col))
 }
 
 get_index(dat = cars, row = 10, col = 8)
@@ -170,67 +209,88 @@ Including default values for arguments:
 get_top <- function(dat, row = 1, col = 1) {
   dat %>%
     filter(row_number() == row) %>%
-    select(col)
+    select(all_of(col))
 }
 
 get_top(dat = cars)
 ```
 
-## Using your custom functions: `sapply()`
+# Functions on multiple columns
+
+## Using your custom functions: `sapply()`- a base R function
 
 Now that you've made a function... You can "apply" functions easily with `sapply()`!
 
 These functions take the form:
 
 ```   
-sapply(<a vector or list>, some_function)
+sapply(<a vector, list, data frame>, some_function)
 ```
 
 ## Using your custom functions: `sapply()`
 
 `r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
 
+You can also pipe into your function.
+
 ```{r comment=""}
-sapply(cars, class)
+head(iris, n = 2)
+sapply(iris, class)
+iris %>% sapply(class)
 ```
 
 ## Using your custom functions: `sapply()`
 
 ```{r}
-sapply(pull(cars, VehOdo), times_2_plus_y)
+select(cars, VehYear:VehicleAge) %>% head()
+select(cars, VehYear:VehicleAge) %>% sapply(times_2) %>% head()
 ```
 
 ## Using your custom functions "on the fly" to iterate
 
 ```{r comment=""}
-sapply(pull(cars, VehOdo), function(x) x / 1000)
+select(cars, VehYear:VehicleAge) %>% 
+  sapply(function(x) x / 1000) %>% head()
 ```
+# across
 
 ## Applying functions with `across` from `dplyr`
 
-`across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()`.
+`across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()` or `mutate()`.
 
 ```
-across( .cols = <columns>, .fns = function, ... )
+summarize(across( .cols = <columns>, .fns = function, ... )) 
+```
+or
+```
+mutate(across(.cols = <columns>, .fns = function, ...))
 ```
-
 - List columns first : `.cols = `
 - List function next: `.fns = `
 - Then list any arguments for the function
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
-Combining with `summarize()`:
+Combining with `summarize()`
 
 ```{r warning=FALSE}
-cars_dbl <- cars %>% select(Make, Model, where(is.double))
+cars_dbl <- cars %>% select(Make, starts_with("Veh"))
 
+cars_dbl %>%
+  summarize(across(.cols = everything(), .fns = mean))
+```
+
+## Applying functions with `across` from `dplyr`
+
+Can use with other tidyverse functions like `group_by`!
+
+```{r}
 cars_dbl %>%
   group_by(Make) %>%
   summarize(across(.cols = everything(), .fns = mean))
 ```
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
 Combining with `summarize()`:
 
@@ -242,42 +302,33 @@ cars_dbl %>%
   summarize(across(.cols = everything(), .fns = mean, na.rm = TRUE))
 ```
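A hedged note on the `na.rm = TRUE` pattern: dplyr 1.1.0 and later deprecate passing extra arguments through `...` in `across()`, so an anonymous function may be the safer form going forward. The sketch below uses a made-up `toy` tibble (not the course data) to show that form.

```r
library(dplyr)

# Hypothetical toy data with missing values (illustration only)
toy <- tibble(
  grp = c("a", "a", "b"),
  v1 = c(1, NA, 3),
  v2 = c(10, 20, NA)
)

# Supply na.rm inside an anonymous function instead of through `...`
toy_means <- toy %>%
  group_by(grp) %>%
  summarize(across(
    .cols = starts_with("v"),
    .fns = function(x) mean(x, na.rm = TRUE)
  ))
toy_means
```

A group whose values are all missing yields `NaN` here, because nothing is left to average once the `NA` values are removed.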
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
 Using different `tidyselect()` options:
 
 ```{r warning=FALSE}
-cars_dbl %>%
+cars_dbl %>% 
   group_by(Make) %>%
   summarize(across(.cols = starts_with("Veh"), .fns = mean))
 ```
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
-Combining with `mutate()`:
+Combining with `mutate()`: rounding to the nearest thousand (a negative `digits` value rounds to a multiple of a power of 10)
 
 ```{r}
 cars_dbl %>%
-  mutate(across(.cols = starts_with("Veh"), .fns = round, digits = -3))
+  mutate(across(
+    .cols = starts_with("Veh"), 
+    .fns = round, 
+    digits = -3))
 ```
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
 
-Combining with `mutate()`:
-
-```{r}
-cars_dbl %>%
-  mutate(across(
-    .cols = everything(),
-    .fns = str_replace_all,
-    pattern = "A",
-    replacement = "a"
-  ))
-```
 
 ## Applying functions with `across` from `dplyr`.{.codesmall}
 
-Combining with `mutate()`:
+Combining with `mutate()` - the `replace_na` function
 
 ```{r warning=FALSE, message=FALSE}
 # Child mortality data
@@ -292,13 +343,67 @@ mort %>%
   ))
 ```
 
+## Use custom functions within `mutate` and `across`
+
+```{r}
+times1000 <- function(x) x *1000
+
+airquality %>%
+  mutate(across(
+    .cols = everything(),
+    .fns  = times1000
+  )) %>% head(n = 2)
+
+airquality %>%
+  mutate(across(
+    .cols = everything(),
+    .fns  = function(x) x *1000
+  )) %>% head(n = 2)
+```
+
+
+## `purrr` package
+
+Similar to across, `purrr` is a package that allows you to apply a function to multiple columns in a data frame or multiple data objects in a list.
+
+## map_df
+
+```{r}
+library(purrr)
+airquality %>% map_df(replace_na, replace = 0)
+```
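As a companion sketch, and assuming the goal is one summary number per column, `map_dbl()` from `purrr` returns a named numeric vector instead of a data frame. Extra arguments such as `na.rm = TRUE` are passed along to the function being applied.

```r
library(purrr)

# One mean per column of the built-in airquality data, ignoring NA values
col_means <- map_dbl(airquality, mean, na.rm = TRUE)
col_means

# The base R counterpart gives the same numbers
base_means <- sapply(airquality, mean, na.rm = TRUE)
```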
+
+# Multiple Data Frames
+
+## Multiple data frames
+
+Lists help us work with multiple data frames
+
+```{r}
+AQ_list <- list( AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
+str(AQ_list)
+```
+
+
+## Multiple data frames: `sapply`
+
+```{r}
+AQ_list %>% sapply(summary)
+```
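A possible follow-up, sketched with the same `AQ_list` idea: when the function returns one value per data frame, `sapply()` gives a tidy named vector, and `lapply()` keeps the results as a list when they are larger objects. The anonymous function and the choice of the `Ozone` column are illustrative assumptions.

```r
AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)

# One value per data frame: a named vector
n_rows <- sapply(AQ_list, nrow)
n_rows

# A custom function applied to each data frame
ozone_means <- sapply(AQ_list, function(df) mean(df$Ozone, na.rm = TRUE))
ozone_means

# lapply() keeps one full summary table per data frame in a list
summaries <- lapply(AQ_list, summary)
names(summaries)
```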
+
+
 ## Summary
 
 - Simple functions take the form:
-  - `NEW_FUNCTION <- function(x, y) x + y ..`
-  - Can specify defaults like `function(x = 1, y = 2)`
+  - `NEW_FUNCTION <- function(x, y){x + y}`
+  - Can specify defaults like `function(x = 1, y = 2){x + y}`
+  - `return` will provide a value as output
+  - `print` will simply print the value on the screen but not save it
 - Apply your functions with `sapply(<a vector or list>, some_function)`
-- Use `across()` to apply functions across multiple columns of data
+- Use `across()` to apply functions across multiple columns of data 
+- need to use `across` within `summarize()` or `mutate()`
+- `purrr` is a package that you can use to do more iterative work easily
+- can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously
 
 ## Website
 

diff --git a/modules/Functions/lab/Functions_Lab.Rmd b/modules/Functions/lab/Functions_Lab.Rmd
@@ -13,11 +13,20 @@ library(dplyr)
 library(ggplot2)
 ```
 
-1. Create a function that takes one argument, a vector, and returns the sum of the vector squared. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+1. Create a function that takes one argument, a vector, sums the vector, and returns the square of that sum. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
 
 ```
 # General format
-NEW_FUNCTION <- function(x, y) x + y ..
+NEW_FUNCTION <- function(x, y) x + y 
+```
+or
+
+```
+# General format
+NEW_FUNCTION <- function(x, y){
+result <- x + y 
+return(result)
+}
 ```
 
 ```{r}
@@ -42,7 +51,7 @@ NEW_FUNCTION <- function(x, y) x + y ..
 
 ```
 
-5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the sum total number vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `matches()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole and so is not totally accurate!
+5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the total number of vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `contains()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole and so it is not totally accurate! Remember that `NA` values can influence calculations.
 
 ```
 # General format
@@ -58,7 +67,7 @@ data %>%
 
 ```
 
-6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `matches()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".
+6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".
 
 ```{r}