diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd
index 1c9c4934d..38ce6c7ce 100644
--- a/modules/Functions/Functions.Rmd
+++ b/modules/Functions/Functions.Rmd
@@ -52,29 +52,54 @@ times_2 <- function(x) {
 times_2(x = 10)
 ```
 
+
+## Writing your own functions
+
+The general syntax for a function is:
+
+```
+functionName <- function(inputs) {
+
+return(value)
+}
+```
+
+
 ## Writing your own functions: `return`
 
 If we want something specific for the function's output, we use `return()`:
 
 ```{r comment=""}
-times_2 <- function(x) {
-  output <- x * 2
+times_2_plus_4 <- function(x) {
+  output_int <- x * 2
+  output <- output_int + 4
   return(output)
 }
-times_2(x = 10)
+times_2_plus_4(x = 10)
 ```
 
-## Writing your own functions
+## Writing your own functions: print intermediate steps
 
-The general syntax for a function is:
+ - printed results do not stay around but can show what a function is doing
+ - returned results stay around
+ - can only return one result but can print many
 
-```
-functionName <- function(inputs) {
-
-return(value)
+## Adding print
+
+```{r comment=""}
+times_2_plus_4 <- function(x) {
+  output_int <- x * 2
+  output <- output_int + 4
+  print(paste("times2 result = ", output_int))
+  return(output)
 }
+
+result <- times_2_plus_4(x = 10)
+result
+
 ```
 
+
 ## Writing your own functions: multiple inputs
 
 Functions can take multiple inputs:
@@ -84,6 +109,22 @@ times_2_plus_y <- function(x, y) x * 2 + y
 times_2_plus_y(x = 10, y = 3)
 ```
 
+
+## Writing your own functions: multiple outputs
+
+A function returns a single result, but that result can hold multiple values.
+
+```{r comment=""}
+x_and_y_plus_2 <- function(x, y) {
+  output1 <- x + 2
+  output2 <- y + 2
+
+  return(c(output1, output2))
+}
+result <- x_and_y_plus_2(x = 10, y = 3)
+result
+```
+
 ## Writing your own functions: defaults
 
 Functions can have "default" arguments.
 This lets us use the function without using an argument later:
@@ -129,18 +170,16 @@ loud(word = "hooray!")
 We can use `filter(row_number()==n)` to extract a row of a tibble:
 
 ```{r message=FALSE}
-cars <- read_kaggle()
-
 get_row <- function(dat, row) dat %>% filter(row_number() == row)
-```
 
-```{r echo=FALSE}
-# So extra columns don't clutter the slide
-cars <- cars %>% select(1:10)
+cars <- read_kaggle()
+cars <- cars %>% select(1:8)
 ```
 
+
 ```{r}
 get_row(dat = cars, row = 10)
+get_row(dat = iris, row = 4)
 ```
 
 ```{r echo=FALSE}
@@ -156,7 +195,7 @@ cars <- read_kaggle()
 get_index <- function(dat, row, col) {
   dat %>%
     filter(row_number() == row) %>%
-    select(col)
+    select(all_of(col))
 }
 
 get_index(dat = cars, row = 10, col = 8)
@@ -170,67 +209,88 @@ Including default values for arguments:
 get_top <- function(dat, row = 1, col = 1) {
   dat %>%
     filter(row_number() == row) %>%
-    select(col)
+    select(all_of(col))
 }
 
 get_top(dat = cars)
 ```
 
-## Using your custom functions: `sapply()`
+# Functions on multiple columns
+
+## Using your custom functions: `sapply()` - a base R function
 
 Now that you've made a function... You can "apply" functions easily with `sapply()`!
 
 These functions take the form:
 
 ```
-sapply(, some_function)
+sapply(, some_function)
 ```
 
 ## Using your custom functions: `sapply()`
 
 `r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
 
+You can also pipe into your function.
+
 ```{r comment=""}
-sapply(cars, class)
+head(iris, n = 2)
+sapply(iris, class)
+iris %>% sapply(class)
 ```
 
 ## Using your custom functions: `sapply()`
 
 ```{r}
-sapply(pull(cars, VehOdo), times_2_plus_y)
+select(cars, VehYear:VehicleAge) %>% head()
+select(cars, VehYear:VehicleAge) %>% sapply(times_2) %>% head()
 ```
 
 ## Using your custom functions "on the fly" to iterate
 
 ```{r comment=""}
-sapply(pull(cars, VehOdo), function(x) x / 1000)
+select(cars, VehYear:VehicleAge) %>%
+  sapply(function(x) x / 1000) %>% head()
 ```
 
+# across
 ## Applying functions with `across` from `dplyr`
 
-`across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()`.
+`across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()` or `mutate()`.
 
 ```
-across( .cols = , .fns = function, ... )
+summarize(across( .cols = , .fns = function, ... ))
+```
+or
+```
+mutate(across(.cols = , .fns = function, ...))
 ```
-
 - List columns first : `.cols = `
 - List function next: `.fns = `
 - Then list any arguments for the function
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
-Combining with `summarize()`:
+Combining with `summarize()`
 
 ```{r warning=FALSE}
-cars_dbl <- cars %>% select(Make, Model, where(is.double))
+cars_dbl <- cars %>% select(Make, starts_with("Veh"))
 
+cars_dbl %>%
+  summarize(across(.cols = everything(), .fns = mean))
+```
+
+## Applying functions with `across` from `dplyr`
+
+Can use with other tidyverse functions like `group_by`!
+
+```{r}
 cars_dbl %>%
   group_by(Make) %>%
   summarize(across(.cols = everything(), .fns = mean))
 ```
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
 Combining with `summarize()`:
 
@@ -242,42 +302,33 @@ cars_dbl %>%
   summarize(across(.cols = everything(), .fns = mean, na.rm = TRUE))
 ```
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
 Using different `tidyselect()` options:
 
 ```{r warning=FALSE}
-cars_dbl %>%
+cars_dbl %>%
   group_by(Make) %>%
   summarize(across(.cols = starts_with("Veh"), .fns = mean))
 ```
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
+## Applying functions with `across` from `dplyr`
 
-Combining with `mutate()`:
+Combining with `mutate()`: rounding to the nearest thousand (using a negative `digits` value)
 
 ```{r}
 cars_dbl %>%
-  mutate(across(.cols = starts_with("Veh"), .fns = round, digits = -3))
+  mutate(across(
+    .cols = starts_with("Veh"),
+    .fns = round,
+    digits = -3))
 ```
 
-## Applying functions with `across` from `dplyr`.{.codesmall}
 
-Combining with `mutate()`:
-
-```{r}
-cars_dbl %>%
-  mutate(across(
-    .cols = everything(),
-    .fns = str_replace_all,
-    pattern = "A",
-    replacement = "a"
-  ))
-```
 
 ## Applying functions with `across` from `dplyr`.{.codesmall}
 
-Combining with `mutate()`:
+Combining with `mutate()` - the `replace_na` function
 
 ```{r warning=FALSE, message=FALSE}
 # Child mortality data
@@ -292,13 +343,67 @@ mort %>%
   ))
 ```
 
+## Use custom functions within `mutate` and `across`
+
+```{r}
+times1000 <- function(x) x * 1000
+
+airquality %>%
+  mutate(across(
+    .cols = everything(),
+    .fns = times1000
+  )) %>% head(n = 2)
+
+airquality %>%
+  mutate(across(
+    .cols = everything(),
+    .fns = function(x) x * 1000
+  )) %>% head(n = 2)
+```
+
+
+## `purrr` package
+
+Similar to `across()`, the `purrr` package lets you apply a function to multiple columns in a data frame or to multiple data objects in a list.
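As a rough sketch of the idea above (editorial illustration, not part of the slides), `purrr`'s typed `map_*()` functions can apply one function to every column of a data frame or to every element of a list. The object names `col_means`, `aq_list`, and `n_rows` are hypothetical, and the example assumes the built-in `airquality` data.

```r
library(purrr)

# Apply a function "on the fly" to every column of a data frame.
# map_dbl() returns one double per column.
col_means <- map_dbl(airquality, function(x) mean(x, na.rm = TRUE))

# Apply a function to every data frame in a list (hypothetical list).
# map_int() returns one integer per list element.
aq_list <- list(AQ1 = airquality, AQ2 = head(airquality, 10))
n_rows <- map_int(aq_list, nrow)

col_means
n_rows
```

The typed variants fail loudly if a result is not of the expected type, which can be easier to debug than `sapply()` silently changing its return shape.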
+
+## map_df
+
+```{r}
+library(purrr)
+airquality %>% map_df(replace_na, replace = 0)
+```
+
+# Multiple Data Frames
+
+## Multiple data frames
+
+Lists help us work with multiple data frames
+
+```{r}
+AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
+str(AQ_list)
+```
+
+
+## Multiple data frames: `sapply`
+
+```{r}
+AQ_list %>% sapply(summary)
+```
+
+
 ## Summary
 
 - Simple functions take the form:
-  - `NEW_FUNCTION <- function(x, y) x + y ..`
-  - Can specify defaults like `function(x = 1, y = 2)`
+  - `NEW_FUNCTION <- function(x, y){x + y}`
+  - Can specify defaults like `function(x = 1, y = 2){x + y}`
+  - `return` will provide a value as output
+  - `print` will simply print the value on the screen but not save it
 - Apply your functions with `sapply(, some_function)`
-- Use `across()` to apply functions across multiple columns of data
+- Use `across()` to apply functions across multiple columns of data
+- need to use `across` within `summarize()` or `mutate()`
+- `purrr` is a package that you can use to do more iterative work easily
+- can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously
 
 ## Website
 
diff --git a/modules/Functions/lab/Functions_Lab.Rmd b/modules/Functions/lab/Functions_Lab.Rmd
index 2d4d49a6a..f1c1d92f0 100644
--- a/modules/Functions/lab/Functions_Lab.Rmd
+++ b/modules/Functions/lab/Functions_Lab.Rmd
@@ -13,11 +13,20 @@ library(dplyr)
 library(ggplot2)
 ```
 
-1. Create a function that takes one argument, a vector, and returns the sum of the vector squared. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+1. Create a function that takes one argument, a vector, sums the vector, and returns the square of that sum. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
 
 ```
 # General format
-NEW_FUNCTION <- function(x, y) x + y ..
+NEW_FUNCTION <- function(x, y) x + y
+```
+or
+
+```
+# General format
+NEW_FUNCTION <- function(x, y) {
+  result <- x + y
+  return(result)
+}
 ```
 
 ```{r}
@@ -42,7 +51,7 @@ NEW_FUNCTION <- function(x, y) x + y ..
 
 ```
 
-5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the sum total number vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `matches()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole and so is not totally accurate!
+5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the total number of vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `contains()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole, so the result is not totally accurate! Remember that `NA` values can influence calculations.
 
 ```
 # General format
@@ -58,7 +67,7 @@ data %>%
 
 ```
 
-6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `matches()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".
+6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".
 
 ```{r}
 
diff --git a/modules/Functions/lab/Functions_Lab_Key.Rmd b/modules/Functions/lab/Functions_Lab_Key.Rmd
index 0d265984e..f8e060c45 100644
--- a/modules/Functions/lab/Functions_Lab_Key.Rmd
+++ b/modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -13,11 +13,20 @@ library(dplyr)
 library(ggplot2)
 ```
 
-1. Create a function that takes one argument, a vector, and returns the sum of the vector squared. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+1. Create a function that takes one argument, a vector, sums the vector, and returns the square of that sum. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
 
 ```
 # General format
-NEW_FUNCTION <- function(x, y) x + y ..
+NEW_FUNCTION <- function(x, y) x + y
+```
+or
+
+```
+# General format
+NEW_FUNCTION <- function(x, y) {
+  result <- x + y
+  return(result)
+}
 ```
 
 ```{r}
@@ -61,7 +70,7 @@ vacc <- read_csv("http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinat
 # vacc <- read_csv("USA_covid19_vaccinations.csv")
 ```
 
-5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the sum total number vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `matches()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole and so is not totally accurate!
+5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the total number of vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `contains()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole, so the result is not totally accurate! Remember that `NA` values can influence calculations.
 
 ```
 # General format
@@ -76,26 +85,26 @@ data %>%
 ```{r}
 vacc %>%
   summarize(across(
-    .cols = matches("Moderna") & starts_with("Total"),
+    .cols = contains("Moderna") & starts_with("Total"),
     .fns = sum
   ))
 vacc %>%
   summarize(across(
-    .cols = matches("Moderna") & starts_with("Total"),
+    .cols = contains("Moderna") & starts_with("Total"),
     .fns = sum,
     na.rm = TRUE
   ))
 ```
 
-6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `matches()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".
+6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".
 
 ```{r}
 vacc %>%
   mutate(across(
-    .cols = matches("Percent"),
+    .cols = contains("Percent"),
     .fns = function(x) x / 100
   )) %>%
-  select(matches("Percent"))
+  select(contains("Percent"))
 ```
 
 7. Use `across` and `mutate` to convert all columns starting with the word "Total" into a binary variable: TRUE if the value is greater than 10,000,000 and FALSE if less than or equal to 10,000,000. **Hint**: use `starts_with()` to select the columns starting with "Total". Use a "function on the fly" to do a logical test if the value is greater than 10,000,000.
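
One way question 7 could be approached, sketched on a small made-up tibble rather than the real `vacc` data. The `toy` and `toy_binary` objects and their column names are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for the vaccination table
toy <- tibble(
  State = c("A", "B", "C"),
  Total_doses = c(5e6, 1.2e7, 1e7),
  Total_people = c(2e7, 3e6, 1.5e7),
  Percent_vaccinated = c(40, 55, 61)
)

# Replace every column starting with "Total" by a logical test,
# using a function written on the fly
toy_binary <- toy %>%
  mutate(across(
    .cols = starts_with("Total"),
    .fns = function(x) x > 10000000
  ))

toy_binary
```

Columns that do not start with "Total" (here `State` and `Percent_vaccinated`) are left unchanged, and a value of exactly 10,000,000 becomes `FALSE` because the test is strictly greater than.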