Merge pull request #348 from jhudsl/stats23
Functions
carriewright11 authored Jan 20, 2023
2 parents 809f65d + 837a1b8 commit 31b7d56
Showing 3 changed files with 185 additions and 62 deletions.
205 changes: 155 additions & 50 deletions modules/Functions/Functions.Rmd
@@ -52,29 +52,54 @@ times_2 <- function(x) {
times_2(x = 10)
```


## Writing your own functions

The general syntax for a function is:

```
functionName <- function(inputs) {
<function body>
return(value)
}
```


## Writing your own functions: `return`

If we want something specific for the function's output, we use `return()`:

```{r comment=""}
times_2 <- function(x) {
output <- x * 2
times_2_plus_4 <- function(x) {
output_int <- x * 2
output <- output_int + 4
return(output)
}
times_2(x = 10)
times_2_plus_4(x = 10)
```
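
As a small aside that is not part of the slides: R also returns the value of the last evaluated expression when `return()` is left out, so the explicit call mainly makes the intended output obvious. The sketch below compares the two styles; `times_2_explicit` and `times_2_implicit` are hypothetical names used only for illustration.

```r
# Hypothetical names, for illustration only
times_2_explicit <- function(x) {
  output <- x * 2
  return(output)   # explicit: hand back `output`
}

times_2_implicit <- function(x) {
  x * 2            # implicit: the last evaluated expression is returned
}

times_2_explicit(x = 10)
times_2_implicit(x = 10)
```

Both calls give the same value; which style to use is largely a readability choice.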

## Writing your own functions
## Writing your own functions: print intermediate steps

The general syntax for a function is:
- printed results do not stay around but can show what a function is doing
- returned results stay around
- can only return one result but can print many

```
functionName <- function(inputs) {
<function body>
return(value)
## Adding print

```{r comment=""}
times_2_plus_4 <- function(x) {
output_int <- x * 2
output <- output_int + 4
print(paste("times2 result =", output_int))
return(output)
}
result <- times_2_plus_4(x = 10)
result
```
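
To make the difference between printed and returned results concrete, here is a minimal sketch. It uses base R's `capture.output()` purely as a demonstration device (an assumption of this example, not something the slides use) to show that the printed text and the returned value are separate things.

```r
times_2_plus_4 <- function(x) {
  output_int <- x * 2
  print(paste("times2 result =", output_int))  # shown on screen, not stored
  return(output_int + 4)                       # stored when assigned
}

# capture.output() collects the text that print() would have shown,
# while the assignment inside it still stores the returned value
printed <- capture.output(result <- times_2_plus_4(x = 10))

printed  # the text produced by print()
result   # the value produced by return()
```

Only `result` holds something we can compute with later; the printed line is just text.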


## Writing your own functions: multiple inputs

Functions can take multiple inputs:
@@ -84,6 +109,22 @@ times_2_plus_y <- function(x, y) x * 2 + y
times_2_plus_y(x = 10, y = 3)
```


## Writing your own functions: multiple outputs

A function can return only one object, but that object (for example, a vector) can hold multiple values.

```{r comment=""}
x_and_y_plus_2 <- function(x, y) {
output1 <- x + 2
output2 <- y + 2
return(c(output1,output2))
}
result <- x_and_y_plus_2(x = 10, y = 3)
result
```
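
Combining outputs with `c()` works well when they share a type. As one possible alternative (a sketch, not part of the slides), a named list keeps each output under its own name and allows mixed types. `x_and_y_plus_2_list` is a hypothetical variant of the function above.

```r
# Hypothetical variant that returns a named list instead of a vector
x_and_y_plus_2_list <- function(x, y) {
  list(x_plus_2 = x + 2, y_plus_2 = y + 2)
}

result <- x_and_y_plus_2_list(x = 10, y = 3)
result$x_plus_2  # pull out one output by name
result$y_plus_2
```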

## Writing your own functions: defaults

Functions can have "default" arguments. This lets us call the function later without supplying that argument:
@@ -129,18 +170,16 @@ loud(word = "hooray!")
We can use `filter(row_number()==n)` to extract a row of a tibble:

```{r message=FALSE}
cars <- read_kaggle()
get_row <- function(dat, row) dat %>% filter(row_number() == row)
```
```{r echo=FALSE}
# So extra columns don't clutter the slide
cars <- cars %>% select(1:10)
cars <- read_kaggle()
cars <- cars %>% select(1:8)
```


```{r}
get_row(dat = cars, row = 10)
get_row(dat = iris, row = 4)
```

```{r echo=FALSE}
@@ -156,7 +195,7 @@ cars <- read_kaggle()
get_index <- function(dat, row, col) {
dat %>%
filter(row_number() == row) %>%
select(col)
select(all_of(col))
}
get_index(dat = cars, row = 10, col = 8)
@@ -170,67 +209,88 @@ Including default values for arguments:
get_top <- function(dat, row = 1, col = 1) {
dat %>%
filter(row_number() == row) %>%
select(col)
select(all_of(col))
}
get_top(dat = cars)
```
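
Here is a small sketch of how the defaults behave, run on the built-in `iris` data instead of `cars` so that it is self-contained (that swap is an assumption of this example). Arguments we leave out fall back to their defaults, and any of them can be overridden by name.

```r
library(dplyr)

get_top <- function(dat, row = 1, col = 1) {
  dat %>%
    filter(row_number() == row) %>%
    select(all_of(col))
}

# Uses both defaults: first row, first column
top_default <- get_top(dat = iris)
top_default

# Overrides both defaults: third row, fifth column
top_custom <- get_top(dat = iris, row = 3, col = 5)
top_custom
```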

## Using your custom functions: `sapply()`
# Functions on multiple columns

## Using your custom functions: `sapply()`- a base R function

Now that you've made a function... You can "apply" functions easily with `sapply()`!

These functions take the form:

```
sapply(<a vector or list>, some_function)
sapply(<a vector, list, data frame>, some_function)
```

## Using your custom functions: `sapply()`

`r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`

You can also pipe into your function.

```{r comment=""}
sapply(cars, class)
head(iris, n = 2)
sapply(iris, class)
iris %>% sapply(class)
```

## Using your custom functions: `sapply()`

```{r}
sapply(pull(cars, VehOdo), times_2_plus_y)
select(cars, VehYear:VehicleAge) %>% head()
select(cars, VehYear:VehicleAge) %>% sapply(times_2) %>% head()
```

## Using your custom functions "on the fly" to iterate

```{r comment=""}
sapply(pull(cars, VehOdo), function(x) x / 1000)
select(cars, VehYear:VehicleAge) %>%
sapply(function(x) x / 1000) %>% head()
```
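
The same pattern works without the `cars` data. The sketch below uses the numeric columns of the built-in `iris` data (an assumption made so the example runs anywhere) and applies first a named function and then one written on the fly; `iris_num`, `col_means`, and `col_ranges` are illustrative names.

```r
# Keep only the four numeric columns of the built-in iris data
iris_num <- iris[, 1:4]

# Named function: no parentheses after the function name
col_means <- sapply(iris_num, mean)

# Function written on the fly: applied once per column
col_ranges <- sapply(iris_num, function(x) max(x) - min(x))

col_means
col_ranges
```

Because each function returns a single number per column, `sapply()` simplifies the result to a named vector.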
# across

## Applying functions with `across` from `dplyr`

`across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()`.
`across()` makes it easy to apply the same transformation to multiple columns. It is usually used inside `summarize()` or `mutate()`.

```
across( .cols = <columns>, .fns = function, ... )
summarize(across( .cols = <columns>, .fns = function, ... ))
```
or
```
mutate(across(.cols = <columns>, .fns = function, ...))
```

- List columns first : `.cols = `
- List function next: `.fns = `
- Then list any arguments for the function

## Applying functions with `across` from `dplyr`.{.codesmall}
## Applying functions with `across` from `dplyr`

Combining with `summarize()`:
Combining with `summarize()`

```{r warning=FALSE}
cars_dbl <- cars %>% select(Make, Model, where(is.double))
cars_dbl <- cars %>% select(Make, starts_with("Veh"))
cars_dbl %>%
summarize(across(.cols = everything(), .fns = mean))
```

## Applying functions with `across` from `dplyr`

Can use with other tidyverse functions like `group_by`!

```{r}
cars_dbl %>%
group_by(Make) %>%
summarize(across(.cols = everything(), .fns = mean))
```
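
`mean()` returns `NA` with a warning when it is given a character column. One way to sidestep that, sketched here on the built-in `iris` data (an assumption so the example is self-contained), is to select only numeric columns with `where(is.numeric)`.

```r
library(dplyr)

# where(is.numeric) limits across() to numeric columns,
# so mean() is never asked to average text
iris_means <- iris %>%
  group_by(Species) %>%
  summarize(across(.cols = where(is.numeric), .fns = mean))

iris_means
```

The result has one row per group and one column per summarized variable, plus the grouping column.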

## Applying functions with `across` from `dplyr`.{.codesmall}
## Applying functions with `across` from `dplyr`

Combining with `summarize()`:

@@ -242,42 +302,33 @@ cars_dbl %>%
summarize(across(.cols = everything(), .fns = mean, na.rm = TRUE))
```

## Applying functions with `across` from `dplyr`.{.codesmall}
## Applying functions with `across` from `dplyr`

Using different `tidyselect` options:

```{r warning=FALSE}
cars_dbl %>%
cars_dbl %>%
group_by(Make) %>%
summarize(across(.cols = starts_with("Veh"), .fns = mean))
```

## Applying functions with `across` from `dplyr`.{.codesmall}
## Applying functions with `across` from `dplyr`

Combining with `mutate()`:
Combining with `mutate()`: rounding to the nearest multiple of a power of 10 (a negative `digits` value, here the nearest thousand)

```{r}
cars_dbl %>%
mutate(across(.cols = starts_with("Veh"), .fns = round, digits = -3))
mutate(across(
.cols = starts_with("Veh"),
.fns = round,
digits = -3))
```
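
Extra arguments such as `digits = -3` or `na.rm = TRUE` can also be supplied through a function written on the fly. To our knowledge, dplyr 1.1.0 and later deprecate passing them through the `...` of `across()`, so this form may be the safer habit. The sketch uses a made-up `toy_cars` data frame; its column names only mimic the `Veh` prefix and are not the real data.

```r
library(dplyr)

# Made-up toy data, for illustration only
toy_cars <- data.frame(
  Make = c("A", "B"),
  VehOdo = c(89046, 93593),
  VehCost = c(7100, 7600)
)

rounded <- toy_cars %>%
  mutate(across(
    .cols = starts_with("Veh"),
    .fns = function(x) round(x, digits = -3)  # round to the nearest thousand
  ))

rounded
```

The `Make` column is untouched because it does not match `starts_with("Veh")`.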

## Applying functions with `across` from `dplyr`.{.codesmall}

Combining with `mutate()`:

```{r}
cars_dbl %>%
mutate(across(
.cols = everything(),
.fns = str_replace_all,
pattern = "A",
replacement = "a"
))
```

## Applying functions with `across` from `dplyr`.{.codesmall}

Combining with `mutate()`:
Combining with `mutate()` - the `replace_na` function

```{r warning=FALSE, message=FALSE}
# Child mortality data
@@ -292,13 +343,67 @@ mort %>%
))
```

## Use custom functions within `mutate` and `across`

```{r}
times1000 <- function(x) x * 1000
airquality %>%
mutate(across(
.cols = everything(),
.fns = times1000
)) %>% head(n = 2)
airquality %>%
mutate(across(
.cols = everything(),
.fns = function(x) x *1000
)) %>% head(n = 2)
```


## `purrr` package

Similar to across, `purrr` is a package that allows you to apply a function to multiple columns in a data frame or multiple data objects in a list.

## map_df

```{r}
library(purrr)
airquality %>% map_df(replace_na, replace = 0)
```
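
`purrr` has several `map` variants that differ in what they return. As a brief sketch (not part of the slides), `map_dbl()` applies a function to each column and returns a plain numeric vector, which suits one-number-per-column summaries; `col_means` is an illustrative name.

```r
library(purrr)

# One mean per column of the built-in airquality data, ignoring NA values
col_means <- map_dbl(airquality, function(x) mean(x, na.rm = TRUE))

col_means
```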

# Multiple Data Frames

## Multiple data frames

Lists help us work with multiple data frames

```{r}
AQ_list <- list( AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
str(AQ_list)
```


## Multiple data frames: `sapply`

```{r}
AQ_list %>% sapply(summary)
```
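
When the applied function returns a single value, the result is easier to read. The sketch below applies first a named function and then one written on the fly to each data frame in the list; `mean_temp` is an illustrative name.

```r
AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)

# One number per data frame: the row count
sapply(AQ_list, nrow)

# A function written on the fly, applied to each data frame in turn
mean_temp <- sapply(AQ_list, function(df) mean(df$Temp))

mean_temp
```

The names of the list elements carry over to the result, which makes it clear which value belongs to which data frame.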


## Summary

- Simple functions take the form:
- `NEW_FUNCTION <- function(x, y) x + y ..`
- Can specify defaults like `function(x = 1, y = 2)`
- `NEW_FUNCTION <- function(x, y){x + y}`
- Can specify defaults like `function(x = 1, y = 2){x + y}`
- `return` will provide a value as output
- `print` will simply print the value on the screen but not save it
- Apply your functions with `sapply(<a vector or list>, some_function)`
- Use `across()` to apply functions across multiple columns of data
- Use `across()` to apply functions across multiple columns of data
- need to use `across` within `summarize()` or `mutate()`
- `purrr` is a package that you can use to do more iterative work easily
- can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously

## Website

17 changes: 13 additions & 4 deletions modules/Functions/lab/Functions_Lab.Rmd
@@ -13,11 +13,20 @@ library(dplyr)
library(ggplot2)
```

1. Create a function that takes one argument, a vector, and returns the sum of the vector squared. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
1. Create a function that takes one argument, a vector, sums its elements, and returns the square of that sum. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.

```
# General format
NEW_FUNCTION <- function(x, y) x + y ..
NEW_FUNCTION <- function(x, y) x + y
```
or

```
# General format
NEW_FUNCTION <- function(x, y){
result <- x + y
return(result)
}
```

```{r}
@@ -42,7 +51,7 @@ NEW_FUNCTION <- function(x, y) x + y ..
```

5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the sum total number vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `matches()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole and so is not totally accurate!
5. We want to get some summary statistics on the Moderna vaccines. Use `across` inside `summarize` to get the total number of vaccine doses for any variable containing the word "Moderna" or starting with "Total". **Hint**: use `contains()` AND `starts_with()` to select the right columns inside `across`. Keep in mind that this includes the United States as a whole and so it is not totally accurate! Remember that `NA` values can influence calculations.

```
# General format
@@ -58,7 +67,7 @@ data %>%
```

6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `matches()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".
6. Use `across` and `mutate` to convert all columns containing the word "Percent" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Percent".

```{r}