Commit 1442a28

Merge pull request #234 from fhdsl/functions
Last minute changes to Functions
2 parents 000d817 + 1c5cda3 commit 1442a28

2 files changed: 125 additions & 96 deletions

modules/Functions/Functions.Rmd

Lines changed: 52 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -11,7 +11,6 @@ library(dplyr)
1111
library(knitr)
1212
library(stringr)
1313
library(tidyr)
14-
library(emo)
1514
library(readr)
1615
opts_chunk$set(comment = "")
1716
```
@@ -192,7 +191,7 @@ loud(word = "hooray!")
192191
<!-- ``` -->
193192

194193

195-
## Functions for tibbles - curly braces{.codesmall}
194+
## Functions for tibbles - curly braces
196195

197196
```{r}
198197
# get means and missing for a specific column
@@ -203,23 +202,32 @@ get_summary <- function(dataset, col_name) {
203202
}
204203
```
205204

206-
Examples:
205+
## Functions for tibbles - example{.codesmall}
207206

208-
```{r}
207+
```{r message = FALSE}
209208
er <- read_csv(file = "https://daseh.org/data/CO_ER_heat_visits.csv")
209+
```
210+
211+
```{r}
210212
get_summary(er, visits)
213+
```
214+
215+
```{r message = FALSE}
216+
yearly_co2 <-
217+
read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
218+
```
211219

212-
yearly_co2 <- read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
220+
```{r}
213221
get_summary(yearly_co2, `2014`)
214222
```
215223

216224
## Summary
217225

218226
- Simple functions take the form:
219227
- `NEW_FUNCTION <- function(x, y){x + y}`
220-
- Can specify defaults like `function(x = 1, y = 2){x + y}`
221-
-`return` will provide a value as output
222-
- `print` will simply print the value on the screen but not save it
228+
- Can specify defaults like `function(x = 1, y = 2){x + y}`
229+
- `return` will provide a value as output
230+
- Specify a column (from a tibble) inside a function using `{{double curly braces}}`
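
The `{{double curly braces}}` bullet in the Summary can be illustrated with a minimal sketch. The tibble `toy` and the helper name `col_summary` are hypothetical stand-ins (the body of the course's `get_summary` is not shown in this diff), so this is only one plausible shape for such a function, not the course's implementation.

```r
library(dplyr)

# Hypothetical stand-in data; not the course's ER visits dataset
toy <- tibble(visits = c(10, NA, 30), year = c(2011, 2012, 2013))

# Hypothetical helper: mean and number of missing values for one column.
# {{ col_name }} lets the caller pass a bare column name.
col_summary <- function(dataset, col_name) {
  dataset %>%
    summarize(
      mean = mean({{ col_name }}, na.rm = TRUE),
      na_count = sum(is.na({{ col_name }}))
    )
}

result <- col_summary(toy, visits)
result
```

Without the braces, `mean(col_name)` would look for an object literally called `col_name` rather than the column the caller named.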
223231

224232

225233
## Lab Part 1
@@ -245,7 +253,7 @@ sapply(<a vector, list, data frame>, some_function)
245253

246254
Let's apply a function to look at the CO heat-related ER visits dataset.
247255

248-
`r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
256+
🚨There are no parentheses on the functions!🚨
249257

250258
You can also pipe into your function.
251259

@@ -357,7 +365,6 @@ er %>%
357365
))
358366
```
359367

360-
361368
## Applying functions with `across` from `dplyr`
362369

363370
Using different `tidyselect()` options (e.g., `starts_with()`, `ends_with()`, `contains()`)
@@ -368,20 +375,6 @@ er %>%
368375
summarize(across(contains("cl"), mean, na.rm=T))
369376
```
370377

371-
372-
<!-- ## Applying functions with `across` from `dplyr`{.codesmall} -->
373-
374-
<!-- `mutate()` across to round across many columns at once! -->
375-
376-
<!-- ```{r} -->
377-
<!-- calenviroscreen %>% -->
378-
<!-- mutate(across( -->
379-
<!-- where(is.numeric), -->
380-
<!-- function(x) round(x, digits = 0) -->
381-
<!-- )) %>% select(7:13) -->
382-
<!-- ``` -->
383-
384-
385378
## Applying functions with `across` from `dplyr` {.smaller}
386379

387380
Combining with `mutate()` - the `replace_na` function
@@ -401,29 +394,15 @@ yearly_co2 %>%
401394
))
402395
```
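
As a small illustration of `across()` inside `summarize()` and `mutate()`, the sketch below uses a made-up tibble `toy` (an assumption, not one of the course datasets). It passes `na.rm` through an anonymous function; as far as I know, dplyr 1.1.0 and later deprecate passing extra arguments through `across(...)` directly, as in `across(cols, mean, na.rm = T)`, so the anonymous-function form is the safer habit.

```r
library(dplyr)

# Hypothetical stand-in data with missing values
toy <- tibble(
  water_a = c(1, NA, 3),
  water_b = c(10, 20, NA),
  other = c(5, 5, 5)
)

# summarize(): one mean per selected column, ignoring NA
water_means <- toy %>%
  summarize(across(contains("water"), function(x) mean(x, na.rm = TRUE)))

# mutate(): rewrite the selected columns in place, leaving `other` untouched
toy_props <- toy %>%
  mutate(across(contains("water"), function(x) x / 100))
```

The same column selection works in both verbs; only the shape of the result differs (one row from `summarize()`, the original number of rows from `mutate()`).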
403396

397+
## GUT CHECK!
404398

405-
<!-- ## Use custom functions within `mutate` and `across` -->
399+
Why use `across()`?
406400

407-
<!-- If your function needs to span more than one line, better to define it first before using inside `mutate()` and `across()`. -->
401+
A. Efficiency - faster and less repetitive
408402

409-
<!-- ```{r} -->
410-
<!-- times1000 <- function(x) x * 1000 -->
411-
412-
<!-- airquality %>% -->
413-
<!-- mutate(across( -->
414-
<!-- everything(), -->
415-
<!-- .fns = times1000 -->
416-
<!-- )) %>% -->
417-
<!-- head(n = 2) -->
418-
419-
<!-- airquality %>% -->
420-
<!-- mutate(across( -->
421-
<!-- everything(), -->
422-
<!-- .fns = function(x) x * 1000 -->
423-
<!-- )) %>% -->
424-
<!-- head(n = 2) -->
425-
<!-- ``` -->
403+
B. Calculate the cross product
426404

405+
C. Connect across datasets
427406

428407
## `purrr` package
429408

@@ -433,22 +412,29 @@ While we won't get into `purrr` too much in this class, its a handy package for
433412

434413
# Multiple Data Frames
435414

436-
## Multiple data frames {.smaller}
415+
## Multiple data frames
437416

438-
Lists help us work with multiple data frames
417+
Lists help us work with multiple tibbles / data frames
439418

440419
```{r}
441-
AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
442-
str(AQ_list)
420+
df_list <- list(AQ = airquality, er = er, yearly_co2 = yearly_co2)
443421
```
444422

423+
<br>
424+
425+
`select()` from each tibble the numeric columns:
426+
427+
```{r}
428+
df_list <-
429+
df_list %>%
430+
sapply(function(x) select(x, where(is.numeric)))
431+
```
445432

446-
## Multiple data frames: `sapply`
433+
## Multiple data frames: `sapply` {.smaller}
447434

448435
```{r}
449-
AQ_list %>% sapply(class)
450-
AQ_list %>% sapply(nrow)
451-
AQ_list %>% sapply(colMeans, na.rm = TRUE)
436+
df_list %>% sapply(nrow)
437+
df_list %>% sapply(colMeans, na.rm = TRUE)
452438
```
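
A self-contained sketch of the same list workflow, using built-in datasets (`airquality`, `mtcars`, `iris`) as stand-ins for the course tibbles so it runs without downloads. The object names are assumptions for illustration. It also shows a detail worth keeping in mind: `sapply()` simplifies its result only when it can, while `lapply()` always returns a list.

```r
library(dplyr)

# Built-in datasets used as stand-ins for the course tibbles
df_list <- list(AQ = airquality, cars = mtcars, iris = iris)

# Keep only the numeric columns; lapply() always returns a list
numeric_list <- lapply(df_list, function(x) select(x, where(is.numeric)))

# One value per data frame, so sapply() simplifies to a named vector
row_counts <- sapply(numeric_list, nrow)

# The data frames have different numbers of columns, so the results
# cannot be simplified and sapply() returns a list of named vectors
col_means <- sapply(numeric_list, colMeans, na.rm = TRUE)
```

When every element yields a result of the same length, `sapply()` would return a vector or matrix instead, which is why `lapply()` is the more predictable choice for the `select()` step.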
453439

454440

@@ -457,7 +443,7 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
457443
- Apply your functions with `sapply(<a vector or list>, some_function)`
458444
- Use `across()` to apply functions across multiple columns of data
459445
- Need to use `across` within `summarize()` or `mutate()`
460-
- Can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously
446+
- Can use `sapply` (or `purrr` package) to work with multiple data frames within lists simultaneously
461447

462448

463449
## Lab Part 2
@@ -466,7 +452,20 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
466452

467453
💻 [Lab](https://daseh.org/modules/Functions/lab/Functions_Lab.Rmd)
468454

469-
```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
455+
📃 [Day 9 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-9.pdf)
456+
457+
📃 [Posit's purrr Cheatsheet](https://rstudio.github.io/cheatsheets/purrr.pdf)
458+
459+
## Research Survey
460+
461+
<br>
462+
463+
https://forms.gle/jVue79CjgoMmbVbg9
464+
465+
<br>
466+
<br>
467+
468+
```{r, fig.alt="The End", out.width = "30%", echo = FALSE, fig.align='center'}
470469
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
471470
```
472471

modules/Functions/lab/Functions_Lab_Key.Rmd

Lines changed: 73 additions & 43 deletions
Original file line number | Diff line number | Diff line change
@@ -11,29 +11,21 @@ knitr::opts_chunk$set(echo = TRUE)
1111

1212
# Part 1
1313

14-
Load all the libraries we will use in this lab.
14+
Load the `tidyverse` package.
1515

1616
```{r message=FALSE}
1717
library(tidyverse)
1818
```
1919

2020
### 1.1
2121

22-
Create a function that takes one argument, a vector, and returns the sum of the vector and then squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
22+
Create a function that:
2323

24-
```
25-
# General format
26-
NEW_FUNCTION <- function(x, y) x + y
27-
```
28-
or
29-
30-
```
31-
# General format
32-
NEW_FUNCTION <- function(x, y){
33-
result <- x + y
34-
return(result)
35-
}
36-
```
24+
* Takes one argument, a vector.
25+
* Returns the sum of the vector and then squares the result.
26+
* Call it "sum_squared".
27+
* Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
28+
* Format is `NEW_FUNCTION <- function(x, y) x + y`
3729

3830
```{r 1.1response}
3931
nums <- c(2, 7, 21, 30, 90)
@@ -50,7 +42,12 @@ sum_squared(x = nums)
5042

5143
### 1.2
5244

53-
Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
45+
Create a function that:
46+
47+
* takes two arguments, (1) a vector and (2) a numeric value.
48+
* This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`.
49+
* Call it `has_n`.
50+
* Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
5451

5552
```{r 1.2response}
5653
nums <- c(2, 7, 21, 30, 90)
@@ -74,11 +71,24 @@ has_n(x = nums)
7471

7572
### P.1
7673

77-
Create a new number `b_num` that is not contained with `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
74+
Create a function for the CalEnviroScreen Data.
75+
76+
* Read in (https://daseh.org/data/CalEnviroScreen_data.csv)
77+
* The function takes an argument for a column name. (use `{{col_name}}`)
78+
* The function creates a ggplot with `{{col_name}}` on the x-axis and `Poverty` on the y-axis.
79+
* Use `geom_point()`
80+
* Test the function using the `Lead` and `HousingBurden` columns, or other columns of your choice.
7881

7982
```{r P.1response}
80-
b_num <- 11
81-
has_n(x = nums, n = b_num)
83+
ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
84+
85+
plot_ces <- function(col_name){
86+
ggplot(data = ces, aes(x = {{col_name}}, y = Poverty)) +
87+
geom_point()
88+
}
89+
90+
plot_ces(Lead)
91+
plot_ces(HousingBurden)
8292
```
8393

8494

@@ -96,7 +106,12 @@ ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
96106

97107
### 2.2
98108

99-
We want to get some summary statistics on water contamination. Use `across` inside `summarize` to get the sum total variable containing the string "water" AND ending with "Pctl". **Hint**: use `contains()` AND `ends_with()` to select the right columns inside `across`. Remember that `NA` values can influence calculations.
109+
We want to get some summary statistics on water contamination.
110+
111+
* Use `across` inside `summarize`.
112+
* Choose columns about "water". **Hint**: use `contains("water")` inside `across`.
113+
* Use `mean` as the function inside of `across`.
114+
* Remember that `NA` values can influence calculations.
100115

101116
```
102117
# General format
@@ -110,19 +125,26 @@ data %>%
110125
```{r 2.2response}
111126
ces %>%
112127
summarize(across(
113-
contains("Water") & ends_with("Pctl"),
114-
sum
128+
contains("water"),
129+
mean
115130
))
131+
132+
# Accounting for NA
116133
ces %>%
117134
summarize(across(
118-
contains("Water") & ends_with("Pctl"),
119-
function(x) sum(x, na.rm = T)
135+
contains("water"),
136+
function(x) mean(x, na.rm = T)
120137
))
121138
```
122139

123140
### 2.3
124141

125-
Use `across` and `mutate` to convert all columns containing the word "water" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".
142+
Convert all columns that are percentiles into proportions.
143+
144+
* Use `across` and `mutate`
145+
* Choose columns that contain "Pctl" in the name. **Hint**: use `contains("Pctl")` inside `across`.
146+
* Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`).
147+
* Check your work - it will be easier if you `select(contains("Pctl"))`.
126148

127149
```
128150
# General format
@@ -136,7 +158,7 @@ data %>%
136158
```{r 2.3response}
137159
ces %>%
138160
mutate(across(
139-
contains("water"),
161+
contains("Pctl"),
140162
function(x) x / 100
141163
)) %>%
142164
select(contains("Pctl"))
@@ -149,42 +171,50 @@ ces %>%
149171

150172
Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10.
151173

152-
- **Hint**: use `starts_with()` to select the columns that start with "PM".
153-
- Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
154-
- A logical test with `mutate` will automatically fill a column with TRUE/FALSE.
174+
* **Hint**: use `starts_with()` to select the columns that start with "PM".
175+
* Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
176+
* A logical test with `mutate` (x > 10) will automatically fill a column with TRUE/FALSE.
155177

156178
```{r P.2response}
157179
ces %>%
158180
mutate(across(
159181
starts_with("PM"),
160182
function(x) x > 10
161-
))
183+
)) %>%
184+
glimpse() # add glimpse to view the changes
162185
```
163186

164187
### P.3
165188

166189
Take your code from the previous question and assign it to the variable `ces_dat`.
167190

168-
- Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign this to `ces_dat`.
169-
- Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `Asthma` and (2) the y-axis is `PM2.5`.
170-
- You change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
191+
- Create a ggplot where the x-axis is `Asthma` and the y-axis is `PM2.5`.
192+
- Add a boxplot (`geom_boxplot()`)
193+
- Change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
171194

172195
```{r P.3response}
173196
ces_dat <-
174197
ces %>%
175198
mutate(across(
176199
starts_with("PM"),
177200
function(x) x > 10
178-
)) %>%
179-
filter(ApproxLocation != "Oakland")
180-
181-
ces_boxplot <- function(df) {
182-
ggplot(df) +
183-
geom_boxplot(aes(
184-
x = `Asthma`,
185-
y = `PM2.5`
186-
)) +
201+
))
202+
203+
ggplot(data = ces_dat, aes(x = `Asthma`, y = `PM2.5`)) +
204+
geom_boxplot() +
205+
labs(x = "ER Visits for Asthma: PM2.5 greater than 10")
206+
207+
# Make everything a function if you like!
208+
ces_boxplot <- function() {
209+
ces %>%
210+
mutate(across(
211+
starts_with("PM"),
212+
function(x) x > 10
213+
)) %>%
214+
ggplot(aes(x = `Asthma`, y = `PM2.5`)) +
215+
geom_boxplot() +
187216
labs(x = "ER Visits for Asthma: PM2.5 greater than 10")
188217
}
189-
ces_boxplot(ces_dat)
218+
219+
ces_boxplot()
190220
```
