select(n) will choose column n:
-get_index <- function(dat, row, col) {
-  dat %>%
-    filter(row_number() == row) %>%
-    select(all_of(col))
-}
-
-get_index(dat = ces, row = 10, col = 7)
-
-# A tibble: 1 × 1
-  CES4.0Score
-        <dbl>
-1        43.7
diff --git a/modules/Functions/Functions.html b/modules/Functions/Functions.html
index ce2ac076..31ea37b6 100644
--- a/modules/Functions/Functions.html
+++ b/modules/Functions/Functions.html
@@ -3117,7 +3117,7 @@ div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
 ul.task-list{list-style: none;}
 pre > code.sourceCode { white-space: pre; position: relative; }
-pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
+pre > code.sourceCode > span { line-height: 1.25; }
 pre > code.sourceCode > span:empty { height: 1.2em; }
 .sourceCode { overflow: visible; }
 code.sourceCode > span { color: inherit; text-decoration: inherit; }
@@ -3473,45 +3473,10 @@
select(n) will choose column n:
-get_index <- function(dat, row, col) {
-  dat %>%
-    filter(row_number() == row) %>%
-    select(all_of(col))
-}
-
-get_index(dat = ces, row = 10, col = 7)
-
-# A tibble: 1 × 1
-  CES4.0Score
-        <dbl>
-1        43.7
Including default values for arguments:
-get_top <- function(dat, row = 1, col = 1) {
-  dat %>%
-    filter(row_number() == row) %>%
-    select(all_of(col))
-}
-
-get_top(dat = ces)
-
-# A tibble: 1 × 1
-  CensusTract
-        <dbl>
-1  6001400100
You can create a function with an argument that accepts a column name for select() or another dplyr operation:
 clean_dataset <- function(dataset, col_name) {
   my_data_out <- dataset %>% select({{col_name}}) # Note the curly braces
-  write_csv(my_data_out, "clean_data.csv")
   return(my_data_out)
 }
@@ -3532,6 +3497,27 @@
 10 43.7
 # ℹ 8,025 more rows
+get_mean <- function(dat, county_name, col_name) {
+  my_data_out <- dat %>%
+    filter(str_detect(CaliforniaCounty, county_name)) %>%
+    summarise(mean = mean({{col_name}}, na.rm = TRUE))
+  return(my_data_out)
+}
+
+get_mean(dat = ces, county_name = "Alameda", col_name = CES4.0Score)
+
+# A tibble: 1 × 1
+   mean
+  <dbl>
+1  22.9
+
+get_mean(dat = ces, county_name = "Fresno", col_name = CES4.0Score)
+
+# A tibble: 1 × 1
+   mean
+  <dbl>
+1  40.9
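The curly-brace pattern used in these functions can be tried without the `ces` data. The following is a minimal sketch, assuming a made-up data frame `toy` and a hypothetical helper `county_mean`; neither is part of the course materials, and the values are invented for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for `ces`; the values are made up
toy <- data.frame(
  CaliforniaCounty = c("Alameda", "Alameda", "Fresno", "Fresno"),
  score = c(10, 20, 30, NA)
)

# Hypothetical helper: {{ }} lets the caller pass a bare column name
county_mean <- function(dat, county_name, col_name) {
  dat %>%
    filter(str_detect(CaliforniaCounty, county_name)) %>%
    summarise(mean = mean({{ col_name }}, na.rm = TRUE))
}

alameda <- county_mean(toy, "Alameda", score) # averages 10 and 20
fresno <- county_mean(toy, "Fresno", score)   # the NA is dropped by na.rm = TRUE
```

Without the double curly braces, `score` would be looked up as an ordinary R object inside the function rather than as a column of `dat`, and the call would likely fail.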
Combining with mutate() - the replace_na function
Here we will use the yearly_co2_emissions data from dasehr
replace_na({data frame}, {list of values}) or replace_na({vector}, {single value})
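Both calling forms can be sketched on a small example. This assumes a made-up data frame `toy` with an `emissions` column (hypothetical, not the `yearly_co2_emissions` data) and uses `replace_na()` from tidyr.

```r
library(dplyr)
library(tidyr)

# Vector form: replace_na({vector}, {single value})
x <- c(1, NA, 3)
filled_vec <- replace_na(x, 0)

# Hypothetical toy data; the values are made up
toy <- data.frame(year = c(2000, 2001), emissions = c(NA, 5))

# Data frame form: replace_na({data frame}, {list of values})
filled_df <- replace_na(toy, list(emissions = 0))

# Vector form used inside mutate() on a single column
filled_mut <- toy %>%
  mutate(emissions = replace_na(emissions, 0))
```

The data frame form and the `mutate()` form should give the same `emissions` column here; the `mutate()` form is the one that combines naturally with `across()` when several columns need the same replacement.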
yearly_co2_emissions %>%
diff --git a/modules/Functions/lab/Functions_Lab.Rmd b/modules/Functions/lab/Functions_Lab.Rmd
index 551b5d9f..661f8054 100644
--- a/modules/Functions/lab/Functions_Lab.Rmd
+++ b/modules/Functions/lab/Functions_Lab.Rmd
@@ -116,18 +116,18 @@ data %>%
 # Practice on Your Own!
-### P.1
+### P.2
 Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10.
 **Hint**: use `starts_with()` to select the columns that start with "PM". Use a "function on the fly" to do a logical test if the value is greater than 10.
-```{r P.1response}
+```{r P.2response}
 ```
-### P.2
+### P.3
 Take your code from question 2.4 and assign it to the variable `ces_dat`.
@@ -135,6 +135,6 @@ Take your code from question 2.4 and assign it to the variable `ces_dat`.
 - Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `PM2.5` and (2) the y-axis is `Asthma`.
 - You change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
-```{r P.2response}
+```{r P.3response}
 ```
diff --git a/modules/Functions/lab/Functions_Lab_Key.Rmd b/modules/Functions/lab/Functions_Lab_Key.Rmd
index c97c3d87..8591cbf1 100644
--- a/modules/Functions/lab/Functions_Lab_Key.Rmd
+++ b/modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -149,13 +149,13 @@ ces %>%
 # Practice on Your Own!
-### P.1
+### P.2
 Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10.
 **Hint**: use `starts_with()` to select the columns that start with "PM". Use a "function on the fly" to do a logical test if the value is greater than 10.
-```{r P.1response}
+```{r P.2response}
 ces %>%
   mutate(across(
     .cols = starts_with("PM"),
@@ -164,7 +164,7 @@ ces %>%
 ```
-### P.2
+### P.3
 Take your code from question 2.4 and assign it to the variable `ces_dat`.
@@ -172,7 +172,7 @@ Take your code from question 2.4 and assign it to the variable `ces_dat`.
 - Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `PM2.5` and (2) the y-axis is `Asthma`.
 - You change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
-```{r P.2response}
+```{r P.3response}
 ces_dat <- ces %>%
   mutate(across(
diff --git a/modules/Functions/lab/Functions_Lab_Key.html b/modules/Functions/lab/Functions_Lab_Key.html
index bc375fef..8b09e46f 100644
--- a/modules/Functions/lab/Functions_Lab_Key.html
+++ b/modules/Functions/lab/Functions_Lab_Key.html
@@ -362,9 +362,9 @@
Part 1
-1.1
 Create a function that takes one argument, a vector, and returns the
-sum of the vector and squares the result. Call it “sum_squared”. Test
-your function on the vector c(2,7,21,30,90) - you should
-get the answer 22500.
+sum of the vector and then squares the result. Call it “sum_squared”.
+Test your function on the vector c(2,7,21,30,90) - you
+should get the answer 22500.
 # General format
 NEW_FUNCTION <- function(x, y) x + y
or
@@ -411,8 +411,8 @@
1.3
 has_n(x = nums)
 ## [1] TRUE
-1.4
++P.1
Create a new number b_num that is not contained in nums. Use your updated has_n function with the default value and add b_num as the n argument
@@ -424,7 +424,7 @@
1.4
Part 2
-+-2.1
Read in the CalEnviroScreen data from https://daseh.org/data/CalEnviroScreen_data.csv. Assign the data the name “ces”.
@@ -432,7 +432,7 @@2.1
# If downloaded # ces <- read_csv("CalEnviroScreen_data.csv")+-2.2
We want to get some summary statistics on water contamination. Use across inside summarize to get the sum total
@@ -479,14 +479,21 @@
2.2
## <dbl> <dbl> <dbl> ## 1 403640. 304029. 256802.+-2.3
 Use across and mutate to convert all columns containing the word “Pctl” into proportions (i.e., divide that value by 100). Hint: use contains() to select the right columns within across(). Use a “function
-on the fly” to divide by 100. It will also be easier to check your work
-if you select() columns that match “Pctl”.
+on the fly” to divide by 100 (function(x) x / 100). It will
+also be easier to check your work if you select() columns
+that match “Pctl”.
+# General format
+data %>%
+  mutate(across(
+    .cols = {vector or tidyselect},
+    .fns = {some function}
+  ))
 ces %>%
   mutate(across(
     .cols = contains("Pctl"),
@@ -514,8 +521,11 @@
2.3
## # CardiovascularDiseasePctl <dbl>, PopCharPctl <dbl>, EducationPctl <dbl>, ## # LinguisticIsolPctl <dbl>, PovertyPctl <dbl>, UnemploymentPctl <dbl>, …-+2.4
++-Practice on Your Own!
++-P.2
Use across and mutate to convert all columns starting with the string “PM” into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10.
@@ -548,11 +558,8 @@
2.4
## # PesticidesPctl <dbl>, ToxRelease <dbl>, ToxReleasePctl <dbl>, ## # Traffic <dbl>, TrafficPctl <dbl>, CleanupSites <dbl>, …-Practice on Your Own!
--P.1
++P.3
Take your code from question 2.4 and assign it to the variable ces_dat.
@@ -584,7 +591,7 @@
P.1
ces_boxplot(ces_dat)- +## Warning: Removed 11 rows containing non-finite outside the scale range ## (`stat_boxplot()`).