Merge pull request #111 from fhdsl/factors

updating functions
fhdsl · Jul 17, 2024 · 1d0fe2a
2 parents 3e75968 + e261482
commit 1d0fe2a

Showing 6 changed files with 108 additions and 104 deletions.
diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd
@@ -199,49 +199,30 @@ get_row(dat = ces, row = 4)
 ```
 
 
-## Functions for tibbles
-
-`select(n)` will choose column `n`:
-
-```{r message=FALSE}
-get_index <- function(dat, row, col) {
-  dat %>%
-    filter(row_number() == row) %>%
-    select(all_of(col))
-}
-
-get_index(dat = ces, row = 10, col = 7)
-```
-
-
-## Functions for tibbles
-
-Including default values for arguments:
-
-```{r message=FALSE}
-get_top <- function(dat, row = 1, col = 1) {
-  dat %>%
-    filter(row_number() == row) %>%
-    select(all_of(col))
-}
-
-get_top(dat = ces)
-```
-
 ## Functions for tibbles
 
 Can create a function with an argument that takes a column name for `select` or another `dplyr` operation:
 
 ```{r}
 clean_dataset <- function(dataset, col_name) {
   my_data_out <- dataset %>% select({{col_name}}) # Note the curly braces
-  write_csv(my_data_out, "clean_data.csv")
   return(my_data_out)
 }
 
 clean_dataset(dataset = ces, col_name = "CES4.0Score")
 ```
+```{r}
+get_mean <- function(dat, county_name, col_name) {
+  my_data_out <- dat %>%
+    filter(str_detect(CaliforniaCounty, county_name)) %>%
+    summarise(mean = mean({{col_name}}, na.rm = TRUE))
+  return(my_data_out)
+}
+
+get_mean(dat = ces, county_name = "Alameda", col_name = CES4.0Score)
+get_mean(dat = ces, county_name = "Fresno", col_name = CES4.0Score)
 
+```
 ## Summary
 
 - Simple functions take the form:
@@ -398,6 +379,8 @@ ces_dbl %>%
 
 Combining with `mutate()` - the `replace_na` function
 
+Here we will use the `yearly_co2_emissions` data from `dasehr`.
+
 ```replace_na({data frame}, {list of values})```
 or
 ```replace_na({vector}, {single value})```
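The two `replace_na()` call forms above can be sketched as follows. The tibble is a made-up stand-in, not the real `yearly_co2_emissions` data. Its column names (`country`, `y2014`, `y2015`) and values are hypothetical.

```r
library(dplyr)
library(tibble)
library(tidyr)

# Made-up stand-in for yearly_co2_emissions (hypothetical columns and values)
toy_co2 <- tibble(
  country = c("A", "B", "C"),
  y2014 = c(1.5, NA, 3.0),
  y2015 = c(NA, 2.0, 4.0)
)

# Data frame form: a named list with one replacement value per column
filled_df <- replace_na(toy_co2, list(y2014 = 0, y2015 = 0))

# Vector form inside mutate(): a single replacement value for one column
filled_one <- toy_co2 %>%
  mutate(y2014 = replace_na(y2014, 0))
```

With the vector form only `y2014` is filled. `y2015` keeps its `NA`.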

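The new `get_mean()` slide calls the function with the bare name `CES4.0Score`, while `clean_dataset()` is called with the string `"CES4.0Score"`. Both calls work only because of the verb each function uses. `select()` accepts a string, but `mean()` inside `summarise()` does not.

The sketch below uses a made-up three-row tibble in place of `ces`. The `.data[[ ]]` variant (`get_mean_chr`) is a hypothetical addition for string input and is not part of the course materials.

```r
library(dplyr)
library(stringr)
library(tibble)

# Made-up stand-in for the CalEnviroScreen data (values are hypothetical)
toy_ces <- tibble(
  CaliforniaCounty = c("Alameda", "Alameda", "Fresno"),
  CES4.0Score = c(20, 26, 41)
)

# Same logic as get_mean() in the slides: {{ }} forwards a bare column name
get_mean <- function(dat, county_name, col_name) {
  dat %>%
    filter(str_detect(CaliforniaCounty, county_name)) %>%
    summarise(mean = mean({{ col_name }}, na.rm = TRUE))
}

# Bare name: mean() is computed on the CES4.0Score column
alameda <- get_mean(dat = toy_ces, county_name = "Alameda", col_name = CES4.0Score)

# Quoted name: {{ }} injects the string itself, so mean() sees the character
# value "CES4.0Score", warns, and returns NA
quoted <- suppressWarnings(
  get_mean(dat = toy_ces, county_name = "Alameda", col_name = "CES4.0Score")
)

# Hypothetical variant for string input: look the column up with .data[[ ]]
get_mean_chr <- function(dat, county_name, col_name) {
  dat %>%
    filter(str_detect(CaliforniaCounty, county_name)) %>%
    summarise(mean = mean(.data[[col_name]], na.rm = TRUE))
}

fresno <- get_mean_chr(dat = toy_ces, county_name = "Fresno", col_name = "CES4.0Score")
```

Here `alameda$mean` is the mean of the two Alameda rows. `quoted$mean` is `NA`.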
diff --git a/modules/Functions/Functions.html b/modules/Functions/Functions.html
@@ -3117,7 +3117,7 @@
 div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
 ul.task-list{list-style: none;}
 pre > code.sourceCode { white-space: pre; position: relative; }
-pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
+pre > code.sourceCode > span { line-height: 1.25; }
 pre > code.sourceCode > span:empty { height: 1.2em; }
 .sourceCode { overflow: visible; }
 code.sourceCode > span { color: inherit; text-decoration: inherit; }
@@ -3473,45 +3473,10 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 </article></slide><slide class=""><hgroup><h2>Functions for tibbles</h2></hgroup><article  id="functions-for-tibbles-1">
 
-<p><code>select(n)</code> will choose column <code>n</code>:</p>
-
-<pre class = 'prettyprint lang-r'>get_index &lt;- function(dat, row, col) {
-  dat %&gt;%
-    filter(row_number() == row) %&gt;%
-    select(all_of(col))
-}
-
-get_index(dat = ces, row = 10, col = 7)</pre>
-
-<pre ># A tibble: 1 × 1
-  CES4.0Score
-        &lt;dbl&gt;
-1        43.7</pre>
-
-</article></slide><slide class=""><hgroup><h2>Functions for tibbles</h2></hgroup><article  id="functions-for-tibbles-2">
-
-<p>Including default values for arguments:</p>
-
-<pre class = 'prettyprint lang-r'>get_top &lt;- function(dat, row = 1, col = 1) {
-  dat %&gt;%
-    filter(row_number() == row) %&gt;%
-    select(all_of(col))
-}
-
-get_top(dat = ces)</pre>
-
-<pre ># A tibble: 1 × 1
-  CensusTract
-        &lt;dbl&gt;
-1  6001400100</pre>
-
-</article></slide><slide class=""><hgroup><h2>Functions for tibbles</h2></hgroup><article  id="functions-for-tibbles-3">
-
 <p>Can create a function with an argument that takes a column name for <code>select</code> or another <code>dplyr</code> operation:</p>
 
 <pre class = 'prettyprint lang-r'>clean_dataset &lt;- function(dataset, col_name) {
   my_data_out &lt;- dataset %&gt;% select({{col_name}}) # Note the curly braces
-  write_csv(my_data_out, &quot;clean_data.csv&quot;)
   return(my_data_out)
 }
 
@@ -3532,6 +3497,27 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 10       43.7 
 # ℹ 8,025 more rows</pre>
 
+<pre class = 'prettyprint lang-r'>get_mean &lt;- function(dat, county_name, col_name) {
+  my_data_out &lt;- dat %&gt;%
+    filter(str_detect(CaliforniaCounty, county_name)) %&gt;%
+    summarise(mean = mean({{col_name}}, na.rm = TRUE))
+  return(my_data_out)
+}
+
+get_mean(dat = ces, county_name = &quot;Alameda&quot;, col_name = CES4.0Score)</pre>
+
+<pre ># A tibble: 1 × 1
+   mean
+  &lt;dbl&gt;
+1  22.9</pre>
+
+<pre class = 'prettyprint lang-r'>get_mean(dat = ces, county_name = &quot;Fresno&quot;, col_name = CES4.0Score)</pre>
+
+<pre ># A tibble: 1 × 1
+   mean
+  &lt;dbl&gt;
+1  40.9</pre>
+
 </article></slide><slide class=""><hgroup><h2>Summary</h2></hgroup><article  id="summary">
 
 <ul>
@@ -3775,6 +3761,8 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 <p>Combining with <code>mutate()</code> - the <code>replace_na</code> function</p>
 
+<p>Here we will use the <code>yearly_co2_emissions</code> data from <code>dasehr</code>.</p>
+
 <p><code>replace_na({data frame}, {list of values})</code> or <code>replace_na({vector}, {single value})</code></p>
 
 <pre class = 'prettyprint lang-r'>yearly_co2_emissions %&gt;%

diff --git a/modules/Functions/Functions.pdf b/modules/Functions/Functions.pdf
diff --git a/modules/Functions/lab/Functions_Lab.Rmd b/modules/Functions/lab/Functions_Lab.Rmd
@@ -57,11 +57,11 @@ Amend the function `has_n` from question 1.2 so that it takes a default value of
 
 ```
 
-### 1.4
+### P.1
 
 Create a new number `b_num` that is not contained in `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
 
-```{r 1.4response}
+```{r P.1response}
 
 ```
 
@@ -73,7 +73,9 @@ Create a new number `b_num` that is not contained with `nums`. Use your updated
 Read in the CalEnviroScreen data from https://daseh.org/data/CalEnviroScreen_data.csv. Assign the data the name "ces".
 
 ```{r message = FALSE, label = '2.1response'}
-
+ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
+# If you downloaded the file to your working directory:
+# ces <- read_csv("CalEnviroScreen_data.csv")
 ```
 
 ### 2.2
@@ -94,33 +96,45 @@ data %>%
 
 ```
 
+
 ### 2.3
 
-Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Pctl".
+Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".
+
+```
+# General format
+data %>%
+  mutate(across(
+    .cols = {vector or tidyselect},
+    .fns = {some function}
+  ))
+```
 
 ```{r 2.3response}
 
 ```
 
-### 2.4
+# Practice on Your Own!
+
+### P.2
+
+
 
 Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10. **Hint**: use `starts_with()` to select the columns that start with "PM". Use a "function on the fly" to do a logical test if the value is greater than 10.
 
-```{r 2.4response}
+```{r P.2response}
 
 ```
 
 
-# Practice on Your Own!
-
-### P.1
+### P.3
 
 Take your code from question P.2 and assign it to the variable `ces_dat`.
 
 - Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign the result to `ces_dat`.
 - Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `PM2.5` and (2) the y-axis is `Asthma`.
 - Change the `labs()` layer so that the x-axis label is "ER Visits for Asthma: PM2.5 greater than 10".
 
-```{r P.1response}
+```{r P.3response}
 
 ```
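The "General format" for `mutate(across(...))` in question 2.3 can be filled in on a toy tibble as follows. The column names and values are hypothetical. Only the names matter here, because `contains()` selects columns by name.

```r
library(dplyr)
library(tibble)

# Made-up columns; only the names matter to contains()
toy <- tibble(
  PollutionPctl = c(50, 80),
  AsthmaPctl = c(10, 100),
  Asthma = c(35.2, 61.0)
)

# .cols: every column whose name contains "Pctl"
# .fns:  an anonymous ("on the fly") function applied to each of those columns
props <- toy %>%
  mutate(across(
    .cols = contains("Pctl"),
    .fns = function(x) x / 100
  ))
```

`Asthma` is left untouched because its name does not contain "Pctl". In R 4.1 or later, `\(x) x / 100` is shorthand for the same anonymous function.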
diff --git a/modules/Functions/lab/Functions_Lab_Key.Rmd b/modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -21,7 +21,7 @@ library(ggplot2)
 
 ### 1.1
 
-Create a function that takes one argument, a vector, and returns the sum of the vector and squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+Create a function that takes one argument, a vector, sums the vector, and returns the square of that sum. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
 
 ```
 # General format
@@ -74,11 +74,11 @@ has_n <- function(x, n = 21) n %in% x
 has_n(x = nums)
 ```
 
-### 1.4
+### P.1
 
 Create a new number `b_num` that is not contained in `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
 
-```{r 1.4response}
+```{r P.1response}
 b_num <- 11
 has_n(x = nums, n = b_num)
 ```
@@ -124,9 +124,19 @@ ces %>%
   ))
 ```
 
+
 ### 2.3
 
-Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Pctl".
+Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".
+
+```
+# General format
+data %>%
+  mutate(across(
+    .cols = {vector or tidyselect},
+    .fns = {some function}
+  ))
+```
 
 ```{r 2.3response}
 ces %>%
@@ -137,11 +147,15 @@ ces %>%
   select(contains("Pctl"))
 ```
 
-### 2.4
+# Practice on Your Own!
+
+### P.2
+
+
 
 Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10. **Hint**: use `starts_with()` to select the columns that start with "PM". Use a "function on the fly" to do a logical test if the value is greater than 10.
 
-```{r 2.4response}
+```{r P.2response}
 ces %>%
   mutate(across(
     .cols = starts_with("PM"),
@@ -150,17 +164,15 @@ ces %>%
 ```
 
 
-# Practice on Your Own!
-
-### P.1
+### P.3
 
 Take your code from question P.2 and assign it to the variable `ces_dat`.
 
 - Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign the result to `ces_dat`.
 - Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `PM2.5` and (2) the y-axis is `Asthma`.
 - Change the `labs()` layer so that the x-axis label is "ER Visits for Asthma: PM2.5 greater than 10".
 
-```{r P.1response}
+```{r P.3response}
 ces_dat <-
   ces %>%
   mutate(across(