Merge pull request #111 from fhdsl/factors
updating functions
avahoffman authored Jul 17, 2024
2 parents 3e75968 + e261482 commit 1d0fe2a
Showing 6 changed files with 108 additions and 104 deletions.
43 changes: 13 additions & 30 deletions modules/Functions/Functions.Rmd
@@ -199,49 +199,30 @@ get_row(dat = ces, row = 4)
```


## Functions for tibbles

`select(n)` will choose column `n`:

```{r message=FALSE}
get_index <- function(dat, row, col) {
  dat %>%
    filter(row_number() == row) %>%
    select(all_of(col))
}
get_index(dat = ces, row = 10, col = 7)
```


## Functions for tibbles

Including default values for arguments:

```{r message=FALSE}
get_top <- function(dat, row = 1, col = 1) {
  dat %>%
    filter(row_number() == row) %>%
    select(all_of(col))
}
get_top(dat = ces)
```
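
A minimal sketch of how default argument values behave, assuming a small made-up tibble named `toy` (hypothetical, standing in for `ces`). Calling the function without `row` or `col` falls back to the defaults; naming them overrides the defaults.

```r
library(dplyr)

# Hypothetical stand-in data; not part of the course data sets
toy <- tibble(id = 1:3, score = c(10.5, 20.1, 30.7))

# Same pattern as get_top() above: row and col default to 1
get_top <- function(dat, row = 1, col = 1) {
  dat %>%
    filter(row_number() == row) %>%
    select(all_of(col))
}

# Defaults are used: first row, first column
top_default <- get_top(dat = toy)

# Defaults are overridden: second row, second column
top_custom <- get_top(dat = toy, row = 2, col = 2)
```

Arguments without a default (here `dat`) must always be supplied, while arguments with a default are optional.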

## Functions for tibbles

We can create a function with an argument that accepts a column name for `select` or another `dplyr` operation:

```{r}
clean_dataset <- function(dataset, col_name) {
  my_data_out <- dataset %>% select({{col_name}}) # Note the curly braces
  write_csv(my_data_out, "clean_data.csv")
  return(my_data_out)
}
clean_dataset(dataset = ces, col_name = "CES4.0Score")
```
```{r}
get_mean <- function(dat, county_name, col_name) {
  my_data_out <- dat %>%
    filter(str_detect(CaliforniaCounty, county_name)) %>%
    summarise(mean = mean({{col_name}}, na.rm = TRUE))
  return(my_data_out)
}
get_mean(dat = ces, county_name = "Alameda", col_name = CES4.0Score)
get_mean(dat = ces, county_name = "Fresno", col_name = CES4.0Score)
```
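
As a rough sketch of what the curly-curly `{{ }}` operator is doing, the example below repeats the `get_mean` pattern on a small made-up tibble. The names `toy`, `county`, `score`, and `get_mean_toy` are hypothetical and are not part of the `ces` data. The unquoted column name passed as `col_name` is forwarded into `summarise()`, so the same function can be reused for any column.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in data; not part of the course data sets
toy <- tibble(
  county = c("Alameda", "Alameda", "Fresno"),
  score  = c(10, 20, 40),
  other  = c(1, 2, 3)
)

# {{ col_name }} forwards the unquoted column name into summarise()
get_mean_toy <- function(dat, county_name, col_name) {
  dat %>%
    filter(str_detect(county, county_name)) %>%
    summarise(mean = mean({{ col_name }}, na.rm = TRUE))
}

alameda_mean <- get_mean_toy(toy, county_name = "Alameda", col_name = score)
fresno_mean  <- get_mean_toy(toy, county_name = "Fresno", col_name = score)
```

Without the curly braces, `mean(col_name)` would look for a column literally called `col_name` rather than the column supplied by the caller.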
## Summary

- Simple functions take the form:
@@ -398,6 +379,8 @@ ces_dbl %>%

Combining with `mutate()` - the `replace_na` function

Here we will use the `yearly_co2_emissions` data from `dasehr`

```replace_na({data frame}, {list of values})```
or
```replace_na({vector}, {single value})```
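
The two forms of `replace_na` can be sketched on a small made-up tibble; `toy`, `country`, and `emissions` are hypothetical names used only for illustration and are not the `yearly_co2_emissions` data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in data with one missing value
toy <- tibble(
  country   = c("A", "B", "C"),
  emissions = c(1.2, NA, 3.4)
)

# Data frame form: a named list says which column gets which replacement
filled_df <- toy %>%
  replace_na(list(emissions = 0))

# Vector form inside mutate(): one column, one replacement value
filled_vec <- toy %>%
  mutate(emissions = replace_na(emissions, 0))
```

Both calls should leave the non-missing values untouched and fill only the `NA`.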
60 changes: 24 additions & 36 deletions modules/Functions/Functions.html
@@ -3117,7 +3117,7 @@
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span { line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
@@ -3473,45 +3473,10 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

</article></slide><slide class=""><hgroup><h2>Functions for tibbles</h2></hgroup><article id="functions-for-tibbles-1">

<p><code>select(n)</code> will choose column <code>n</code>:</p>

<pre class = 'prettyprint lang-r'>get_index &lt;- function(dat, row, col) {
  dat %&gt;%
    filter(row_number() == row) %&gt;%
    select(all_of(col))
}

get_index(dat = ces, row = 10, col = 7)</pre>

<pre ># A tibble: 1 × 1
CES4.0Score
&lt;dbl&gt;
1 43.7</pre>

</article></slide><slide class=""><hgroup><h2>Functions for tibbles</h2></hgroup><article id="functions-for-tibbles-2">

<p>Including default values for arguments:</p>

<pre class = 'prettyprint lang-r'>get_top &lt;- function(dat, row = 1, col = 1) {
  dat %&gt;%
    filter(row_number() == row) %&gt;%
    select(all_of(col))
}

get_top(dat = ces)</pre>

<pre ># A tibble: 1 × 1
CensusTract
&lt;dbl&gt;
1 6001400100</pre>

</article></slide><slide class=""><hgroup><h2>Functions for tibbles</h2></hgroup><article id="functions-for-tibbles-3">

<p>We can create a function with an argument that accepts a column name for <code>select</code> or another <code>dplyr</code> operation:</p>

<pre class = 'prettyprint lang-r'>clean_dataset &lt;- function(dataset, col_name) {
  my_data_out &lt;- dataset %&gt;% select({{col_name}}) # Note the curly braces
  write_csv(my_data_out, &quot;clean_data.csv&quot;)
  return(my_data_out)
}

@@ -3532,6 +3497,27 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
10 43.7
# ℹ 8,025 more rows</pre>

<pre class = 'prettyprint lang-r'>get_mean &lt;- function(dat, county_name, col_name) {
  my_data_out &lt;- dat %&gt;%
    filter(str_detect(CaliforniaCounty, county_name)) %&gt;%
    summarise(mean = mean({{col_name}}, na.rm = TRUE))
  return(my_data_out)
}

get_mean(dat = ces, county_name = &quot;Alameda&quot;, col_name = CES4.0Score)</pre>

<pre ># A tibble: 1 × 1
mean
&lt;dbl&gt;
1 22.9</pre>

<pre class = 'prettyprint lang-r'>get_mean(dat = ces, county_name = &quot;Fresno&quot;, col_name = CES4.0Score)</pre>

<pre ># A tibble: 1 × 1
mean
&lt;dbl&gt;
1 40.9</pre>

</article></slide><slide class=""><hgroup><h2>Summary</h2></hgroup><article id="summary">

<ul>
@@ -3775,6 +3761,8 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

<p>Combining with <code>mutate()</code> - the <code>replace_na</code> function</p>

<p>Here we will use the <code>yearly_co2_emissions</code> data from <code>dasehr</code></p>

<p><code>replace_na({data frame}, {list of values})</code> or <code>replace_na({vector}, {single value})</code></p>

<pre class = 'prettyprint lang-r'>yearly_co2_emissions %&gt;%
Binary file modified modules/Functions/Functions.pdf
Binary file not shown.
34 changes: 24 additions & 10 deletions modules/Functions/lab/Functions_Lab.Rmd
@@ -57,11 +57,11 @@ Amend the function `has_n` from question 1.2 so that it takes a default value of
```

### 1.4
### P.1

Create a new number `b_num` that is not contained within `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?

```{r 1.4response}
```{r P.1response}
```

@@ -73,7 +73,9 @@ Create a new number `b_num` that is not contained with `nums`. Use your updated
Read in the CalEnviroScreen from https://daseh.org/data/CalEnviroScreen_data.csv. Assign the data the name "ces".

```{r message = FALSE, label = '2.1response'}
ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
# If downloaded
# ces <- read_csv("CalEnviroScreen_data.csv")
```

### 2.2
@@ -94,33 +96,45 @@ data %>%
```


### 2.3

Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Pctl".
Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".

```
# General format
data %>%
mutate(across(
.cols = {vector or tidyselect},
.fns = {some function}
))
```
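
The general format above can be sketched on a small made-up tibble. The data (`toy`, `len_a`, `len_b`) and the doubling function are hypothetical and chosen only to show the mechanics of `across()` with a "function on the fly"; they are not the answer to this question.

```r
library(dplyr)

# Hypothetical stand-in data; not part of the course data sets
toy <- tibble(
  id    = 1:3,
  len_a = c(10, 20, 30),
  len_b = c(5, 15, 25)
)

# Apply an anonymous function to every column whose name starts with "len"
toy_scaled <- toy %>%
  mutate(across(
    .cols = starts_with("len"),
    .fns = function(x) x * 2
  ))
```

Columns that are not matched by `.cols` (here `id`) are left unchanged.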

```{r 2.3response}
```

### 2.4
# Practice on Your Own!

### P.2



Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10. **Hint**: use `starts_with()` to select the columns that start with "PM". Use a "function on the fly" to do a logical test if the value is greater than 10.

```{r 2.4response}
```{r P.2response}
```


# Practice on Your Own!

### P.1
### P.3

Take your code from question P.2 and assign it to the variable `ces_dat`.

- Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign this to `ces_dat`.
- Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `PM2.5` and (2) the y-axis is `Asthma`.
- Change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"

```{r P.1response}
```{r P.3response}
```
32 changes: 22 additions & 10 deletions modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -21,7 +21,7 @@ library(ggplot2)

### 1.1

Create a function that takes one argument, a vector, and returns the sum of the vector and squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
Create a function that takes one argument, a vector, and returns the sum of the vector and then squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.

```
# General format
@@ -74,11 +74,11 @@ has_n <- function(x, n = 21) n %in% x
has_n(x = nums)
```

### 1.4
### P.1

Create a new number `b_num` that is not contained within `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?

```{r 1.4response}
```{r P.1response}
b_num <- 11
has_n(x = nums, n = b_num)
```
@@ -124,9 +124,19 @@ ces %>%
))
```


### 2.3

Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100. It will also be easier to check your work if you `select()` columns that match "Pctl".
Use `across` and `mutate` to convert all columns containing the word "Pctl" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use a "function on the fly" to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".

```
# General format
data %>%
mutate(across(
.cols = {vector or tidyselect},
.fns = {some function}
))
```

```{r 2.3response}
ces %>%
@@ -137,11 +147,15 @@ ces %>%
select(contains("Pctl"))
```

### 2.4
# Practice on Your Own!

### P.2



Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10. **Hint**: use `starts_with()` to select the columns that start with "PM". Use a "function on the fly" to do a logical test if the value is greater than 10.

```{r 2.4response}
```{r P.2response}
ces %>%
mutate(across(
.cols = starts_with("PM"),
@@ -150,17 +164,15 @@ ces %>%
```


# Practice on Your Own!

### P.1
### P.3

Take your code from question P.2 and assign it to the variable `ces_dat`.

- Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign this to `ces_dat`.
- Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `PM2.5` and (2) the y-axis is `Asthma`.
- Change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"

```{r P.1response}
```{r P.3response}
ces_dat <-
ces %>%
mutate(across(