Create a function that takes one argument, a vector, and returns the sum of the vector and then squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
22
+
Create a function that:
23
23
24
-
```
25
-
# General format
26
-
NEW_FUNCTION <- function(x, y) x + y
27
-
```
28
-
or
29
-
30
-
```
31
-
# General format
32
-
NEW_FUNCTION <- function(x, y){
33
-
result <- x + y
34
-
return(result)
35
-
}
36
-
```
24
+
* Takes one argument, a vector.
25
+
* Sums the vector, squares that sum, and returns the result.
26
+
* Call it "sum_squared".
27
+
* Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
28
+
* Format is `NEW_FUNCTION <- function(x, y) x + y`
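
One way the steps above could be sketched in base R is shown here. The response chunk is collapsed in this diff, so this is an assumed solution rather than the key's exact code; the argument name `x` is taken from the `sum_squared(x = nums)` call visible in the hunk header.

```r
# Assumed solution sketch: sum the vector first, then square the total.
# Note this is sum(x)^2, not sum(x^2) (which would give 9494 for this vector).
sum_squared <- function(x) sum(x)^2

nums <- c(2, 7, 21, 30, 90)
result <- sum_squared(x = nums)
print(result)  # 150^2 = 22500
```

The one-line form works because the value of the last evaluated expression is returned; the braces-and-`return()` form behaves the same way.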
37
29
38
30
```{r 1.1response}
39
31
nums <- c(2, 7, 21, 30, 90)
@@ -50,7 +42,12 @@ sum_squared(x = nums)
50
42
51
43
### 1.2
52
44
53
-
Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
45
+
Create a function that:
46
+
47
+
* Takes two arguments: (1) a vector and (2) a numeric value.
48
+
* Tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`.
49
+
* Call it `has_n`.
50
+
* Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
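
A minimal sketch of what such a function might look like follows. The response chunk is collapsed in this diff, so the argument names `x` and `n` are assumptions based on the `has_n(x = nums, n = b_num)` call that appears elsewhere in the diff.

```r
# Assumed solution sketch: %in% returns TRUE when n matches any element of x.
has_n <- function(x, n) n %in% x

nums <- c(2, 7, 21, 30, 90)
found <- has_n(x = nums, n = 21)    # 21 is in nums
missing <- has_n(x = nums, n = 11)  # 11 is not in nums
print(c(found, missing))
```

Putting `n` on the left of `%in%` matters: `n %in% x` gives one logical value per element of `n`, which is a single TRUE/FALSE when `n` is one number.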
54
51
55
52
```{r 1.2response}
56
53
nums <- c(2, 7, 21, 30, 90)
@@ -74,11 +71,24 @@ has_n(x = nums)
74
71
75
72
### P.1
76
73
77
-
Create a new number `b_num` that is not contained within `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
74
+
Create a function for the CalEnviroScreen Data.
75
+
76
+
* Read in the data (https://daseh.org/data/CalEnviroScreen_data.csv).
77
+
* The function takes a column name as its argument (use `{{col_name}}` inside the function).
78
+
* The function creates a ggplot with `{{col_name}}` on the x-axis and `Poverty` on the y-axis.
79
+
* Use `geom_point()`
80
+
* Test the function using the `Lead` and `HousingBurden` columns, or other columns of your choice.
78
81
79
82
```{r P.1response}
80
-
b_num <- 11
81
-
has_n(x = nums, n = b_num)
83
+
ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
84
+
85
+
plot_ces <- function(col_name) {
86
+
  ggplot(data = ces, aes(x = {{col_name}}, y = Poverty)) +
87
+
    geom_point()
88
+
}
89
+
90
+
plot_ces(Lead)
91
+
plot_ces(HousingBurden)
82
92
```
83
93
84
94
@@ -96,7 +106,12 @@ ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
96
106
97
107
### 2.2
98
108
99
-
We want to get some summary statistics on water contamination. Use `across` inside `summarize` to get the sum total variable containing the string "water" AND ending with "Pctl". **Hint**: use `contains()` AND `ends_with()` to select the right columns inside `across`. Remember that `NA` values can influence calculations.
109
+
We want to get some summary statistics on water contamination.
110
+
111
+
* Use `across` inside `summarize`.
112
+
* Choose columns about "water". **Hint**: use `contains("water")` inside `across`.
113
+
* Use `mean` as the function inside of `across`.
114
+
* Remember that `NA` values can influence calculations.
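
To see why the `NA` reminder matters, here is a small sketch on toy data. The tibble `toy` and its column names are made up for illustration and are not claimed to match the real CalEnviroScreen columns; it assumes the `dplyr` package is installed.

```r
library(dplyr)

# Toy data (hypothetical columns); one "water" column contains an NA.
toy <- tibble(
  DrinkingWater = c(10, 20, NA),
  GroundwaterThreats = c(1, 2, 3),
  Poverty = c(5, 6, 7)
)

# Without na.rm, a single NA makes that column's mean NA.
naive_means <- toy %>%
  summarize(across(contains("water"), mean))

# With na.rm = TRUE, NA values are dropped before averaging.
water_means <- toy %>%
  summarize(across(contains("water"), function(x) mean(x, na.rm = TRUE)))

print(naive_means)
print(water_means)
```

`contains()` ignores case by default, so `contains("water")` picks up both `DrinkingWater` and `GroundwaterThreats` here and leaves `Poverty` out.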
100
115
101
116
```
102
117
# General format
@@ -110,19 +125,26 @@ data %>%
110
125
```{r 2.2response}
111
126
ces %>%
112
127
  summarize(across(
113
-
    contains("Water") & ends_with("Pctl"),
114
-
    sum
128
+
    contains("water"),
129
+
    mean
115
130
  ))
131
+
132
+
# Accounting for NA
116
133
ces %>%
117
134
  summarize(across(
118
-
    contains("Water") & ends_with("Pctl"),
119
-
    function(x) sum(x, na.rm = T)
135
+
    contains("water"),
136
+
    function(x) mean(x, na.rm = TRUE)
120
137
  ))
121
138
```
122
139
123
140
### 2.3
124
141
125
-
Use `across` and `mutate` to convert all columns containing the word "water" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".
142
+
Convert all columns that are percentiles into proportions.
143
+
144
+
* Use `across` and `mutate`
145
+
* Choose columns that contain "Pctl" in the name. **Hint**: use `contains("Pctl")` inside `across`.
146
+
* Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`).
147
+
* Check your work. This is easier if you `select(contains("Pctl"))`.
126
148
127
149
```
128
150
# General format
@@ -136,7 +158,7 @@ data %>%
136
158
```{r 2.3response}
137
159
ces %>%
138
160
  mutate(across(
139
-
    contains("water"),
161
+
    contains("Pctl"),
140
162
    function(x) x / 100
141
163
  )) %>%
142
164
  select(contains("Pctl"))
@@ -149,42 +171,50 @@ ces %>%
149
171
150
172
Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10.
151
173
152
-
-**Hint**: use `starts_with()` to select the columns that start with "PM".
153
-
- Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
154
-
- A logical test with `mutate` will automatically fill a column with TRUE/FALSE.
174
+
* **Hint**: use `starts_with()` to select the columns that start with "PM".
175
+
* Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
176
+
* A logical test with `mutate` (`x > 10`) will automatically fill a column with TRUE/FALSE.
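
The hints above can be tried on a toy data frame before running them on the full dataset. In this sketch the tibble `toy` and the column `PMDiesel` are made up for illustration; only `PM2.5` and `Asthma` are names that appear in this lab. It assumes the `dplyr` package is installed.

```r
library(dplyr)

# Toy data: two columns start with "PM", one does not.
toy <- tibble(
  PM2.5 = c(8, 10, 12.5),
  PMDiesel = c(0.2, 11, 30),  # hypothetical column name
  Asthma = c(40, 55, 70)
)

# The anonymous function runs once per selected column;
# x > 10 turns each value into TRUE or FALSE.
toy_binary <- toy %>%
  mutate(across(starts_with("PM"), function(x) x > 10))

print(toy_binary)
```

A value of exactly 10 becomes FALSE because the test is strictly greater than 10, and `Asthma` stays numeric because it is not selected.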
155
177
156
178
```{r P.2response}
157
179
ces %>%
158
180
  mutate(across(
159
181
    starts_with("PM"),
160
182
    function(x) x > 10
161
-
  ))
183
+
  )) %>%
184
+
  glimpse() # add glimpse to view the changes
162
185
```
163
186
164
187
### P.3
165
188
166
189
Take your code from the previous question and assign it to the variable `ces_dat`.
167
190
168
-
-Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign this to `ces_dat`.
169
-
-Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `Asthma` and (2) the y-axis is `PM2.5`.
170
-
-You change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
191
+
- Create a ggplot where the x-axis is `Asthma` and the y-axis is `PM2.5`.
192
+
- Add a boxplot (`geom_boxplot()`).
193
+
- Change the `labs()` layer so that the x-axis label is "ER Visits for Asthma: PM2.5 greater than 10".
171
194
172
195
```{r P.3response}
173
196
ces_dat <-
174
197
  ces %>%
175
198
  mutate(across(
176
199
    starts_with("PM"),
177
200
    function(x) x > 10
178
-
  )) %>%
179
-
  filter(ApproxLocation != "Oakland")
180
-
181
-
ces_boxplot <- function(df) {
182
-
  ggplot(df) +
183
-
    geom_boxplot(aes(
184
-
      x = `Asthma`,
185
-
      y = `PM2.5`
186
-
    )) +
201
+
  ))
202
+
203
+
ggplot(data = ces_dat, aes(x = `Asthma`, y = `PM2.5`)) +
204
+
  geom_boxplot() +
205
+
  labs(x = "ER Visits for Asthma: PM2.5 greater than 10")
206
+
207
+
# Make everything a function if you like!
208
+
ces_boxplot <- function() {
209
+
  ces %>%
210
+
    mutate(across(
211
+
      starts_with("PM"),
212
+
      function(x) x > 10
213
+
    )) %>%
214
+
    ggplot(aes(x = `Asthma`, y = `PM2.5`)) +
215
+
    geom_boxplot() +
187
216
    labs(x = "ER Visits for Asthma: PM2.5 greater than 10")