-
Notifications
You must be signed in to change notification settings - Fork 15
/
Copy pathwrangle.qmd
138 lines (105 loc) · 3.38 KB
/
wrangle.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
---
title: "Wrangle Data with {dplyr}"
date-modified: 'today'
date-format: long
license: CC BY-NC
bibliography: references.bib
---
{[`dplyr`](https://dplyr.tidyverse.org)} verbs help you wrangle, clean, and normalize your data
::: column-margin
```{=html}
<iframe width="300" height="200" src="https://www.youtube.com/embed/PGSXf-NMWVg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
```
:::
| dplyr function | use for |
|----------------|--------------------------------------------|
| `select()` | subset columns |
| `filter()` | subset rows |
| `arrange()` | sort rows by column variable values |
| `mutate()` | Create new, or modify variables |
| `group_by()` | use with summarize for subtotals |
| `summarize()` | generate column totals and subtotals, etc. |
| `count()` | a specialized `summarize()` function |
### Examples
First we need to load the {dplyr} package for wrangling and the {readr} package for importing CSV data. In our case, we'll do that by loading the tidyverse which loads {dplyr}, {readr} and several other helpful packages. Then we need to load our data
```{r}
#| warning: false
#| message: false
library(tidyverse)
brodhead_center <- read_csv("data/brodhead_center.csv")
```
#### `select()`
```{r}
brodhead_center |>
select(name, type)
```
::: column-margin
[![Select columns](images/select.svg){fig-alt="Select columns"}](https://dplyr.tidyverse.org/reference/select.html)
:::
#### `filter()`
```{r}
brodhead_center |>
filter(menuType == "dessert")
```
::: column-margin
[![Filter by rows](images/filter_by_rows.svg){fig-alt="Filter by rows"}](https://dplyr.tidyverse.org/reference/filter.html)
:::
#### `arrange()`
```{r}
brodhead_center |>
arrange(cost)
```
::: column-margin
[![Arrange rows by the values in a column](images/arrange_rows.svg){fig-alt="Arrange rows by the values in a column"}](https://dplyr.tidyverse.org/reference/arrange.html)
:::
#### `mutate()`
```{r}
brodhead_center |>
mutate(ratings_high = rating * 2)
```
::: column-margin
[![Create new variable or modify variable with mutate()](images/mutate.svg){fig-alt="Create new variable or modify variable with mutate()"}](https://dplyr.tidyverse.org/reference/mutate.html)
:::
We can also mutate data by groups or categories
```{r}
brodhead_center |>
mutate(avg_item_rating_rest = mean(rating, na.rm = TRUE),
.by = name,
.after = name)
```
#### `count()`
```{r}
#| label: margin-picture-count
#| echo: false
#| column: margin
brodhead_center |>
count(menuType) |>
arrange(desc(n)) |>
gt::gt() |>
gt::tab_header(title = "Count values in a group")
```
```{r}
brodhead_center |>
count(menuType)
```
#### `group_by()` & `summarise()`
```{r}
#| label: margin-picture-summarize
#| echo: false
#| column: margin
brodhead_center |>
summarise(Sum_of_cost = sum(cost)) |>
gt::gt() |>
gt::tab_header(title = "Summarise column")
```
```{r}
brodhead_center |>
group_by(name) |>
summarise(min_cost = min(cost), mean_cost = mean(cost), max_cost = max(cost))
```
##### or
Summarize by groups, without `group_by()`
```{r}
brodhead_center |>
summarise(min_cost = min(cost), .by = name)
```