forked from hadley/ggplot2-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
scales-other.Rmd
304 lines (221 loc) · 14.4 KB
/
scales-other.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
```{r setup, include = FALSE}
source("common.R")
columns(1, 2 / 3)
```
# Other aesthetics {#scale-other}
In addition to position and colour, there are several other aesthetics that ggplot2 can use to represent data. In this chapter we'll look at size scales (Section \@ref(scale-size)), shape scales (Section \@ref(scale-shape)), line width scales (Section \@ref(scale-linewidth)), and line type scales (Section \@ref(scale-linetype)), which use visual features other than location and colour to represent data values. Additionally, we'll talk about manual scales (Section \@ref(scale-manual)) and identity scales (Section \@ref(scale-identity)): these don't necessarily use different visual features, but they construct data mappings in an unusual way.
## Size {#scale-size}
\index{Size}
```{r, echo=FALSE}
planets <- data.frame(
name = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"),
type = c(rep("Inner", 4), rep("Outer", 4)),
position = 1:8,
radius = c(2440, 6052, 6378, 3390, 71400, 60330, 25559, 24764),
orbit = c(57900000, 108200000, 149600000, 227900000, 778300000, 1427000000, 2871000000, 4497100000)
# mass = c(3.3022e+23, 4.8685e+24, 5.9736e+24, 6.4185e+23, 1.8986e+27, 5.6846e+26, 8.681e+25, 1.0243e+26)
)
planets$name <- with(planets, factor(name, name))
```
The size aesthetic is typically used to scale points and text. The default scale for size aesthetics is `scale_size()` in which a linear increase in the variable is mapped onto a linear increase in the area (not the radius) of the geom. Scaling as a function of area is a sensible default as human perception of size is more closely mimicked by area scaling than by radius scaling. By default the smallest value in the data (more precisely in the scale limits) is mapped to a size of 1 and the largest is mapped to a size of 6. The `range` argument allows you to scale the size of the geoms:
`r columns(2, 2/3)`
```{r}
base <- ggplot(mpg, aes(displ, hwy, size = cyl)) +
geom_point()
base
base + scale_size(range = c(1, 2))
```
There are several size scales worth noting briefly:
- `scale_size_area()` and `scale_size_binned_area()` are versions of `scale_size()` and `scale_size_binned()` that ensure that a value of 0 maps to an area of 0.
- `scale_radius()` maps the data value to the radius rather than to the area (Section \@ref(radius-scaling)).
- `scale_size_binned()` is a size scale that behaves like `scale_size()` but maps continuous values onto discrete size categories, analogous to the binned position and colour scales discussed in Sections \@ref(binned-position) and \@ref(binned-colour) respectively. Legends associated with this scale are discussed in Section \@ref(guide-bins).
- `scale_size_date()` and `scale_size_datetime()` are designed to handle date data, analogous to the date scales discussed in Section \@ref(date-scales).
### Radius size scales {#radius-scaling}
There are situations where area scaling is undesirable, and for such situations the `scale_radius()` function is provided. To illustrate when `scale_radius()` is appropriate consider a data set containing astronomical data that includes the radius of different planets:
```{r}
planets
```
In this instance a plot that uses the size aesthetic to represent the radius of the planets should use `scale_radius()` rather than the default `scale_size()`. It is also important in this case to set the scale limits so that a planet with radius 0 would be drawn with a disc with radius 0.
`r columns(2, 1)`
```{r}
base <- ggplot(planets, aes(1, name, size = radius)) +
geom_point() +
scale_x_continuous(breaks = NULL) +
labs(x = NULL, y = NULL, size = NULL)
base + ggtitle("not to scale")
base +
scale_radius(limits = c(0, NA), range = c(0, 10)) +
ggtitle("to scale")
```
On the left it is difficult to distinguish Jupiter from Saturn, despite the fact that the difference between the two should be double the size of Earth; compare this to the plot on the right where the radius of Jupiter is visibly larger.
### Binned size scales {#guide-bins}
Binned size scales work similarly to binned scales for colour and position aesthetics (Sections \@ref(binned-colour) and \@ref(binned-position)). One difference is how legends are displayed. The default legend for a binned size scale, and all binned scales except position and colour aesthetics, is governed by `guide_bins()`. For instance, in the `mpg` data we could use `scale_size_binned()` to create a binned version of the continuous variable `hwy`:
`r columns(2, 1)`
```{r}
base <- ggplot(mpg, aes(displ, manufacturer, size = hwy)) +
geom_point(alpha = .2) +
scale_size_binned()
base
```
Unlike `guide_legend()`, the guide created for a binned scale by `guide_bins()` does not organise the individual keys into a table. Instead they are arranged in a column (or row) along a single vertical (or horizontal) axis, which by default is displayed with its own axis. The important arguments to `guide_bins()` are listed below:
* `axis` indicates whether the axis should be drawn (default is `TRUE`)
```{r}
base + guides(size = guide_bins(axis = FALSE))
```
* `direction` is a character string specifying the direction of the guide, either `"vertical"` (the default) or `"horizontal"`
`r columns(1, 2/3)`
```{r}
base + guides(size = guide_bins(direction = "horizontal"))
```
* `show.limits` specifies whether tick marks are shown at the ends of the
guide axis (default is FALSE)
`r columns(2, 1)`
```{r}
base + guides(size = guide_bins(show.limits = TRUE))
```
* `axis.colour`, `axis.linewidth` and `axis.arrow` are used to
control the guide axis that is displayed alongside the legend keys
`r columns(2, 1)`
```{r}
base + guides(
size = guide_bins(
axis.colour = "red",
axis.arrow = arrow(
length = unit(.1, "inches"),
ends = "first",
type = "closed"
)
)
)
```
* `keywidth`, `keyheight`, `reverse` and `override.aes` have the same
behaviour for `guide_bins()` as they do for `guide_legend()` (see Section \@ref(guide-legend))
## Shape {#scale-shape}
Values can be mapped to the shape aesthetic. The typical use for this is when you have a small number of discrete categories: if the data variable contains more than 6 values it becomes difficult to distinguish between shapes, and will produce a warning. The default `scale_shape()` function contains a single argument: set `solid = TRUE` (the default) to use a "palette" consisting of three solid shapes and three hollow shapes, or set `solid = FALSE` to use six hollow shapes:
`r columns(2, 1)`
```{r}
base <- ggplot(mpg, aes(displ, hwy, shape = factor(cyl))) +
geom_point()
base
base + scale_shape(solid = FALSE)
```
Although any one plot is unlikely to be readable with more than a 6 distinct markers, there are 25 possible shapes to choose from, each associated with an integer value:
`r columns(1, .3, max_width = .8)`
```{r, echo=FALSE}
df <- data.frame(
shape = 1:25,
x = (0:24) %% 13,
y = 2 - floor((0:24)/13)
)
ggplot(df, aes(x, y, shape = shape)) +
geom_point(size = 4) +
geom_text(aes(label = shape), nudge_y = .3) +
theme_void() +
scale_shape_identity() +
ylim(.8, 2.5)
```
You can specify the marker types for each data value manually using `scale_shape_manual()`:
`r columns(2, 1)`
```{r}
base +
scale_shape_manual(
values = c("4" = 16, "5" = 17, "6" = 1 , "8" = 2)
)
```
For more information about manual scales see Section \@ref(scale-manual).
## Line width {#scale-linewidth}
The linewidth aesthetic, introduced in ggplot2 3.4.0, is used to control the width of lines. In earlier versions of ggplot2 the size aesthetic was used for this purpose, which caused some difficulty for complex geoms such as `geom_pointrange()` that contain both points and lines. For these geoms it's often important to be able to separately control the size of the points and the width of the lines. This is illustrated in the plots below. In the leftmost plot both the size and linewidth aesthetics are set at their default values. The middle plot increases the size of the points while leaving the linewidth unchanged, whereas the plot on the right increases the linewidth while leaving the point size unchanged.
`r columns(3, 2)`
```{r}
base <- ggplot(airquality, aes(x = factor(Month), y = Temp))
base + geom_pointrange(stat = "summary", fun.data = "median_hilow")
base + geom_pointrange(
stat = "summary",
fun.data = "median_hilow",
size = 2
)
base + geom_pointrange(
stat = "summary",
fun.data = "median_hilow",
linewidth = 2
)
```
In practice you're most likely to set linewidth as a fixed parameter, as shown in the previous example, but it is a true aesthetic and can be mapped onto data values:
`r columns(1, 1/2, 1)`
```{r}
ggplot(airquality, aes(Day, Temp, group = Month)) +
geom_line(aes(linewidth = Month)) +
scale_linewidth(range = c(0.5, 3))
```
Linewidth scales behave like size scales in most ways, but there are differences. As discussed earlier the default behaviour of a size scale is to increase linearly with the area of the plot marker (e.g., the diameter of a circular plot marker increases with the square root of the data value). In contrast, the linewidth increases linearly with the data value.
Binned linewidth scales can be added using `scale_linewidth_binned()`.
## Line type {#scale-linetype}
It is possible to map a variable onto the linetype aesthetic in ggplot2. This works best for discrete variables with a small number of categories, and `scale_linetype()` is an alias for `scale_linetype_discrete()`. Continuous variables cannot be mapped to line types unless `scale_linetype_binned()` is used: although there is a `scale_linetype_continuous()` function, all it does is produce an error. To see why the linetype aesthetic is suited only to cases with a few categories, consider this plot:
`r columns(1, 5/12, 0.8)`
```{r}
ggplot(economics_long, aes(date, value01, linetype = variable)) +
geom_line()
```
With five categories the plot is quite difficult to read, and it is unlikely you will want to use the linetype aesthetic for more than that. The default "palette" for linetype is supplied by the `scales::linetype_pal()` function, and includes the 13 linetypes shown below:
`r columns(1, 2/3)`
```{r}
df <- data.frame(value = letters[1:13])
base <- ggplot(df, aes(linetype = value)) +
geom_segment(
mapping = aes(x = 0, xend = 1, y = value, yend = value),
show.legend = FALSE
) +
theme(panel.grid = element_blank()) +
scale_x_continuous(NULL, NULL)
base
```
You can control the line type by specifying a string with up to 8 hexadecimal values (i.e., from 0 to F). In this specification, the first value is the length of the first line segment, the second value is the length of the first space between segments, and so on. This allows you to specify your own line types using `scale_linetype_manual()`, or alternatively, by passing a custom function to the `palette` argument:
```{r}
linetypes <- function(n) {
types <- c("55", "75", "95", "1115", "111115", "11111115",
"5158", "9198", "c1c8")
return(types[seq_len(n)])
}
base + scale_linetype(palette = linetypes)
```
Note that the last four lines are blank, because the `linetypes()` function defined above returns `NA` when the number of categories exceeds 9. The `scale_linetype()` function contains a `na.value` argument used to specify what kind of line is plotted for these values. By default this produces a blank line, but you can override this by setting `na.value = "dotted"`:
```{r}
base + scale_linetype(palette = linetypes, na.value = "dotted")
```
Valid line types can be set using a human readable character string: `"blank"`, `"solid"`, `"dashed"`, `"dotted"`, `"dotdash"`, `"longdash"`, and `"twodash"` are all understood.
## Manual scales {#scale-manual}
Manual scales are just a list of valid values that are mapped to the unique discrete values. If you want to customise these scales, you need to create your own new scale with the "manual" version of each: `scale_linetype_manual()`, `scale_shape_manual()`, `scale_colour_manual()`, etc. The manual scale has one important argument, `values`, where you specify the values that the scale should produce if this vector is named, it will match the values of the output to the values of the input; otherwise it will match in order of the levels of the discrete variable. You will need some knowledge of the valid aesthetic values, which are described in `vignette("ggplot2-specs")`.
\index{Shape} \index{Line type} \indexf{scale\_shape\_manual} \indexf{scale\_colour\_manual} \indexf{scale\_linetype\_manual}
Manual scales have appeared earlier, in Sections \@ref(manual-colour) and \@ref(scale-shape). In this example we'll show a creative use of `scale_colour_manual()` to display multiple variables on the same plot and show a useful legend. In most plotting systems, you'd colour the lines and then add a legend: \index{Data!longitudinal}
`r columns(1, 5/12, 0.8)`
```{r huron}
huron <- data.frame(year = 1875:1972, level = as.numeric(LakeHuron))
ggplot(huron, aes(year)) +
geom_line(aes(y = level + 5), colour = "red") +
geom_line(aes(y = level - 5), colour = "blue")
```
That doesn't work in ggplot because there's no way to add a legend manually. Instead, give the lines informative labels:
```{r huron2}
ggplot(huron, aes(year)) +
geom_line(aes(y = level + 5, colour = "above")) +
geom_line(aes(y = level - 5, colour = "below"))
```
And then tell the scale how to map labels to colours:
```{r huron3}
ggplot(huron, aes(year)) +
geom_line(aes(y = level + 5, colour = "above")) +
geom_line(aes(y = level - 5, colour = "below")) +
scale_colour_manual("Direction",
values = c("above" = "red", "below" = "blue")
)
```
## Identity scales {#scale-identity}
Identity scales --- such as `scale_colour_identity()` and `scale_shape_identity()` --- are used when your data is already scaled such that the data and aesthetic spaces are the same. The code below shows an example where the identity scale is useful. `luv_colours` contains the locations of all R's built-in colours in the LUV colour space (the space that HCL is based on). A legend is unnecessary, because the point colour represents itself: the data and aesthetic spaces are the same. \index{Scales!identity} \indexf{scale\_identity}
`r columns(1, 1, 0.75)`
```{r scale-identity}
head(luv_colours)
ggplot(luv_colours, aes(u, v)) +
geom_point(aes(colour = col), size = 3) +
scale_color_identity() +
coord_equal()
```