```{r include = FALSE}
source("common.R")
```
# Extending ggplot2 {#extensions}
The ggplot2 package has been designed in a way that makes it relatively easy to extend the functionality with new types of the common grammar components. The extension system allows you to distribute these extensions as packages should you choose to, but the ease with which extensions can be made means that writing one-off extensions to solve a particular plotting challenge is also viable. This chapter discusses different ways ggplot2 can be extended and highlights specific issues to keep in mind. We'll present small examples throughout the chapter, but to see a worked example from beginning to end, see Chapter \@ref(spring1).
## New themes
### Modifying themes
Themes are probably the easiest form of extension, as they only require you to write code you would normally write when creating plots with ggplot2. While it is possible to build up a new theme from the ground up, it is usually easier and less error-prone to modify an existing theme. This approach is often taken in the ggplot2 source. For example, here is the source code for `theme_minimal()`:
```{r}
theme_minimal <- function(base_size = 11,
                          base_family = "",
                          base_line_size = base_size/22,
                          base_rect_size = base_size/22) {
  theme_bw(
    base_size = base_size,
    base_family = base_family,
    base_line_size = base_line_size,
    base_rect_size = base_rect_size
  ) %+replace%
    theme(
      axis.ticks = element_blank(),
      legend.background = element_blank(),
      legend.key = element_blank(),
      panel.background = element_blank(),
      panel.border = element_blank(),
      strip.background = element_blank(),
      plot.background = element_blank(),
      complete = TRUE
    )
}
```
As you can see, the code doesn't look much different to the code you normally write when styling a plot (Chapter \@ref(polishing)). The `theme_minimal()` function uses `theme_bw()` as the base theme, and then replaces certain parts of it with its own style using the `%+replace%` operator. When writing new themes it is a good idea to provide a few parameters that let the user control overarching aspects of the theme. One important aspect is the sizing of text and lines, but others might include the key and accent colours of the theme. For example, we could create a variant of `theme_minimal()` that allows the user to specify the plot background colour:
`r columns(3, 1)`
```{r}
theme_background <- function(background = "white", ...) {
  theme_minimal(...) %+replace%
    theme(
      plot.background = element_rect(
        fill = background,
        colour = background
      ),
      complete = TRUE
    )
}
base <- ggplot(mpg, aes(displ, hwy)) + geom_point()
base + theme_minimal(base_size = 14)
base + theme_background(base_size = 14)
base + theme_background(base_size = 14, background = "grey70")
```
### Complete themes
An important point to note is the use of `complete = TRUE` in the code for `theme_minimal()` and `theme_background()`. It is always good practice to do this when defining your own themes in a ggplot2 extension package: this will ensure that your theme behaves in the same way as the default theme and as a consequence will be less likely to surprise users. To see why this is necessary, compare these two themes:
```{r}
# good
theme_predictable <- function(...) {
  theme_classic(...) %+replace%
    theme(
      axis.line.x = element_line(color = "blue"),
      axis.line.y = element_line(color = "orange"),
      complete = TRUE
    )
}

# bad
theme_surprising <- function(...) {
  theme_classic(...) %+replace%
    theme(
      axis.line.x = element_line(color = "blue"),
      axis.line.y = element_line(color = "orange")
    )
}
```
Both themes are intended to do the same thing: modify `theme_classic()` so that the x-axis is drawn with a blue line and the y-axis with an orange line. At first glance, it appears that both versions behave in line with user expectations:
```{r}
base + theme_classic()
base + theme_predictable()
base + theme_surprising()
```
However, suppose the user of your theme wants to remove the axis lines:
```{r}
base + theme_classic() + theme(axis.line = element_blank())
base + theme_predictable() + theme(axis.line = element_blank())
base + theme_surprising() + theme(axis.line = element_blank())
```
The behaviour of `theme_predictable()` is the same as `theme_classic()` and the axis lines are removed, but for `theme_surprising()` this does not happen. The reason for this is that ggplot2 treats complete themes as a collection of "fallback" values: when the user adds `theme(axis.line = element_blank())` to a complete theme, there is no need to rely on the fallback value for `axis.line.x` or `axis.line.y`, because these are inherited from `axis.line` in the user command. This is a kindness to your users, as it allows them to overwrite everything that inherits from `axis.line` using a command like `theme_predictable() + theme(axis.line = ...)`. In contrast, `theme_surprising()` does not specify a complete theme. When the user calls `theme_surprising()` the fallback values are taken from `theme_classic()`, but more importantly, ggplot2 treats the `theme()` command that sets `axis.line.x` and `axis.line.y` exactly as if the user had typed it. As a consequence, the plot specification is equivalent to this:
```{r}
base +
  theme_classic() +
  theme(
    axis.line.x = element_line(color = "blue"),
    axis.line.y = element_line(color = "orange"),
    axis.line = element_blank()
  )
```
In this code, the more specific `axis.line.x` and `axis.line.y` settings take precedence over the `axis.line` setting, so blanking `axis.line` does not remove the coloured axis lines.
### Defining theme elements {#defining-theme-elements}
In Chapter \@ref(polishing) we saw that the structure of a ggplot2 theme is defined by the element tree. The element tree specifies what type each theme element has and where it inherits its value from (you can use the `get_element_tree()` function to return this tree as a list). The extension system for ggplot2 makes it possible to define new theme elements by registering them as part of the element tree using the `register_theme_elements()` function. Let's say you're writing a new package called "ggxyz" that includes a panel annotation as part of the coordinate system and you want this panel annotation to be a theme element:
```{r}
register_theme_elements(
  ggxyz.panel.annotation = element_text(
    color = "blue",
    hjust = 0.95,
    vjust = 0.05
  ),
  element_tree = list(
    ggxyz.panel.annotation = el_def(
      class = "element_text",
      inherit = "text"
    )
  )
)
```
There are two points to note here when defining new theme elements in a package:
- It is important to call `register_theme_elements()` from the `.onLoad()` function of your package, so that the new theme elements are available to anybody using functions from your package, irrespective of whether the package has been attached (a minimal sketch follows this list).
- It is always a good idea to include the name of your package as a prefix for any new theme elements. That way, if someone else writes a panel annotation package `ggabc`, there is no potential conflict between theme elements `ggxyz.panel.annotation` and `ggabc.panel.annotation`.
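To make the first point concrete, here is a minimal sketch (assuming the hypothetical ggxyz package and, conventionally, a file such as `R/zzz.R`) of what the corresponding `.onLoad()` function might look like; it simply wraps the registration call from above:
```{r, eval=FALSE}
# Hypothetical R/zzz.R for the ggxyz package: registering the element at
# load time means it is available even if ggxyz is never attached
.onLoad <- function(libname, pkgname) {
  register_theme_elements(
    ggxyz.panel.annotation = element_text(
      color = "blue",
      hjust = 0.95,
      vjust = 0.05
    ),
    element_tree = list(
      ggxyz.panel.annotation = el_def(
        class = "element_text",
        inherit = "text"
      )
    )
  )
}
```
In a real package you would either import these functions from ggplot2 or qualify them, e.g. `ggplot2::register_theme_elements()`.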
Once the element tree has been updated, the package can define a new coordinate system that uses the new theme element. A simple way to do this is to define a function that creates a new instance of the `CoordCartesian` ggproto object. We'll talk more about this in Section \@ref(new-coords), but for now it is sufficient to note that this code will work:
```{r}
coord_annotate <- function(label = "panel annotation") {
  ggproto(NULL, CoordCartesian,
    limits = list(x = NULL, y = NULL),
    expand = TRUE,
    default = FALSE,
    clip = "on",
    render_fg = function(panel_params, theme) {
      element_render(
        theme = theme,
        element = "ggxyz.panel.annotation",
        label = label
      )
    }
  )
}
```
So now this works:
`r columns(2, 1)`
```{r, eval=FALSE}
base + coord_annotate("annotation in blue")
base + coord_annotate("annotation in blue") + theme_dark()
```
```{r, echo=FALSE}
# DJN: I'm not sure why, because I can't reproduce the bug elsewhere, but the
# call to register_theme_element() updates ggplot2:::ggplot_global$element_tree
# only within *that* chunk, so subsequent chunks don't have ggxyz.panel.annotation
# in the element tree. For now, this is a hacky fix:
register_theme_elements(
ggxyz.panel.annotation = element_text(
color = "blue",
hjust = 0.95,
vjust = 0.05
),
element_tree = list(
ggxyz.panel.annotation = el_def(
class = "element_text",
inherit = "text"
)
)
)
base + coord_annotate("annotation in blue")
base + coord_annotate("annotation in blue") + theme_dark()
```
Having modified the element tree, it is worth mentioning that the `reset_theme_settings()` function restores the default element tree, discards all new element definitions, and (unless turned off) resets the currently active theme to the default.
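For instance, when experimenting interactively you can undo the registration from above with a call like this (a trivial usage sketch):
```{r, eval=FALSE}
# Restore the default element tree and, by default, reset the active theme
reset_theme_settings()
```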
## New stats {#new-stats}
It may seem surprising, but creating new stats is one of the most useful ways to extend the capabilities of ggplot2. When users add new layers to a plot they most often use a geom function, and so it is tempting as a developer to think that your ggplot2 extension should be encapsulated as a new geom. To an extent this is true, as your users will likely want to use a geom function, but in truth the variety among different geoms is mostly due to the variety in different stats. One of the benefits of working with stats is that they are purely about data transformations. Most R users and developers are very comfortable with data transformation, which makes the task of defining a new stat easier. As long as the desired behaviour can be encapsulated in a stat, there is no need to fiddle with any calls to grid.
### Creating stats
As discussed in Chapter \@ref(internals), the core behaviour of a stat is captured by a tiered succession of calls to `compute_layer()`, `compute_panel()`, and `compute_group()`, all of which are methods associated with the ggproto object defining the stat. By default the top two functions don't do very much; they simply split the data and then pass it down to the function below:
- `compute_layer()` splits the data set by the `PANEL` column, calls `compute_panel()`, and reassembles the results.
- `compute_panel()` splits the panel data by the `group` column, calls `compute_group()`, and reassembles the results.
Because of this, the only method you usually need to specify as a developer is the `compute_group()` function, whose job is to take the data for a single group and transform it appropriately. This will be sufficient to create a working stat, though it may not yield the best performance. As a consequence developers sometimes find it valuable to offload some of the work to `compute_panel()` where possible: doing so allows you to vectorise computations and avoid an expensive split-combine step. However, as a general rule it is better to begin by modifying `compute_group()` only and see if the performance is adequate.
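Before turning to the main example, here is a minimal sketch of what offloading work to `compute_panel()` can look like (the stat and its name, `StatRescaleY`, are made up for illustration and are not part of ggplot2): the computation is naturally panel-wise, so overriding `compute_panel()` handles the whole panel in one vectorised call and skips the default split-by-group step entirely.
```{r}
StatRescaleY <- ggproto("StatRescaleY", Stat,
  required_aes = c("x", "y"),
  compute_panel = function(data, scales) {
    # Rescale y to [0, 1] across the whole panel in one vectorised call;
    # all other columns (PANEL, group, colour, ...) pass through unchanged
    data$y <- scales::rescale(data$y)
    data
  }
)
```
A layer such as `geom_point(stat = StatRescaleY)` would then draw the points with `y` rescaled to the unit interval.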
To illustrate the more common `compute_group()` approach, we'll start by creating a stat that calculates the convex hull of a set of points, using the `chull()` function included in `grDevices`. As you might expect, most of the work is done by a new ggproto object that we will create:
```{r}
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)
```
As described in Section \@ref(ggproto) the first two arguments to `ggproto()` are used to indicate that this object defines a new class (conveniently named `"StatChull"`) which inherits fields and methods from the `Stat` object. We then specify only those fields and methods that need to be altered from the defaults provided by `Stat`, in this case `compute_group()` and `required_aes`. Our `compute_group()` function takes two inputs, `data` and `scales`---because this is what ggplot2 expects---but the actual computation is dependent only on the `data`. Note that because the computation necessarily requires both position aesthetics to be present, we have also specified the `required_aes` field to make sure that ggplot2 knows that these aesthetics are required.
By creating this ggproto object we have a working stat, but have not yet given the user a way to access it. To address this we write a layer function, `stat_chull()`. All layer functions have the same form: you specify defaults in the function arguments and then call `layer()`, sending `...` into the `params` argument. The arguments in `...` will either be arguments for the geom (if you're making a stat wrapper), arguments for the stat (if you're making a geom wrapper), or aesthetics to be set. `layer()` takes care of teasing the different parameters apart and making sure they're stored in the right place. So our `stat_chull()` function looks like this:
```{r}
stat_chull <- function(mapping = NULL, data = NULL,
                       geom = "polygon", position = "identity",
                       na.rm = FALSE, show.legend = NA,
                       inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull,
    data = data,
    mapping = mapping,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}
```
and our stat can now be used in plots:
`r columns(2, 1)`
```{r}
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_chull(fill = NA, colour = "black")
ggplot(mpg, aes(displ, hwy, colour = drv)) +
geom_point() +
stat_chull(fill = NA)
```
When creating new stats it is usually a good idea to provide an accompanying `geom_*()` constructor as well as the `stat_*()` constructor, because most users are accustomed to adding plot layers with geoms rather than stats. We'll show what a `geom_chull()` function might look like in Section \@ref(new-geoms).
Note that it is not always possible to define a `geom_*()` constructor in a sensible way. This can happen when there is no obvious default geom for the new stat, or when the stat is intended to offer a slight modification to an existing geom/stat pair. In such cases it may be wise to provide only a `stat_*()` function.
### Modifying parameters and data {#modifying-stat-params}
When defining new stats, it is often necessary to specify the `setup_params()` and/or `setup_data()` functions. These are called before the `compute_*()` functions and they allow the Stat to react and modify itself in response to the parameters and data (especially the data, as this is not available when the stat is constructed):
- The `setup_params()` function is called first. It takes two arguments corresponding to the layer `data` and a list of parameters (`params`) specified during construction, and returns a modified list of parameters that will be used in later computations. Because the parameters are used by the `compute_*()` functions, the elements of the list should correspond to argument names in the `compute_*()` functions in order to be made available.
- The `setup_data()` function is called next. It also takes `data` and `params` as input---though the parameters it receives are the modified parameters returned from `setup_params()`---and returns the modified layer data. It is important that no matter what modifications happen in `setup_data()` the `PANEL` and `group` columns remain intact.
In the example below we show how to use the `setup_params()` method to define a new stat. An example modifying the `setup_data()` method is included later, in Section \@ref(modifying-geoms).
Suppose we want to create `StatDensityCommon`, a stat that computes a density estimate of a variable after estimating a default bandwidth to apply to all groups in the data. This could be done in many different ways but for simplicity let's imagine we have a function `common_bandwidth()` that estimates the bandwidth separately for each group using the `bw.nrd0()` function and then returns the average:
```{r common-bandwidth}
common_bandwidth <- function(data) {
  split_data <- split(data$x, data$group)
  bandwidth <- mean(vapply(split_data, bw.nrd0, numeric(1)))
  return(bandwidth)
}
```
What we want from `StatDensityCommon` is to use the `common_bandwidth()` function to set a common bandwidth before the data are separated by group and passed to the `compute_group()` function. This is where the `setup_params()` method is useful:
```{r stat-density-common}
StatDensityCommon <- ggproto("StatDensityCommon", Stat,
  required_aes = "x",
  setup_params = function(data, params) {
    if (is.null(params$bandwidth)) {
      params$bandwidth <- common_bandwidth(data)
      message("Picking bandwidth of ", signif(params$bandwidth, 3))
    }
    return(params)
  },
  compute_group = function(data, scales, bandwidth = 1) {
    d <- density(data$x, bw = bandwidth)
    return(data.frame(x = d$x, y = d$y))
  }
)
```
We then define a `stat_*()` function in the usual way:
```{r stat-density-common-2}
stat_density_common <- function(mapping = NULL, data = NULL,
                                geom = "line", position = "identity",
                                na.rm = FALSE, show.legend = NA,
                                inherit.aes = TRUE, bandwidth = NULL, ...) {
  layer(
    stat = StatDensityCommon,
    data = data,
    mapping = mapping,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      bandwidth = bandwidth,
      na.rm = na.rm,
      ...
    )
  )
}
```
We can now apply our new stat:
`r columns(1, 1)`
```{r}
ggplot(mpg, aes(displ, colour = drv)) +
stat_density_common()
```
## New geoms {#new-geoms}
While many things can be achieved by creating new stats, there are situations where creating a new geom is necessary. Some of these are:
- It is not meaningful to return data from the stat in a form that is understandable by any current geoms.
- The layer needs to combine the output of multiple geoms.
- The geom needs to return grobs not currently available from existing geoms.
Creating new geoms can feel slightly more daunting than creating new stats, as the end result is a collection of grobs rather than a modified data frame, and this is outside the comfort zone of many developers. Still, apart from the last point above, it is possible to get by without having to think too much about grid and grobs.
### Modifying geom defaults {#modifying-geom-defaults}
In many situations your new geom may simply be an existing geom that expects slightly different input or has different default parameter values. The `stat_chull()` example from the previous section is a good example of this. Notice that when creating plots using `stat_chull()` we had to manually specify the `fill` and `colour` parameters if those were not mapped to aesthetics. The reason for this is that `GeomPolygon` creates a borderless filled polygon by default, and this is not well suited to the needs of our convex hull geom. To make our lives a little easier, we can create a subclass of `GeomPolygon` that modifies the defaults so that it produces a hollow polygon by default. We can do this in a straightforward way by overriding the `default_aes` value:
```{r}
GeomPolygonHollow <- ggproto("GeomPolygonHollow", GeomPolygon,
  default_aes = aes(
    colour = "black",
    fill = NA,
    linewidth = 0.5,
    linetype = 1,
    alpha = NA
  )
)
```
We can now define our `geom_chull()` constructor function using `GeomPolygonHollow` as the default geom:
```{r}
geom_chull <- function(mapping = NULL, data = NULL, stat = "chull",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    geom = GeomPolygonHollow,
    data = data,
    mapping = mapping,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}
```
For the sake of consistency we would also define `stat_chull()` to use this as the default. In any case, we now have a new `geom_chull()` function that works fairly well without the user needing to set parameters:
`r columns(1, 1)`
```{r}
ggplot(mpg, aes(displ, hwy)) +
geom_chull() +
geom_point()
```
### Modifying geom data {#modifying-geom-data}
In other cases you may want to define a geom that is visually equivalent to an existing geom, but accepts data in a different format. An example of this in the ggplot2 source code is `geom_spoke()`, a variation of `geom_segment()` that accepts data in polar coordinates. To make this work, the `GeomSpoke` ggproto object is subclassed from `GeomSegment`, and uses the `setup_data()` method to take polar coordinate data from the user and then transform it to the format that `GeomSegment` expects. To illustrate this technique we'll create `geom_spike()`, a geom that re-implements the functionality of `geom_spoke()`. This requires us to overwrite the `required_aes` field as well as the `setup_data()` method:
```{r}
GeomSpike <- ggproto("GeomSpike", GeomSegment,
  # Specify the required aesthetics
  required_aes = c("x", "y", "angle", "radius"),
  # Transform the data before any drawing takes place
  setup_data = function(data, params) {
    transform(data,
      xend = x + cos(angle) * radius,
      yend = y + sin(angle) * radius
    )
  }
)
```
We now write the user-facing `geom_spike()` function:
```{r}
geom_spike <- function(mapping = NULL, data = NULL,
                       stat = "identity", position = "identity",
                       ..., na.rm = FALSE, show.legend = NA,
                       inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    geom = GeomSpike,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}
```
We are now able to use `geom_spike()` in plots:
`r columns(1, 1/2, 1)`
```{r}
df <- data.frame(
  x = 1:10,
  y = 0,
  angle = seq(from = 0, to = 2 * pi, length.out = 10),
  radius = seq(from = 0, to = 2, length.out = 10)
)
ggplot(df, aes(x, y)) +
  geom_spike(aes(angle = angle, radius = radius)) +
  coord_equal()
```
As with stats, geoms have a `setup_params()` method in addition to the `setup_data()` method, which can be used to modify parameters before any drawing takes place (see Section \@ref(modifying-stat-params) for an example). One thing to note in the geom context, however, is that `setup_data()` is called before any position adjustment is done.
### Combining multiple geoms
A useful technique for defining new geoms is to combine functionality from different geoms. For example, the `geom_smooth()` function for drawing nonparametric regression lines uses functionality from `geom_line()` to draw the regression line and `geom_ribbon()` to draw the shaded error bands. To do this within your new geom, it is helpful to consider the drawing process. In much the same way that a stat works by a tiered succession of calls to `compute_layer()` then `compute_panel()` and finally `compute_group()`, a geom is constructed by calls to `draw_layer()`, `draw_panel()`, and `draw_group()`.
If you want to combine the functionality of multiple geoms, this can usually be achieved by preparing the data for each of the geoms inside the `draw_*()` call and sending it off to the different geoms, collecting the output using `grid::gList()` when a list of grobs is needed or `grid::gTree()` if a single grob with multiple children is required. As a relatively minimal example, consider the `GeomBarbell` ggproto object that creates geoms consisting of two points connected by a bar:
```{r}
GeomBarbell <- ggproto("GeomBarbell", Geom,
  required_aes = c("x", "y", "xend", "yend"),
  default_aes = aes(
    colour = "black",
    linewidth = .5,
    size = 2,
    linetype = 1,
    shape = 19,
    fill = NA,
    alpha = NA,
    stroke = 1
  ),
  draw_panel = function(data, panel_params, coord, ...) {
    # Data for the two end points: the first keeps the original positions,
    # the second swaps in the xend/yend positions
    point1 <- transform(data)
    point2 <- transform(data, x = xend, y = yend)
    # Return all three components as a single grob list
    grid::gList(
      GeomSegment$draw_panel(data, panel_params, coord, ...),
      GeomPoint$draw_panel(point1, panel_params, coord, ...),
      GeomPoint$draw_panel(point2, panel_params, coord, ...)
    )
  }
)
```
In this example the `draw_panel()` method returns a list of three grobs, one generated from `GeomSegment` and two from `GeomPoint`. As usual, if we want the geom to be exposed to the user we add a wrapper function:
```{r, cache.vars=GeomBarbell}
geom_barbell <- function(mapping = NULL, data = NULL,
                         stat = "identity", position = "identity",
                         ..., na.rm = FALSE, show.legend = NA,
                         inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomBarbell,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}
```
We are now able to use the composite geom:
`r columns(2, 1)`
```{r}
df <- data.frame(x = 1:10, xend = 0:9, y = 0, yend = 1:10)
base <- ggplot(df, aes(x, y, xend = xend, yend = yend))
base + geom_barbell()
base + geom_barbell(shape = 4, linetype = "dashed")
```
If you cannot leverage any existing geom implementation for creating the grobs, you'd have to implement the full `draw_*()` method from scratch, which requires a little more understanding of the grid package. For more information about grid and an example that uses this to construct a geom from grid primitives, see Chapter \@ref(spring1).
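To give a flavour of what that involves, here is a minimal sketch (the name `GeomSimplePoint` is made up here, similar to the simple point geom used in ggplot2's extension documentation) of a `draw_panel()` method built directly from grid primitives:
```{r}
GeomSimplePoint <- ggproto("GeomSimplePoint", Geom,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black"),
  draw_key = draw_key_point,
  draw_panel = function(data, panel_params, coord) {
    # Let the coord rescale the position aesthetics to the [0, 1] range...
    coords <- coord$transform(data, panel_params)
    # ...and build a grid grob directly from the result
    grid::pointsGrob(
      x = coords$x,
      y = coords$y,
      pch = 19,
      gp = grid::gpar(col = coords$colour)
    )
  }
)
```
This object would be exposed to users through a `geom_simple_point()` constructor written exactly like the wrapper functions shown earlier in this chapter.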
## New coords {#new-coords}
The primary role of the coord is to rescale the position aesthetics onto the [0, 1] range, potentially transforming them in the process. Defining new coords is relatively rare: the coords described in Chapter \@ref(coord) are suitable for most non-cartographic cases, and with the introduction of `coord_sf()` discussed in Chapter \@ref(maps), ggplot2 is able to capture most cartographic projections out of the box.
The most common situation in which developers may need to know the internals of coordinate systems is when defining new geoms. It is not uncommon for one of the `draw_*()` methods in a geom to call the `transform()` method of the coord. For example, the `transform()` method for `CoordCartesian` is used to rescale position data but does not transform it in any other way, and the geom may need to apply this rescaling to draw the grob properly. An example of this use appears in Chapter \@ref(spring1).
In addition to transforming position data, the coord has responsibility for rendering the axes, axis labels, panel foreground and panel background. Additionally, the coord can intercept and modify the layer data and the facet layout. Much of this functionality is available to developers to leverage if it is absolutely necessary (an example is shown in Section \@ref(defining-theme-elements)), but in the majority of cases it is better to leave this functionality alone.
## New scales
There are three ways one might want to extend ggplot2 with new scales. The simplest case is when you want to provide a convenient wrapper for a new palette, typically for a colour or fill aesthetic. As an impractical example, suppose you wanted to sample random colours to fill a violin or box plot, using a palette function like this:
```{r}
random_colours <- function(n) {
  sample(colours(distinct = TRUE), n, replace = TRUE)
}
```
We can then write a `scale_fill_random()` constructor function that passes the palette to `discrete_scale()` and then use it in plots:
`r columns(1, 1)`
```{r}
scale_fill_random <- function(..., aesthetics = "fill") {
  discrete_scale(
    aesthetics = aesthetics,
    scale_name = "random",
    palette = random_colours,
    ...
  )
}

ggplot(mpg, aes(hwy, class, fill = class)) +
  geom_violin(show.legend = FALSE) +
  scale_fill_random()
```
Another relatively simple case is where you provide a geom that takes a new type of aesthetic that needs to be scaled. Let's say that you created a new line geom, and instead of the `size` aesthetic you decided to use a `width` aesthetic. In order to get `width` scaled in the same way as you've come to expect scaling of `size`, you must provide a default scale for the aesthetic. Default scales are found based on the aesthetic name and the data type provided to the aesthetic. If you assign continuous values to the `width` aesthetic, ggplot2 will look for a `scale_width_continuous()` function and use it if no other width scale has been added to the plot. If such a function is not found (and no width scale was added explicitly), the aesthetic will not be scaled.
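As a hedged sketch of what such a default scale might look like (the `width` aesthetic, the `range` default, and the `"width_c"` name are all illustrative assumptions), you could pass a rescaling palette to `continuous_scale()`:
```{r, eval=FALSE}
# Default scale for a hypothetical continuous `width` aesthetic:
# rescale_pal() linearly rescales the data onto the given output range
scale_width_continuous <- function(..., range = c(0.5, 3)) {
  continuous_scale(
    aesthetics = "width",
    scale_name = "width_c",
    palette = scales::rescale_pal(range),
    ...
  )
}
```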
A last possibility worth mentioning, but outside the scope of this book, is creating a new primary scale type. Historically, ggplot2 has had two primary scale types: continuous and discrete. Recently a third type, the binned scale, was added, which allows continuous data to be binned into discrete bins. It is possible to develop further primary scale types by following the example of `ScaleBinned`: this requires subclassing `Scale` or one of the provided primary scales, and creating new `train()` and `map()` methods, among others.
## New positions
The `Position` ggproto class is somewhat simpler than other ggproto classes, reflecting the fact that the `position_*()` functions have a very narrow scope. The role of the position is to receive and modify the data immediately before it is passed to any drawing functions. Strictly speaking, the position is able to modify the data in any fashion, but there is an implicit expectation that it modifies position aesthetics only. A position possesses `compute_layer()` and `compute_panel()` methods that behave analogously to the equivalent methods for a stat, but it does not possess a `compute_group()` method. It also contains `setup_params()` and `setup_data()` methods that are similar to the `setup_*()` methods for other ggproto classes, with one notable exception: the `setup_params()` method only receives the data as input, and not a list of parameters. The reason for this is that `position_*()` functions are never used on their own in ggplot2: rather, they are always called within the main `geom_*()` or `stat_*()` command that specifies the layer, and the parameters from the main command are not passed to the `position_*()` function call.
To give a simple example, we'll implement a slightly simplified version of the `position_jitternormal()` function from the ggforce package, which behaves in the same way as `position_jitter()` except that the perturbations are sampled from a normal distribution rather than a uniform distribution. In order to keep the exposition simple, we'll assume we have the following convenience function defined:
```{r}
normal_transformer <- function(sd) {
  function(x) x + rnorm(length(x), sd = sd)
}
```
When called, `normal_transformer()` returns a function that perturbs the input vector by adding random noise with mean zero and standard deviation `sd`. The first step when creating our new position is to make a subclass of the `Position` object:
```{r}
PositionJitterNormal <- ggproto('PositionJitterNormal', Position,
  # We need an x and y position aesthetic
  required_aes = c('x', 'y'),
  # By using the "self" argument we can access parameters that the
  # user has passed to the position, and add them as layer parameters
  setup_params = function(self, data) {
    list(
      sd_x = self$sd_x,
      sd_y = self$sd_y
    )
  },
  # When computing the layer, we can read the standard deviation
  # parameters off the param list, and use them to transform the
  # position aesthetics
  compute_layer = function(data, params, panel) {
    # construct transformers for the x and y position scales
    x_transformer <- normal_transformer(params$sd_x)
    y_transformer <- normal_transformer(params$sd_y)
    # return the transformed data
    transform_position(
      df = data,
      trans_x = x_transformer,
      trans_y = y_transformer
    )
  }
)
```
The `compute_layer()` method makes use of `transform_position()`, a convenience function provided by ggplot2 whose role is to apply the user-supplied functions to all aesthetics associated with the relevant position scale (e.g., not just x and y, but also xend and yend).
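As a quick illustration (the small data frame below is purely for demonstration), note how both `x` and `xend` are modified by `trans_x`, and both `y` and `yend` by `trans_y`:
```{r}
pos_data <- data.frame(x = 1:3, xend = 2:4, y = 0, yend = 5)
transform_position(
  pos_data,
  trans_x = function(x) x * 10,   # applied to x and xend
  trans_y = function(y) y + 100   # applied to y and yend
)
```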
In a realistic implementation, the `position_jitternormal()` constructor would apply some input validation to make sure the user has not specified negative standard deviations, but in this context we'll keep it simple:
```{r}
position_jitternormal <- function(sd_x = .15, sd_y = .15) {
  ggproto(NULL, PositionJitterNormal, sd_x = sd_x, sd_y = sd_y)
}
```
We are now able to use our new position function when creating plots. To see the difference between `position_jitter()` and the `position_jitternormal()` function we have just defined, compare the following plots:
`r columns(2, 1)`
```{r}
df <- data.frame(
  x = sample(1:3, 1500, TRUE),
  y = sample(1:3, 1500, TRUE)
)
ggplot(df, aes(x, y)) + geom_point(position = position_jitter())
ggplot(df, aes(x, y)) + geom_point(position = position_jitternormal())
```
One practical consideration to keep in mind when designing new positions is that users very rarely call the position constructor directly. The command specifying the layer is more likely to include an expression like `position = "dodge"` rather than `position = position_dodge()`, and even less likely to override your default values as would occur if the user specified `position = position_dodge(width = 0.9)`. As a consequence, it is important to think carefully and make the defaults work for most cases if at all possible. This can be quite tricky: positions have very little control over the shape and format of the layer data, but the user will expect them to behave predictably in all situations. An example is the case of dodging, where users might like to dodge a boxplot and a point-cloud, and would expect the point-cloud to appear in the same area as its respective boxplot. This is a perfectly reasonable expectation at the user level, but it can be tricky for the developer. A boxplot has an explicit width that can be used to control the dodging whereas the same is not true for points, but the user will expect them to be moved in the same way. Such considerations often mean that position implementations end up much more complex than the simplest solution, in order to handle a wide range of edge cases.
## New facets
Facets are one of the most powerful concepts in ggplot2, and extending facets is one of the most powerful ways to modify how ggplot2 operates. This power comes at a cost: facets are responsible for receiving all the panels, attaching the axes and strips to them, and then arranging them in the expected manner. To create an entirely new faceting system requires an in-depth understanding of grid and gtable, and can be a daunting challenge. Fortunately, you don't always need to create the facet from scratch. For example, if your new facet will produce panels that lie on a grid, you can often subclass `FacetWrap` or `FacetGrid` and modify one or two methods. In particular, you may wish to define new `compute_layout()` and/or `map_data()` methods:
- The `compute_layout()` method receives the original data set and creates a layout specification, a data frame with one row per panel that indicates where each panel falls on the grid, along with information about which axis limits should be free and which should be fixed.
- The `map_data()` method receives this layout specification and the original data as the input, and attaches a `PANEL` column to it, which is used to assign each row in the data frame to one of the panels in the layout.
To illustrate how you can create new facets by subclassing an existing facet, we'll create a relatively simple facetting system that "scatters" the panels, placing them in random locations on a grid. To do this we'll create a new ggproto object called `FacetScatter` that is a subclass of `FacetWrap`, and write a new `compute_layout()` method that places each panel in a randomly chosen cell of the panel grid:
```{r}
FacetScatter <- ggproto("FacetScatter", FacetWrap,
  # This isn't important to the example: all we're doing is
  # forcing all panels to use fixed scale so that the rest
  # of the example can be kept simple
  setup_params = function(data, params) {
    params <- FacetWrap$setup_params(data, params)
    params$free <- list(x = FALSE, y = FALSE)
    return(params)
  },
  # The compute_layout() method does the work
  compute_layout = function(data, params) {
    # create a data frame with one column per facetting
    # variable, and one row for each possible combination
    # of values (i.e., one row per panel)
    panels <- combine_vars(
      data = data,
      env = params$plot_env,
      vars = params$facets,
      drop = FALSE
    )
    # Create a data frame with columns for ROW and COL,
    # with one row for each possible cell in the panel grid
    locations <- expand.grid(ROW = 1:params$nrow, COL = 1:params$ncol)
    # Randomly sample a subset of the locations
    shuffle <- sample(nrow(locations), nrow(panels))
    # Assign each panel a location
    layout <- data.frame(
      PANEL = 1:nrow(panels),        # panel identifier
      ROW = locations$ROW[shuffle],  # row number for the panels
      COL = locations$COL[shuffle],  # column number for the panels
      SCALE_X = 1L,                  # all x-axis scales are fixed
      SCALE_Y = 1L                   # all y-axis scales are fixed
    )
    # Bind the layout information with the panel identification
    # and return the resulting specification
    return(cbind(layout, panels))
  }
)
```
To give you a sense of what this output looks like, this is the layout specification that is created when building the plot shown at the end of this section:
```{r facet-scatter, echo=FALSE}
facet_scatter <- function(facets, nrow, ncol,
                          strip.position = "top",
                          labeller = "label_value") {
  ggproto(NULL, FacetScatter,
    params = list(
      facets = rlang::quos_auto_name(facets),
      strip.position = strip.position,
      labeller = labeller,
      ncol = ncol,
      nrow = nrow
    )
  )
}
```
```{r facet-scatter-plot, echo=FALSE}
scatter <- ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_scatter(vars(manufacturer), nrow = 5, ncol = 6)
scatter_built <- ggplot_build(scatter)
scatter_built$layout$layout
```
Next, we'll write the `facet_scatter()` constructor function to expose this functionality to the user. For facets this is as simple as creating a new instance of the relevant ggproto object (`FacetScatter` in this case) that passes user-specified parameters to the facet:
```{r, ref.label="facet-scatter"}
```
There are a couple of things to note about this constructor function. First, to keep the example simple, `facet_scatter()` contains fewer arguments than `facet_wrap()`, and we've made `nrow` and `ncol` required arguments: the user needs to specify the size of the grid over which the panels should be scattered. Second, the `facet_scatter()` function requires you to specify the facets using `vars()`. It won't work if the user tries to supply a formula. Relatedly, note the use of `rlang::quos_auto_name()`: the `vars()` function returns an unnamed list of expressions (technically, quosures), but the downstream code requires a named list. As long as you're expecting the user to use `vars()` this is all the preprocessing you need, but if you want to support other input formats you'll need to be a little fancier (you can see how to do this by looking at the ggplot2 source code).
In any case, we now have a working facet:
`r columns(1, 1, 1)`
```{r eval=FALSE}
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_scatter(vars(manufacturer), nrow = 5, ncol = 6)
```
```{r echo=FALSE}
# Use the earlier built one so that the random ROW and COL info
# matches the layout specification shown above
scatter_built$plot
```
<!-- ## New guides -->
<!-- >Should probably not mention anything until they have been ported to `ggproto` -->