additional first-to-third person pronoun changes (#352)
* me-to-us
* change "I'd" to "we'd"
* rephrase "I'm" to "we"
* rephrase to avoid first-person "I'm not a fan"
* change "I'm" to "we'll"
* change "I'll" to "we'll"
* a few more "I'll" to "we'll" changes
* pluralise in the commented out bit just in case
* rephrase to avoid awkward first person "I regret"
getting-started.Rmd (+1 -1; 1 addition & 1 deletion)
@@ -341,7 +341,7 @@ ggplot(mpg, aes(hwy)) +
   geom_freqpoly(binwidth = 1)
 ```

-An alternative to the frequency polygon is the density plot, `geom_density()`. I'm not a fan of density plots because they are harder to interpret since the underlying computations are more complex. They also make assumptions that are not true for all data, namely that the underlying distribution is continuous, unbounded, and smooth.
+An alternative to the frequency polygon is the density plot, `geom_density()`. A little care is required if you're using density plots: compared to frequency polygons they are harder to interpret since the underlying computations are more complex. They also make assumptions that are not true for all data, namely that the underlying distribution is continuous, unbounded, and smooth.

 To compare the distributions of different subgroups, you can map a categorical variable to either fill (for `geom_histogram()`) or colour (for `geom_freqpoly()`). It's easier to compare distributions using the frequency polygon because the underlying perceptual task is easier. You can also use faceting: this makes comparisons a little harder, but it's easier to see the distribution of each group.
introduction.Rmd (+1 -1; 1 addition & 1 deletion)
@@ -154,7 +154,7 @@ There you will learn about how to control the theming system of ggplot2 and how
 Before we continue, make sure you have all the software you need for this book:

 - **R**: If you don't have R installed already, you may be reading the wrong book; we assume a basic familiarity with R throughout this book.
-If you'd like to learn how to use R, I'd recommend my [*R for Data Science*](https://r4ds.had.co.nz/) which is designed to get you up and running with R with a minimum of fuss.
+If you'd like to learn how to use R, we'd recommend [*R for Data Science*](https://r4ds.had.co.nz/) which is designed to get you up and running with R with a minimum of fuss.

 - **RStudio**: RStudio is a free and open source integrated development environment (IDE) for R.
 While you can write and use R code with any R environment (including R GUI and [ESS](http://ess.r-project.org)), RStudio has some nice features specifically for authoring and debugging your code.
layers.Rmd (+1 -1; 1 addition & 1 deletion)
@@ -96,7 +96,7 @@ observations in the rows. This is a strong restriction, but there are good reaso
 * It enforces a clean separation of concerns: ggplot2 turns data frames into
   visualisations. Other packages can make data frames in the right format.

-The data on each layer doesn't need to be the same, and it's often useful to combine multiple datasets in a single plot. To illustrate that idea I'm going to generate two new datasets related to the mpg dataset. First we'll fit a loess model and generate predictions from it. (This is what `geom_smooth()` does behind the scenes)
+The data on each layer doesn't need to be the same, and it's often useful to combine multiple datasets in a single plot. To illustrate that idea we'll generate two new datasets related to the mpg dataset. First we'll fit a loess model and generate predictions from it. (This is what `geom_smooth()` does behind the scenes)
maps.Rmd (+5 -5; 5 additions & 5 deletions)
@@ -77,10 +77,10 @@ The `coord_sf()` function governs the map projection, discussed in Section \@ref

 ### Layered maps

-In some instances you may want to overlay one map on top of another. The ggplot2 package supports this by allowing you to add multiple `geom_sf()` layers to a plot. As an example, I'll use the `oz_states` data to draw the Australian states in different colours, and will overlay this plot with the boundaries of Australian electoral regions. To do this, there are two preprocessing steps to perform. First, we'll use `dplyr::filter()` to remove the "Other Territories" from the state boundaries.
+In some instances you may want to overlay one map on top of another. The ggplot2 package supports this by allowing you to add multiple `geom_sf()` layers to a plot. As an example, we'll use the `oz_states` data to draw the Australian states in different colours, and will overlay this plot with the boundaries of Australian electoral regions. To do this, there are two preprocessing steps to perform. First, we'll use `dplyr::filter()` to remove the "Other Territories" from the state boundaries.


-The code below draws a plot with two map layers: the first uses `oz_states` to fill the states in different colours, and the second uses `oz_votes` to draw the electoral boundaries. Second, I'll extract the electoral boundaries in a simplified form using the `ms_simplify()` function from the rmapshaper package [@rmapshaper]. This is generally a good idea if the original data set (in this case `ozmaps::abs_ced`) is stored at a higher resolution than your plot requires, in order to reduce the time taken to render the plot.
+The code below draws a plot with two map layers: the first uses `oz_states` to fill the states in different colours, and the second uses `oz_votes` to draw the electoral boundaries. Second, we'll extract the electoral boundaries in a simplified form using the `ms_simplify()` function from the rmapshaper package [@rmapshaper]. This is generally a good idea if the original data set (in this case `ozmaps::abs_ced`) is stored at a higher resolution than your plot requires, in order to reduce the time taken to render the plot.
 p + coord_sf(xlim = c(150, 150.25), ylim = c(-36.3, -36))
 ```

-As this illustrates, Eden-Monaro is defined in terms of two distinct polygons, a large one on the Australian mainland and a small island. However, the large region has a hole in the middle (the hole exists because the Australian Capital Territory is a distinct political unit that is wholly contained within Eden-Monaro, and as illustrated above, electoral boundaries in Australia do not cross state lines). In sf terminology this is an example of a `MULTIPOLYGON` geometry. In this section I'll talk about the structure of these objects and how to work with them.
+As this illustrates, Eden-Monaro is defined in terms of two distinct polygons, a large one on the Australian mainland and a small island. However, the large region has a hole in the middle (the hole exists because the Australian Capital Territory is a distinct political unit that is wholly contained within Eden-Monaro, and as illustrated above, electoral boundaries in Australia do not cross state lines). In sf terminology this is an example of a `MULTIPOLYGON` geometry. In this section we'll talk about the structure of these objects and how to work with them.

 First, let's use dplyr to grab only the geometry object:

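As a rough illustration of the `MULTIPOLYGON` structure described in the hunk above, the sketch below uses base R only. The coordinates and object names are invented for illustration and are not the Eden-Monaro boundary. It assumes the nesting that sf uses for this geometry type: a list of polygons, each of which is a list of ring matrices, with the exterior ring first and any holes after it.

```r
# Hypothetical helper: build a closed ring as a two-column matrix
# (the first point is repeated as the last row).
ring <- function(x, y) cbind(x = c(x, x[1]), y = c(y, y[1]))

# Invented coordinates, chosen only to mimic "mainland with a hole, plus an island".
mainland_outer <- ring(c(0, 10, 10, 0), c(0, 0, 10, 10))
mainland_hole  <- ring(c(4, 6, 6, 4), c(4, 4, 6, 6))    # an enclosed territory
island_outer   <- ring(c(12, 13, 13, 12), c(1, 1, 2, 2))

# A multipolygon as a nested list: polygons -> rings -> coordinate matrices.
multipolygon <- list(
  list(mainland_outer, mainland_hole),  # large polygon with one hole
  list(island_outer)                    # small island, no holes
)

n_polygons <- length(multipolygon)   # number of distinct polygons
n_rings    <- lengths(multipolygon)  # rings per polygon (exterior + holes)
```

With sf installed, a nested list of this shape is the kind of input `sf::st_multipolygon()` accepts, though that step is left out here to keep the sketch dependency-free.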
@@ -283,7 +283,7 @@ ggplot(dawson[-69]) +

 A second way to supply geospatial information for mapping is to rely on **raster data**. Unlike the simple features format, in which geographical entities are specified in terms of a set of lines, points and polygons, rasters take the form of images. In the simplest case raster data might be nothing more than a bitmap file, but there are many different image formats out there. In the geospatial context specifically, there are image formats that include metadata (e.g., geodetic datum, coordinate reference system) that can be used to map the image information to the surface of the Earth. For example, one common format is GeoTIFF, which is a regular TIFF file with additional metadata supplied. Happily, most formats can be easily read into R with the assistance of GDAL (the Geospatial Data Abstraction Library, https://gdal.org/). For example the sf package contains a function `sf::gdal_read()` that provides access to the GDAL raster drivers from R. However, you rarely need to call this function directly, as there are other high level functions that take care of this for you.

-As an illustration, suppose we wish to plot satellite images made publicly available by the Australian Bureau of Meterorology (BOM) on their FTP server. The bomrang package [@bomrang] provides a convenient interface to the server, including a `get_available_imagery()` function that returns a vector of filenames and a `get_satellite_imagery()` function that downloads a file and imports it directly into R. For expository purposes, however, I'll use a more flexible method that could be adapted to any FTP server, and use the `download.file()` function:
+As an illustration, suppose we wish to plot satellite images made publicly available by the Australian Bureau of Meterorology (BOM) on their FTP server. The bomrang package [@bomrang] provides a convenient interface to the server, including a `get_available_imagery()` function that returns a vector of filenames and a `get_satellite_imagery()` function that downloads a file and imports it directly into R. For expository purposes, however, we'll use a more flexible method that could be adapted to any FTP server, and use the `download.file()` function:

 ```{r eval=FALSE}
 # list of all file names with time stamp 2020-01-07 21:00 GMT
 p2 <- base + scale_x_binned(breaks = seq(-50,50,10), limits = c(-50, 50), trans = "reverse")
 ```

-Binned scales can be transformed, much like continuous scales, but some care is required because the bins are constructed in the transformed space. In some cases this can produce undesirable outcomes. In the code below, I take a uniformly distributed variable and use `scale_x_binned()` and `geom_bar()` to construct a histogram of the logarithmically transformed data.
+Binned scales can be transformed, much like continuous scales, but some care is required because the bins are constructed in the transformed space. In some cases this can produce undesirable outcomes. In the code below, we take a uniformly distributed variable and use `scale_x_binned()` and `geom_bar()` to construct a histogram of the logarithmically transformed data.
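To see why bins built in the transformed space can be surprising, here is a small base R sketch. It is not ggplot2's internal code, and the break positions and data are chosen purely for illustration. Bins of equal width on the log10 scale map back to bins of very unequal width on the original scale, so evenly spread values pile up in the rightmost bins.

```r
# Equal-width breaks on the log10 scale (illustrative values, not ggplot2 defaults).
log_breaks <- seq(0, 2, by = 0.5)

# The same breaks expressed on the original scale.
raw_breaks <- 10^log_breaks

# Bin widths are constant in log space but grow on the original scale.
log_widths <- diff(log_breaks)
raw_widths <- diff(raw_breaks)

# Evenly spread values on the original scale, binned after the log transform.
x <- 1:100
counts <- table(cut(log10(x), breaks = log_breaks, include.lowest = TRUE))
print(counts)
```

The counts rise from bin to bin even though `x` is evenly spread, which is the kind of outcome the paragraph above warns about.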
scales-position.Rmd (+1 -1; 1 addition & 1 deletion)
@@ -99,7 +99,7 @@ base + coord_cartesian(ylim = c(10, 35)) # works as expected
 base + ylim(10, 35) # distorts the boxplot
 ```

-The only difference between the left and middle plots is that the latter is zoomed in. Some of the outlier points are not shown due to the restriction of the range, but the boxplots themselves remain identical. In contrast, in the plot on the right one of the boxplots has changed. When modifying the scale limits, all observations with highway mileage greater than 35 are converted to `NA` before the stat (in this case the boxplot) is computed. Because these "out of bounds" values are no longer available, the end result is that the sample median is shifted downward, which is almost never desirable behaviour. In hindsight, I regret this design choice as it is a common source of confusion for users. Unfortunately it would be very hard to change this default without breaking a lot of existing code.
+The only difference between the left and middle plots is that the latter is zoomed in. Some of the outlier points are not shown due to the restriction of the range, but the boxplots themselves remain identical. In contrast, in the plot on the right one of the boxplots has changed. When modifying the scale limits, all observations with highway mileage greater than 35 are converted to `NA` before the stat (in this case the boxplot) is computed. Because these "out of bounds" values are no longer available, the end result is that the sample median is shifted downward, which is almost never desirable behaviour. With the benefit of hindsight it's clear this wasn't a good design choice, because it is a common source of confusion for users. Unfortunately it would be very hard to change this default without breaking a lot of existing code.

 You can learn more about coordinate systems in Section \@ref(cartesian). To learn more about how "out of bounds" values are handled for continuous and binned scales, see Section \@ref(oob).