Commit

additional first-to-third person pronoun changes (#352)
* me-to-us

* change "I'd" to "we'd"

* rephrase "I'm" to "we"

* rephrase to avoid first-person "I'm not a fan"

* change "I'm" to "we'll"

* change "I'll" to "we'll"

* a few more "I'll" to "we'll" changes

* pluralise in the commented out bit just in case

* rephrase to avoide awkward first person "I regret"
djnavarro authored Feb 8, 2023
1 parent 2dbb8eb commit a14ebbd
Showing 8 changed files with 12 additions and 12 deletions.
2 changes: 1 addition & 1 deletion collective-geoms.Rmd
@@ -178,7 +178,7 @@ In the plot on the right, the "shaded bars" for each `class` have been construct
1. Install the babynames package. It contains data about the popularity of
baby names in the US. Run the following code and fix the resulting graph.
-Why does this graph make me unhappy?
+Why does this graph make us unhappy?
```{r, eval = FALSE}
library(babynames)
2 changes: 1 addition & 1 deletion ext-springs.Rmd
@@ -52,7 +52,7 @@ This gives us two parameters for our spring:

- The `tension`, how fast we move along x.

-While I'm pretty sure this is not a physically correct parameterisation of a spring, it is good enough for us.
+Although we can be pretty sure this is not a physically correct parameterisation of a spring, it is good enough for us.

At this point, it's worthwhile to spend a little time thinking about how we might turn this into a geom.
How will we specify the diameter?
2 changes: 1 addition & 1 deletion getting-started.Rmd
@@ -341,7 +341,7 @@ ggplot(mpg, aes(hwy)) +
geom_freqpoly(binwidth = 1)
```

-An alternative to the frequency polygon is the density plot, `geom_density()`. I'm not a fan of density plots because they are harder to interpret since the underlying computations are more complex. They also make assumptions that are not true for all data, namely that the underlying distribution is continuous, unbounded, and smooth.
+An alternative to the frequency polygon is the density plot, `geom_density()`. A little care is required if you're using density plots: compared to frequency polygons they are harder to interpret since the underlying computations are more complex. They also make assumptions that are not true for all data, namely that the underlying distribution is continuous, unbounded, and smooth.

To compare the distributions of different subgroups, you can map a categorical variable to either fill (for `geom_histogram()`) or colour (for `geom_freqpoly()`). It's easier to compare distributions using the frequency polygon because the underlying perceptual task is easier. You can also use faceting: this makes comparisons a little harder, but it's easier to see the distribution of each group.

2 changes: 1 addition & 1 deletion introduction.Rmd
@@ -154,7 +154,7 @@ There you will learn about how to control the theming system of ggplot2 and how
Before we continue, make sure you have all the software you need for this book:

- **R**: If you don't have R installed already, you may be reading the wrong book; we assume a basic familiarity with R throughout this book.
-If you'd like to learn how to use R, I'd recommend my [*R for Data Science*](https://r4ds.had.co.nz/) which is designed to get you up and running with R with a minimum of fuss.
+If you'd like to learn how to use R, we'd recommend [*R for Data Science*](https://r4ds.had.co.nz/) which is designed to get you up and running with R with a minimum of fuss.

- **RStudio**: RStudio is a free and open source integrated development environment (IDE) for R.
While you can write and use R code with any R environment (including R GUI and [ESS](http://ess.r-project.org)), RStudio has some nice features specifically for authoring and debugging your code.
2 changes: 1 addition & 1 deletion layers.Rmd
@@ -96,7 +96,7 @@ observations in the rows. This is a strong restriction, but there are good reaso
* It enforces a clean separation of concerns: ggplot2 turns data frames into
visualisations. Other packages can make data frames in the right format.

-The data on each layer doesn't need to be the same, and it's often useful to combine multiple datasets in a single plot. To illustrate that idea I'm going to generate two new datasets related to the mpg dataset. First we'll fit a loess model and generate predictions from it. (This is what `geom_smooth()` does behind the scenes)
+The data on each layer doesn't need to be the same, and it's often useful to combine multiple datasets in a single plot. To illustrate that idea we'll generate two new datasets related to the mpg dataset. First we'll fit a loess model and generate predictions from it. (This is what `geom_smooth()` does behind the scenes)

```{r loess-pred}
mod <- loess(hwy ~ displ, data = mpg)
10 changes: 5 additions & 5 deletions maps.Rmd
@@ -77,10 +77,10 @@ The `coord_sf()` function governs the map projection, discussed in Section \@ref

### Layered maps

-In some instances you may want to overlay one map on top of another. The ggplot2 package supports this by allowing you to add multiple `geom_sf()` layers to a plot. As an example, I'll use the `oz_states` data to draw the Australian states in different colours, and will overlay this plot with the boundaries of Australian electoral regions. To do this, there are two preprocessing steps to perform. First, we'll use `dplyr::filter()` to remove the "Other Territories" from the state boundaries.
+In some instances you may want to overlay one map on top of another. The ggplot2 package supports this by allowing you to add multiple `geom_sf()` layers to a plot. As an example, we'll use the `oz_states` data to draw the Australian states in different colours, and will overlay this plot with the boundaries of Australian electoral regions. To do this, there are two preprocessing steps to perform. First, we'll use `dplyr::filter()` to remove the "Other Territories" from the state boundaries.


-The code below draws a plot with two map layers: the first uses `oz_states` to fill the states in different colours, and the second uses `oz_votes` to draw the electoral boundaries. Second, I'll extract the electoral boundaries in a simplified form using the `ms_simplify()` function from the rmapshaper package [@rmapshaper]. This is generally a good idea if the original data set (in this case `ozmaps::abs_ced`) is stored at a higher resolution than your plot requires, in order to reduce the time taken to render the plot.
+The code below draws a plot with two map layers: the first uses `oz_states` to fill the states in different colours, and the second uses `oz_votes` to draw the electoral boundaries. Second, we'll extract the electoral boundaries in a simplified form using the `ms_simplify()` function from the rmapshaper package [@rmapshaper]. This is generally a good idea if the original data set (in this case `ozmaps::abs_ced`) is stored at a higher resolution than your plot requires, in order to reduce the time taken to render the plot.

`r columns(n = 1, aspect_ratio = 1)`
```{r}
@@ -223,7 +223,7 @@ p + coord_sf(xlim = c(147.75, 150.25), ylim = c(-37.5, -34.5))
p + coord_sf(xlim = c(150, 150.25), ylim = c(-36.3, -36))
```

-As this illustrates, Eden-Monaro is defined in terms of two distinct polygons, a large one on the Australian mainland and a small island. However, the large region has a hole in the middle (the hole exists because the Australian Capital Territory is a distinct political unit that is wholly contained within Eden-Monaro, and as illustrated above, electoral boundaries in Australia do not cross state lines). In sf terminology this is an example of a `MULTIPOLYGON` geometry. In this section I'll talk about the structure of these objects and how to work with them.
+As this illustrates, Eden-Monaro is defined in terms of two distinct polygons, a large one on the Australian mainland and a small island. However, the large region has a hole in the middle (the hole exists because the Australian Capital Territory is a distinct political unit that is wholly contained within Eden-Monaro, and as illustrated above, electoral boundaries in Australia do not cross state lines). In sf terminology this is an example of a `MULTIPOLYGON` geometry. In this section we'll talk about the structure of these objects and how to work with them.

First, let's use dplyr to grab only the geometry object:

@@ -283,7 +283,7 @@ ggplot(dawson[-69]) +

A second way to supply geospatial information for mapping is to rely on **raster data**. Unlike the simple features format, in which geographical entities are specified in terms of a set of lines, points and polygons, rasters take the form of images. In the simplest case raster data might be nothing more than a bitmap file, but there are many different image formats out there. In the geospatial context specifically, there are image formats that include metadata (e.g., geodetic datum, coordinate reference system) that can be used to map the image information to the surface of the Earth. For example, one common format is GeoTIFF, which is a regular TIFF file with additional metadata supplied. Happily, most formats can be easily read into R with the assistance of GDAL (the Geospatial Data Abstraction Library, https://gdal.org/). For example the sf package contains a function `sf::gdal_read()` that provides access to the GDAL raster drivers from R. However, you rarely need to call this function directly, as there are other high level functions that take care of this for you.

-As an illustration, suppose we wish to plot satellite images made publicly available by the Australian Bureau of Meterorology (BOM) on their FTP server. The bomrang package [@bomrang] provides a convenient interface to the server, including a `get_available_imagery()` function that returns a vector of filenames and a `get_satellite_imagery()` function that downloads a file and imports it directly into R. For expository purposes, however, I'll use a more flexible method that could be adapted to any FTP server, and use the `download.file()` function:
+As an illustration, suppose we wish to plot satellite images made publicly available by the Australian Bureau of Meterorology (BOM) on their FTP server. The bomrang package [@bomrang] provides a convenient interface to the server, including a `get_available_imagery()` function that returns a vector of filenames and a `get_satellite_imagery()` function that downloads a file and imports it directly into R. For expository purposes, however, we'll use a more flexible method that could be adapted to any FTP server, and use the `download.file()` function:

```{r eval=FALSE}
# list of all file names with time stamp 2020-01-07 21:00 GMT
@@ -314,7 +314,7 @@ img_vis <- file.path("raster", "IDE00422.202001072100.tif")
img_inf <- file.path("raster", "IDE00421.202001072100.tif")
```

-To import the data in the img_visible file into R, I'll use the stars package [@stars] to import the data as stars objects:
+To import the data in the img_visible file into R, we'll use the stars package [@stars] to import the data as stars objects:

```{r}
library(stars)
2 changes: 1 addition & 1 deletion scales-guides.Rmd
@@ -255,7 +255,7 @@ p1 <- base + scale_x_binned(breaks = seq(-50,50,10), limits = c(-50, 50))
p2 <- base + scale_x_binned(breaks = seq(-50,50,10), limits = c(-50, 50), trans = "reverse")
```
-Binned scales can be transformed, much like continuous scales, but some care is required because the bins are constructed in the transformed space. In some cases this can produce undesirable outcomes. In the code below, I take a uniformly distributed variable and use `scale_x_binned()` and `geom_bar()` to construct a histogram of the logarithmically transformed data.
+Binned scales can be transformed, much like continuous scales, but some care is required because the bins are constructed in the transformed space. In some cases this can produce undesirable outcomes. In the code below, we take a uniformly distributed variable and use `scale_x_binned()` and `geom_bar()` to construct a histogram of the logarithmically transformed data.
`r columns(1, 1/2, 1)`
```{r}
2 changes: 1 addition & 1 deletion scales-position.Rmd
@@ -99,7 +99,7 @@ base + coord_cartesian(ylim = c(10, 35)) # works as expected
base + ylim(10, 35) # distorts the boxplot
```

-The only difference between the left and middle plots is that the latter is zoomed in. Some of the outlier points are not shown due to the restriction of the range, but the boxplots themselves remain identical. In contrast, in the plot on the right one of the boxplots has changed. When modifying the scale limits, all observations with highway mileage greater than 35 are converted to `NA` before the stat (in this case the boxplot) is computed. Because these "out of bounds" values are no longer available, the end result is that the sample median is shifted downward, which is almost never desirable behaviour. In hindsight, I regret this design choice as it is a common source of confusion for users. Unfortunately it would be very hard to change this default without breaking a lot of existing code.
+The only difference between the left and middle plots is that the latter is zoomed in. Some of the outlier points are not shown due to the restriction of the range, but the boxplots themselves remain identical. In contrast, in the plot on the right one of the boxplots has changed. When modifying the scale limits, all observations with highway mileage greater than 35 are converted to `NA` before the stat (in this case the boxplot) is computed. Because these "out of bounds" values are no longer available, the end result is that the sample median is shifted downward, which is almost never desirable behaviour. With the benefit of hindsight it's clear this wasn't a good design choice, because it is a common source of confusion for users. Unfortunately it would be very hard to change this default without breaking a lot of existing code.

You can learn more about coordinate systems in Section \@ref(cartesian). To learn more about how "out of bounds" values are handled for continuous and binned scales, see Section \@ref(oob).

