Merge pull request #1132 from geocompx/refs-proofing
Refs proofing
Robinlovelace authored Sep 30, 2024
2 parents 6b8fabf + b8201ac commit 0b4734a
Showing 18 changed files with 104 additions and 91 deletions.
4 changes: 2 additions & 2 deletions 01-introduction.Rmd
@@ -66,7 +66,7 @@ Building on this early definition, *Geocomputation with R* goes beyond data anal
Our approach differs from early definitions of geocomputation in one important way, however: in its emphasis on reproducibility\index{reproducibility} and collaboration.
At the turn of the 21^st^ Century, it was unrealistic to expect readers to be able to reproduce code examples, due to barriers preventing access to the necessary hardware, software and data.
Fast-forward to today and things have progressed rapidly.
-Anyone with access to a laptop with sufficient RAM (at least 8 GB recommended) can install and run software for geocomputation, and reproduce the contents of this book.
+Anyone with access to a laptop with sufficient RAM (at least eight GB recommended) can install and run software for geocomputation, and reproduce the contents of this book.
Financial and hardware barriers to geocomputation that existed in 1990s and early 2000s, when high-performance computers were too expensive for most people, have been removed.^[
A suitable laptop can be acquired second-hand for $100 or less in most countries today from websites such as [Ebay](https://www.ebay.com/sch/i.html?_from=R40&_nkw=laptop&_sacat=0&_oaa=1&_udhi=100&rt=nc&RAM%2520Size=4%2520GB%7C16%2520GB%7C8%2520GB&_dcat=177).
Guidance on installing R and a suitable code editor is provided in Chapter \@ref(spatial-class).
@@ -298,7 +298,7 @@ R's spatial capabilities originated in early spatial packages in the S language
The 1990s saw the development of numerous S scripts and a handful of packages for spatial statistics\index{statistics}.
By the year 2000, there were R packages for various spatial methods, including "point pattern analysis, geostatistics, exploratory spatial data analysis and spatial econometrics" [@bivand_open_2000].
Some of these, notably **spatial**, **sgeostat** and **splancs** are still available on CRAN\index{CRAN} [@rowlingson_splancs_1993; @rowlingson_splancs_2017;@venables_modern_2002; @majure_sgeostat_2016].
-Key spatial packages were described in @ripley_spatial_2001, which outlined R packages for spatial smoothing and interpolation [@akima_akima_2016; @jr_geor_2016] and point pattern analysis [@rowlingson_splancs_2017; @baddeley_spatial_2015].
+Key spatial packages were described in @ripley_spatial_2001, which outlined R packages for spatial smoothing and interpolation and point pattern analysis.
One of these (**spatstat**) is still being actively maintained, more than 20 years after its first release.

A following commentary outlined the future prospects of spatial statistics [@bivand_more_2001], setting the stage for the development of the popular **spdep** package [@bivand_spdep_2017].
2 changes: 1 addition & 1 deletion 03-attribute-operations.Rmd
@@ -197,7 +197,7 @@ Base R functions are mature, stable and widely used, making them a rock solid ch
Key functions for subsetting data frames (including `sf` data frames) with **dplyr** functions are demonstrated below.

```{r, echo=FALSE, eval=FALSE}
-# Aim: benchmark base vs dplyr subsetting
+# Aim: benchmark base vs. dplyr subsetting
# Could move elsewhere?
i = sample(nrow(world), size = 10)
benchmark_subset = bench::mark(
2 changes: 1 addition & 1 deletion 05-geometry-operations.Rmd
@@ -153,7 +153,7 @@ nz_pos = st_point_on_surface(nz)
seine_pos = st_point_on_surface(seine)
```

-```{r centr, warning=FALSE, echo=FALSE, fig.cap="Centroids (black points) and 'points on surface' (red points) of New Zealand's regions (left) and the Seine (right) datasets.", fig.scap="Centroid vs point on surface operations."}
+```{r centr, warning=FALSE, echo=FALSE, fig.cap="Centroids (black points) and 'points on surface' (red points) of New Zealand's regions (left) and the Seine (right) datasets.", fig.scap="Centroid vs. point on surface operations."}
p_centr1 = tm_shape(nz) + tm_polygons(col = "gray80", fill = "gray90") +
tm_shape(nz_centroid) + tm_symbols(shape = 1, col = "black", size = 0.5) +
tm_shape(nz_pos) + tm_symbols(shape = 1, col = "red", size = 0.5) +
6 changes: 3 additions & 3 deletions 08-read-write-plot.Rmd
@@ -271,15 +271,15 @@ It is fast and flexible, but it may be worth looking at other packages such as *
### Raster data {#raster-data-read}

\index{raster!data input}
-Similar to vector data, raster data comes in many file formats with some supporting multi-layerfiles.
+Similar to vector data, raster data comes in many file formats with some supporting multi-layer files.
**terra**'s `rast()` command reads in a single layer when a file with just one layer is provided.

```{r 07-read-write-plot-24, message=FALSE}
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
single_layer = rast(raster_filepath)
```

-It also works in case you want to read a multi-layerfile.
+It also works in case you want to read a multi-layer file.

```{r 07-read-write-plot-25}
multilayer_filepath = system.file("raster/landsat.tif", package = "spDataLarge")
@@ -519,7 +519,7 @@ usa_sf = ne_countries(country = "United States of America", returnclass = "sf")
Country borders can be also accessed with other packages, such as **geodata**, **giscoR**, or **rgeoboundaries**.

A second example downloads a series of rasters containing global monthly precipitation sums with spatial resolution of 10 minutes (~18.5 km at the equator) using the **geodata** package [@R-geodata].
-The result is a multi-layerobject of class `SpatRaster`.
+The result is a multi-layer object of class `SpatRaster`.

```{r 07-read-write-plot-5, eval=FALSE}
library(geodata)
2 changes: 1 addition & 1 deletion 10-gis.Rmd
@@ -50,7 +50,7 @@ According to the creator of the popular QGIS software [@sherman_desktop_2008]:

> With the advent of 'modern' GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line. Many times you can do something on the command line in a fraction of the time you can do it with a GUI.
-The 'CLI vs GUI' debate does not have to be adversarial: both ways of working have advantages, depending on a range of factors including the task (with drawing new features being well suited to GUIs), the level of reproducibility desired, and the user's skillset.
+The 'CLI vs. GUI' debate does not have to be adversarial: both ways of working have advantages, depending on a range of factors including the task (with drawing new features being well suited to GUIs), the level of reproducibility desired, and the user's skillset.
GRASS GIS is a good example of GIS software that is primarily based on a CLI but which also has a prominent GUI.
Likewise, while R is focused on its CLI, IDEs such as RStudio provide a GUI for improving accessibility.
Software cannot be neatly categorized into CLI or GUI-based.
8 changes: 4 additions & 4 deletions 12-spatial-cv.Rmd
@@ -270,11 +270,11 @@ There are dozens of packages for statistical learning\index{statistical learning
Getting acquainted with each of these packages, including how to undertake cross-validation and hyperparameter\index{hyperparameter} tuning, can be a time-consuming process.
Comparing model results from different packages can be even more laborious.
The **mlr3** package and ecosystem was developed to address these issues.
-It acts as a 'meta-package', providing a unified interface to popular supervised and unsupervised statistical learning techniques including classification, regression\index{regression}, survival analysis and clustering\index{clustering} [@lang_mlr3_2019; @becker_mlr3_2022].
+It acts as a 'meta-package', providing a unified interface to popular supervised and unsupervised statistical learning techniques including classification, regression\index{regression}, survival analysis and clustering\index{clustering} [@lang_mlr3_2019; @bischl_applied_2024].
The standardized **mlr3** interface is based on eight 'building blocks'.
As illustrated in Figure \@ref(fig:building-blocks), these have a clear order.

-(ref:building-blocks) Basic building blocks of the mlr3 package. Source: @becker_mlr3_2022. (Permission to reuse this figure was kindly granted.)
+(ref:building-blocks) Basic building blocks of the mlr3 package. Source: @bischl_applied_2024. (Permission to reuse this figure was kindly granted.)

```{r building-blocks, echo=FALSE, fig.height=4, fig.width=4, fig.cap="(ref:building-blocks)", fig.scap="Basic building blocks of the mlr3 package."}
knitr::include_graphics("images/12_ml_abstraction_crop.png")
@@ -635,7 +635,7 @@ round(mean(score_spcv_svm$classif.auc), 2)

It appears that the GLM\index{GLM} (aggregated AUROC\index{AUROC} was `r score[resampling_id == "repeated_spcv_coords" & learner_id == "classif.log_reg", round(mean(classif.auc), 2)]`) is slightly better than the SVM\index{SVM} in this specific case.
To guarantee an absolute fair comparison, one should also make sure that the two models use the exact same partitions -- something we have not shown here but have silently used in the background (see `code/12_cv.R` in the book's GitHub repository for more information).
-To do so, **mlr3** offers the functions `benchmark_grid()` and `benchmark()` [see also https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html#sec-benchmarking, @becker_mlr3_2022].
+To do so, **mlr3** offers the functions `benchmark_grid()` and `benchmark()` [see also https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html#sec-benchmarking, @bischl_applied_2024].
We will explore these functions in more detail in the Exercises.
Please note also that using more than 50 iterations in the random search of the SVM would probably yield hyperparameters\index{hyperparameter} that result in models with a better AUROC [@schratz_hyperparameter_2019].
On the other hand, increasing the number of random search iterations would also increase the total number of models and thus runtime.
@@ -658,7 +658,7 @@ Machine learning algorithms often require hyperparameter\index{hyperparameter} i
Machine learning overall, and its use to understand spatial data, is a large field and this chapter has provided the basics, but there is more to learn.
We recommend the following resources in this direction:

-- The **mlr3 book** (@becker_mlr3_2022; https://mlr3book.mlr-org.com/) and especially the [chapter on the handling of spatiotemporal data](https://mlr3book.mlr-org.com/chapters/chapter13/beyond_regression_and_classification.html#sec-spatiotemporal)
+- The **mlr3 book** (@bischl_applied_2024; https://mlr3book.mlr-org.com/) and especially the [chapter on the handling of spatiotemporal data](https://mlr3book.mlr-org.com/chapters/chapter13/beyond_regression_and_classification.html#sec-spatiotemporal)
- An academic paper on hyperparameter\index{hyperparameter} tuning [@schratz_hyperparameter_2019]
- An academic paper on how to use **mlr3spatiotempcv** [@schratz_mlr3spatiotempcv_2021]
- In case of spatiotemporal data, one should account for spatial\index{autocorrelation!spatial} and temporal\index{autocorrelation!temporal} autocorrelation when doing CV\index{cross-validation} [@meyer_improving_2018]