Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

estimate densities from point patterns #27

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
167 changes: 167 additions & 0 deletions content/tutorials/spatial_point_pattern/index.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
---
title: Estimating densities from a point pattern
maintainer: Thierry Onkelinx
date: '2017-06-28'
output:
html_document: default
md_document:
preserve_yaml: true
variant: gfm+footnotes
params:
cellsize: 20
---

In this example we focus on a set of 10450 coordinates in a small area.
The goal is to estimate the local density of points, expressed as the number of points per unit area.
The raw coordinates are given in [WGS84 (EPSG:4326)](https://epsg.io/4326), which is a geodetic coordinate reference system.
That is not suited for calculating distances, so we need to re-project the points into a local projected coordinate reference system.
In this case we use [Belgian Lambert72 (EPSG:3170)](https://epsg.io/31370).
Next we calculate the density.
To visualise the density, we have to transform the results back to WGS84.

The data used in this example is real data but centred to a different location for privacy reasons. The dataset is available on [GitHub](https://github.com/ThierryO/my_blog/tree/master/data/20170628).

First we must read the data into R. Plotting the raw data helps to check errors in the data.

```{r read-data, fig.cap = "Raw data"}
points <- read.delim("point-pattern/points.txt", sep = " ")
library(ggmap)
map <- get_map(
location = c(lon = mean(points$lon), lat = mean(points$lat)),
zoom = 17,
maptype = "satellite",
source = "google"
)
ggmap(map) +
geom_point(data = points, alpha = 0.1, colour = "blue", shape = 4)
```

The next step is to convert the dataset to a `SpatialPoints` object with WGS84 projection and re-project it to Belgian Lambert72.
`sp::CRS()` defines the coordinate reference systems (CRS).
`sp::coordinates()<-` is an easy way to convert a `data.frame` into a `SpatialPointsDataFrame`, but without specifying a CRS.
Therefore we need to override the CRS with the correct one.

`sp` will derive a 'WKT2 string' representation from an EPSG-code [^epsg] that we provide, in order to represent the CRS.
The WKT2 string (well known text) is a recent open standard by the Open Geospatial Consortium to represent a CRS, and it replaces the older (deprecated) 'PROJ.4 string'.
Currently, the function to set the CRS is still called `proj4string()`.
`sp::spTransform()` converts the spatial object from the current CRS to another CRS.

[^epsg]: Most coordinate reference systems have an [EPSG](https://en.wikipedia.org/wiki/International_Association_of_Oil_%26_Gas_Producers#European_Petroleum_Survey_Group) code which you can find at http://epsg.io/.

```{r crs-objects}
library(sp)
crs_wgs84 <- CRS(SRS_string = "EPSG:4326")
crs_lambert <- CRS(SRS_string = "EPSG:31370")
```

The warning above once again demonstrates that some PROJ.4 information is not supported anymore.

```{r reproject}
coordinates(points) <- ~lon + lat
# proj4string() - still the only available function to set the CRS - may in the
# future get a more general name:
proj4string(points) <- crs_wgs84
points_lambert <- spTransform(points, crs_lambert)
```

Once we have the points into a projected coordinate system, we can calculate the densities. We start by defining a grid. `cellsize` is the dimension of the square grid cell in the units of the projected coordinate system. Meters in case of Lambert72. The boundaries of the grid are defined using `pretty()`, which turns a vector of numbers into a "pretty" vector with rounded numbers. The combination of the boundaries and the cell size determine the number of grid cells `n` in each dimension. `diff()` calculates the difference between to adjacent numbers of a vector. The density is calculated with `MASS::kde2d()` based on the vectors with the longitude and latitude, the number of grid cells in each dimension and the boundaries of the grid. This returns the grid as a list with elements `x` (a vector of longitude coordinates of the centroids), `y` (a vector of latitude coordinates of the centroids) and `z` (a matrix with densities). The values in `z` are densities for the 'average' point per unit area. When we multiply the value `z` with the area of the grid cell and sum all of them we get 1. So if we multiple `z` with the number of points we get the density of the points per unit area.

We use [`dplyr::mutate()`](http://dplyr.tidyverse.org/) to convert it into a `data.frame`. The last two steps convert the centroids into a set of coordinates for square polygons.

```{r density}
library(MASS)
library(dplyr)
xlim <- range(pretty(points_lambert$lon)) + c(-100, 100)
ylim <- range(pretty(points_lambert$lat)) + c(-100, 100)
n <- c(
diff(xlim),
diff(ylim)
) / params$cellsize + 1
dens <- kde2d(
x = points_lambert$lon,
y = points_lambert$lat,
n = n,
lims = c(xlim, ylim)
)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sum(dens$z * dx * dy)
dens <- expand.grid(
lon = dens$x,
lat = dens$y
) %>%
mutate(
density = as.vector(dens$z) * length(points_lambert),
id = seq_along(density)
) %>%
merge(
data.frame(
x = dx * (c(0, 0, 1, 1, 0) - 0.5),
y = dy * (c(0, 1, 1, 0, 0) - 0.5)
)
) %>%
mutate(
lon = lon + x,
lat = lat + y
)
```

In order to visualise the result, we have to re-project the coordinates back to WGS84. Then we can display the raster with a web based background image.

```{r ggmap, fig.cap = "Static image of density"}
coordinates(dens) <- ~lon + lat
proj4string(dens) <- crs_lambert
dens_wgs <- spTransform(dens, crs_wgs84) %>%
as.data.frame()
ggmap(map) +
geom_polygon(data = dens_wgs, aes(group = id, fill = density), alpha = 0.5) +
scale_fill_gradientn(
"density\n(#/m²)",
colours = rev(rainbow(100, start = 0, end = .7)),
limits = c(0, NA)
)
```

Using `leaflet` to generate a map was a bit more laborious.
Using the `data.frame dens_wgs`directly failed.
So we converted the `data.frame` to a `SpatialPolygonsDataframe`, which is a combination of a `SpatialPolygons` and a `data.frame`.
The `SpatialPolygons` consists of a list of `Polygons`, one for each row of the `data.frame`.
A `Polygons` object consists of a list of one or more `Polygon` objects.
In this example it is a single polygon which represents the grid cell.

```{r convert-to-Spatial-Polygons}
dens_sp <- lapply(
unique(dens_wgs$id),
function(i){
filter(dens_wgs, id == i) %>%
select(lon, lat) %>%
Polygon() %>%
list() %>%
Polygons(ID = i)
}
) %>%
SpatialPolygons() %>%
SpatialPolygonsDataFrame(
data = dens_wgs %>%
distinct(id, density),
match.ID = FALSE
)
```

`leaflet` requires a predefined function with a colour palette.
We use `leaflet::colorNumeric()` to get a continuous palette.
Setting `stroke = FALSE` removes the borders of the polygon.
`fillOpacity` sets the transparency of the polygons.

```{r leaflet, fig.cap = "Dynamic map of density"}
library(leaflet)
pal <- colorNumeric(
palette = rev(rainbow(100, start = 0, end = .7)),
domain = c(0, dens_sp$density)
)
leaflet(dens_sp) %>%
addTiles() %>%
addPolygons(color = ~pal(density), stroke = FALSE, fillOpacity = 0.5) %>%
addLegend(pal = pal, values = ~density)
```

Loading