-
Notifications
You must be signed in to change notification settings - Fork 0
/
spatial_core.qmd
200 lines (126 loc) · 10.4 KB
/
spatial_core.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
---
title: "Spatial Fundamentals"
engine: knitr
---
## Spatial Data Types
There are two primary types of spatial data: _raster_ and _vector_ data. [{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r} can both handle either of these types so coding language doesn't matter but there are fundamental structural differences between the two data types. See below for more information about each.
:::panel-tabset
### Raster Data
Raster data stores information in pixels. Each pixel is located at a specific geographic location (i.e., a specific X and Y coordinate pair). These pixel values can be continuous (e.g., rainfall, elevation, etc.) or categorical (e.g., land cover categories, date of first green up, etc.). Even if you've never worked with spatial data before you've certainly worked with rasters: technically every digital image is a raster!
<u>Raster files are typically GeoTIFFs and use the `.tif` extension.</u>
Consider this visual depiction of raster data:
<figure align="center">
<img src="images/spatial/raster_diagram.png" alt="Picture of a forest with an inset showing how the pixels in that image relate to information stored in each pixel" width="65%">
<figcaption>Image Source - National Ecological Observatory Network (NEON)</figcaption>
</figure>
### Vector Data
Vector data store information in "features". These features use specific geographic points (again, think X and Y coordinates) and store information about the geometric relationship among features. This allows vector data to be in terms of particular geometries like points, lines, or polygons.
<u>Vector data are typically preserved as shapefiles and use **several extensions**.</u> When we refer to shapefiles in code we only refer to the `.shp` file but there are several associated files that must also be present in the same folder for the data to be read properly. These usually include `.dbf`, `.prj`, and `.shx` but there may sometimes also be a `.xml` file or two. For our purposes, the specifics of these files are not relevant but it is important to remember that you will need them in order to work with vector data.
Consider this visual depiction of vector data:
<figure align="center">
<img src="images/spatial/vector_diagram.png" alt="Diagram of points, lines (points connected by lines), and polygons (three or more points that define the edges of a shape)" width="65%">
<figcaption>Image Source - National Ecological Observatory Network (NEON)</figcaption>
</figure>
:::
## Library Loading
Before we can dive into "actual" spatial work we'll need to load some libraries.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
Load `terra` and `sf` to work with raster and vector data respectively.
```{r r-libs, warning = F, message = F}
# Load needed libraries
library(terra)
library(sf)
```
## [{{< fa brands python >}} Python]{.py}
Load the `rioxarray`, `rasterio`, and `geopandas` libraries to work with raster and vector data. We'll also load the `os` library to deal with file path issues.
```{python py-libs}
# Load needed libraries
import rasterio as rio
import rioxarray as rxr
import geopandas as gpd
import os
```
:::
## Coordinate Reference Systems (CRS)
While raster and vector data may both refer to non-spatial or spatial data, true spatial data _requires_ a coordinate reference system (CRS). CRS has a very specific format that all geospatial applications (including [{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r}!) use to display/process the data correctly. CRS includes three components:
1. **Datum** -- a model for the shape of the earth. It defines the starting coordinate pair and angular units that--when used with the starting point--define a particular spot on the planet. There can be global datums (e.g., WGS84, NAD83, etc.) that apply anywhere on the planet and local datums that work well for a particular area but do not work outside of that area
2. **Projection** -- mathematical transformation to get from a round planet to a flat map
3. **Additional Parameters** -- any other information necessary to support the projection (e.g., the coordinates of the center of the map, etc.)
A hopefully useful analogy is to consider the datum as a choice between a set of citrus fruits of varying shapes (e.g., lemon, orange, grapefruit, etc.) while the projection is a set of instructions on how to flatten the peel of the chosen fruit.
#### CRS Importance
Coordinate reference systems may sound dry and uninteresting--even in a pretty technical coding context--but they are vitally important! For many scientific purposes we want to know how a set of points intersect with a given map or how well several maps line up. To answer questions like these or interpret virtually any geospatial information, we must make sure that each spatial component uses the same CRS. Some coordinate reference systems use similar units which can mean a quick glance makes all spatial data _seem_ interoperable while in reality the data cannot be directly compared without transforming to a standard CRS.
A rule of thumb that may help is that <u>_every_ spatial script you write should be very careful to check the CRS(s) used by the data.</u>
## Working with Rasters
To demonstrate raster data operations we'll use NASA's [Shuttle Radar Topography Mission Global 3 arc second](https://lpdaac.usgs.gov/products/srtmgl3v003/) elevation data. Note that some minor preparatory work was necessary to get the data ready for our purposes here and is preserved in [this folder](https://github.com/njlyon0/collab_bilingualism/tree/main/dev) of the website's {{< fa brands github >}} GitHub repository.
### Loading Rasters
We can begin by reading in the data.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
There are several [{{< fa brands r-project >}} R]{.r} packages for working with raster data but we'll focus on `terra`.
```{r r-rast-read}
# Load a raster of elevation data
rast_r <- terra::rast(x = file.path("data", "elevation.tif"))
# Check the class of this object
class(rast_r)
```
## [{{< fa brands python >}} Python]{.py}
The `rioxarray` library will be our focal library for raster operations with [{{< fa brands python >}} Python].
pd.read_csv(os.path.join("data", "verts.csv"))
```{python py-rast-read}
# Load a raster of elevation data
rast_py = rxr.open_rasterio(os.path.join("data", "elevation.tif"), masked = True).squeeze()
# Check the type of this variable
type(rast_py.rio.crs)
```
:::
### Checking Raster CRS
As noted earlier, the very first thing we should do after reading in spatial data is _check the coordinate reference system!_
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
We can check the CRS with the `crs` function (from `terra`).
```{r r-crs-check}
# Check the CRS of the elevation data
terra::crs(x = rast_r)
```
## [{{< fa brands python >}} Python]{.py}
[{{< fa brands python >}} Python]{.py} stores raster CRS as an attribute.
```{python py-crs-check}
# Check the CRS of the elevation data
rast_py.rio.crs
```
:::
You may note in the above operations that [{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r} seem to be returning different information for the same data. In fact they are saying the same thing just in slightly different ways. [{{< fa brands r-project >}} R]{.r} is telling us the name of the coordinate reference system (World Geodetic System 1984) while [{{< fa brands python >}} Python]{.py} is giving us the four-digit "EPSG" code. For context, the European Petroleum Survey Group (EPSG) is a major authority on accepted coordinate reference systems. Each CRS is given a unique, four-digit code. As you may now be able to guess, "4326" is the EPSG code for the World Geodetic System 1984 CRS!
This CRS (WGS84) is an especially common one so you may eventually commit its EPSG code to memory but often you'll encounter either CRS names or EPSG codes with which you are not familiar. It may not sound like sage advice but it can be simplest to just use whichever piece of information your coding language returns to Google the "missing" information.
### Transforming Raster CRS
Once we know the current CRS, we can transform into a different coordinate reference system as needed. Let's transform from WGS84 into another commonly-used CRS: Albers Equal Area (EPSG code 3083).
Quick warning, transforming rasters can take a **long** time and is very computationally intensive. It'll work for a relatively small raster like the one we're working with here but in general you should be careful with attempting to re-project a raster into a different CRS.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
For CRS transformations in [{{< fa brands r-project >}} R]{.r} we can use the `project` function (pronounced like the verb not the noun).
```{r r-crs-change}
# Transform the raster CRS
rast_alber_r <- terra::project(x = rast_r, y = "epsg:3083")
# Re-check the CRS to confirm it worked
terra::crs(rast_alber_r)
```
## [{{< fa brands python >}} Python]{.py}
[{{< fa brands python >}} Python]{.py} CRS re-projections have slightly more steps to work through.
```{python py-crs-change}
# Create a CRS variable for the desired ending CRS
wgs84_crs = rio.crs.CRS.from_string("EPSG:3083")
# Reproject the raster we had
rast_alber_py = rast_py.rio.reproject(wgs84_crs)
# Re-check the CRS to confirm it worked
rast_alber_py.rio.crs
```
:::
## Working with Vector Data
### Loading Vector Data
### Checking Vector CRS
### Transforming Vector CRS
## Additional Resources
Spatial operations have gotten _a ton_ of attention in both [{{< fa brands python >}} Python]{.py} and [{{< fa brands r-project >}} R]{.r}! This website is mostly focused on translating between the two languages though so much of this nuance is not covered here. For those interested in a deeper dive in spatial computing, consider the following.
- [{{< fa brands r-project >}} R]{.r} -- The Data Carpentries has a solid ["Introduction to Geospatial Concepts"](https://datacarpentry.org/organization-geospatial/index.html) lesson
- [{{< fa brands r-project >}} R]{.r} -- [Rachel King](https://github.com/king0708) created a really nice ["Spatial Data Visualization"](https://github.com/king0708/spatial-data-viz) workshop
- [{{< fa brands python >}} Python]{.py} -- The Arctic Data Center made a "Scalable and Computationally Reproducible Approaches to Arctic Resources" course that includes a ["Spatial and Image Data Using GeoPandas"](https://learning.nceas.ucsb.edu/2023-03-arctic/sections/geopandas.html#calculate-total-distance-per-fishing-area) chapter