From 9092405b241eb765695a7edee783588918fd9c2a Mon Sep 17 00:00:00 2001 From: njlyon0 Date: Tue, 2 Jul 2024 15:52:04 -0400 Subject: [PATCH] Finished draft of spatial data extraction topic --- _freeze/mod_spatial/execute-results/html.json | 4 +- mod_spatial.qmd | 40 +++++++++++++++++++ 2 files changed, 42 insertions(+), 2 deletions(-) diff --git a/_freeze/mod_spatial/execute-results/html.json b/_freeze/mod_spatial/execute-results/html.json index cf4528e..d8c24f1 100644 --- a/_freeze/mod_spatial/execute-results/html.json +++ b/_freeze/mod_spatial/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "c4254a511a7b02046eb172319997254e", + "hash": "89a9890b585bd7bfc4448edcb321aa6c", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Working with Spatial Data\"\ncode-annotations: hover\n---\n\n\n\n\n## Overview\n\nSynthesis projects often have need of spatial datasets. At its simplest, it can be helpful to have a map of the original project locations including in the synthesis dataset. In more complex instances you want to extract spatial data within a certain area of sampling locations. Regardless of 'why' you're using spatial data, it may come up during your primary or synthesis work and thus deserves consideration in this course's materials. There are _many_ modes of working with spatial data, and not all of these tools require coding literacy but for consistency with the rest of the modules **this module will focus on _scripted_ approaches to interacting with spatial data**.\n\n## Learning Objectives\n\nAfter completing this topic you will be able to: \n\n- Define the difference between the two major types of spatial data\n- Manipulate spatial data with R\n- Create maps using spatial data\n- Integrate spatial data with tabular data\n\n## Needed Packages\n\nIf you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Note that these lines only need to be run once per computer\n## So you can skip this step if you've installed these before\ninstall.packages(\"tidyverse\")\ninstall.packages(\"sf\")\ninstall.packages(\"terra\")\ninstall.packages(\"maps\")\n```\n:::\n\n\n## Types of Spatial Data\n\nThere are two main types of spatial data: vector and raster. Both types (and the packages they require) are described in the tabs below.\n\n:::{.panel-tabset}\n### Vector Data\n\nVector data are stored as polygons. Essentially vector data are a set of points and--sometimes--the lines between them that define the edges of a shape. They may store additional data that is retained in a semi-tabular format that relates to the polygon(s) but isn't directly stored in them.\n\nCommon vector data types include shape files or GeoJSONs.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed library\nlibrary(sf)\n\n# Read in shapefile\nnc_poly <- sf::st_read(dsn = file.path(\"data\", \"nc_borders.shp\")) #<1>\n```\n:::\n\n1. Note that even though we're only specifying the \".shp\" file in this function you _must_ also have the associated files in that same folder. In this case that includes a \".dbf\", \".prj\", and \".shx\", though in other contexts you may have others.\n\nOnce you have read in the shapefile, you can check its structure as you would any other data object. Note that the object has both the 'data.frame' class and the 'sf' (\"simple features\") class. 
In this case, the shapefile relates to counties in North Carolina and some associated demographic data in those counties.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check structure\nstr(nc_poly)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nClasses 'sf' and 'data.frame':\t100 obs. of 15 variables:\n $ AREA : num 0.114 0.061 0.143 0.07 0.153 0.097 0.062 0.091 0.118 0.124 ...\n $ PERIMETER: num 1.44 1.23 1.63 2.97 2.21 ...\n $ CNTY_ : num 1825 1827 1828 1831 1832 ...\n $ CNTY_ID : num 1825 1827 1828 1831 1832 ...\n $ NAME : chr \"Ashe\" \"Alleghany\" \"Surry\" \"Currituck\" ...\n $ FIPS : chr \"37009\" \"37005\" \"37171\" \"37053\" ...\n $ FIPSNO : num 37009 37005 37171 37053 37131 ...\n $ CRESS_ID : int 5 3 86 27 66 46 15 37 93 85 ...\n $ BIR74 : num 1091 487 3188 508 1421 ...\n $ SID74 : num 1 0 5 1 9 7 0 0 4 1 ...\n $ NWBIR74 : num 10 10 208 123 1066 ...\n $ BIR79 : num 1364 542 3616 830 1606 ...\n $ SID79 : num 0 3 6 2 3 5 2 2 2 5 ...\n $ NWBIR79 : num 19 12 260 145 1197 ...\n $ geometry :sfc_MULTIPOLYGON of length 100; first list element: List of 1\n ..$ :List of 1\n .. ..$ : num [1:27, 1:2] -81.5 -81.5 -81.6 -81.6 -81.7 ...\n ..- attr(*, \"class\")= chr [1:3] \"XY\" \"MULTIPOLYGON\" \"sfg\"\n - attr(*, \"sf_column\")= chr \"geometry\"\n - attr(*, \"agr\")= Factor w/ 3 levels \"constant\",\"aggregate\",..: NA NA NA NA NA NA NA NA NA NA ...\n ..- attr(*, \"names\")= chr [1:14] \"AREA\" \"PERIMETER\" \"CNTY_\" \"CNTY_ID\" ...\n```\n\n\n:::\n:::\n\n\nIf desired, we could make a simple R base plot-style map. In this case we'll do it based on just the county areas so that the polygons are filled with a color corresponding to how large the county is.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Make a graph\nplot(nc_poly[\"AREA\"], axes = T)\n```\n\n::: {.cell-output-display}\n![](mod_spatial_files/figure-html/plot-vector-1.png){fig-align='center' width=672}\n:::\n:::\n\n\n### Raster Data\n\nRaster data are stored as values in pixels. The resolution (i.e., size of the pixels) may differ among rasters but in all cases the data are stored at a per-pixel level.\n\nCommon raster data types include GeoTIFFs (.tif) and NetCDF (.nc) files.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed library\nlibrary(terra)\n\n# Read in raster\nnc_pixel <- terra::rast(x = file.path(\"data\", \"nc_elevation.tif\"))\n```\n:::\n\n\nOnce you've read in the raster file you can check it's structure as you would any other object but the resulting output is much less informative than for other object classes.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check structure\nstr(nc_pixel)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nS4 class 'SpatRaster' [package \"terra\"]\n```\n\n\n:::\n:::\n\n\nRegardless, now that we have the raster loaded we can make a simple graph to check out what sort of data is stored in it. In this case, each pixel is 3 arcseconds on each side (~0.0002° latitude/longitude) and contains the elevation (in meters) of that pixel.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Make a graph\nterra::plot(nc_pixel)\n```\n\n::: {.cell-output-display}\n![](mod_spatial_files/figure-html/plot-raster-1.png){fig-align='center' width=672}\n:::\n:::\n\n\n:::\n\n## Coordinate Reference Systems\n\nA fundamental problem in spatial data is how to project data collected on a nearly spherical planet onto a two-dimensional plane. This has been solved--or at least clarified--by the use of Coordinate Reference Systems (a.k.a. \"CRS\"). 
All spatial data have a CRS that is explicitly identified in the data and/or the metadata because the data _are not interpretable_ without knowing which CRS is used.\n\n\nThe CRS defines the following information:\n\n1. **Datum** -- model for shape of the Earth including the starting coordinate pair and angular units that together define any particular point on the planet\n - Note that there can be global datums that work for any region of the world and local datums that only work for a particular area\n2. **Projection** -- math for the transformation to get from a round planet to a flat map\n3. **Additional parameters** -- any other information necessary to support the projection\n - E.g., the coordinates at the center of the map\n\nSome people use the analogy of peeling a citrus fruit and flattening the peel to describe the components of a CRS. The datum is the choice between a lemon or a grapefruit (i.e., the shape of the not-quite-spherical object) while the projection is the instructions for taking the complete peel and flattening it.\n\nYou can check and transform the CRS in any scripted language that allows the loading of spatial data though the specifics do differ between the types of spatial data we introduced earlier.\n\n:::{.panel-tabset}\n### Vector CRS\n\nFor vector data we can check the CRS with other functions from the `sf` library. It can be a little difficult to parse all of the information that returns but essentially it is most important that the CRS match that of any other spatial data with which we are working.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check CRS\nsf::st_crs(x = nc_poly)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nCoordinate Reference System:\n User input: WGS 84 \n wkt:\nGEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"latitude\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"longitude\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]]\n```\n\n\n:::\n:::\n\n\nOnce you know the CRS, you can transform the data to another CRS if desired. 
This is a relatively fast operation for vector data because we're transforming geometric data rather than potentially millions of pixels.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Transform CRS\nnc_poly_nad83 <- sf::st_transform(x = nc_poly, crs = 3083) # <1>\n\n# Re-check CRS\nsf::st_crs(nc_poly_nad83)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nCoordinate Reference System:\n User input: EPSG:3083 \n wkt:\nPROJCRS[\"NAD83 / Texas Centric Albers Equal Area\",\n BASEGEOGCRS[\"NAD83\",\n DATUM[\"North American Datum 1983\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4269]],\n CONVERSION[\"Texas Centric Albers Equal Area\",\n METHOD[\"Albers Equal Area\",\n ID[\"EPSG\",9822]],\n PARAMETER[\"Latitude of false origin\",18,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8821]],\n PARAMETER[\"Longitude of false origin\",-100,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8822]],\n PARAMETER[\"Latitude of 1st standard parallel\",27.5,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8823]],\n PARAMETER[\"Latitude of 2nd standard parallel\",35,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8824]],\n PARAMETER[\"Easting at false origin\",1500000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8826]],\n PARAMETER[\"Northing at false origin\",6000000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8827]]],\n CS[Cartesian,2],\n AXIS[\"easting (X)\",east,\n ORDER[1],\n LENGTHUNIT[\"metre\",1]],\n AXIS[\"northing (Y)\",north,\n ORDER[2],\n LENGTHUNIT[\"metre\",1]],\n USAGE[\n SCOPE[\"State-wide spatial data presentation requiring true area measurements.\"],\n AREA[\"United States (USA) - Texas.\"],\n BBOX[25.83,-106.66,36.5,-93.5]],\n ID[\"EPSG\",3083]]\n```\n\n\n:::\n:::\n\n1. In order to transform to a new CRS you'll need to identify the four-digit EPSG code for the desired CRS.\n\n### Raster CRS\n\nFor raster data we can check the CRS with other functions from the `terra` library. It can be a little difficult to parse all of the information that returns but essentially it is most important that the CRS match that of any other spatial data with which we are working.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check CRS\nterra::crs(nc_pixel)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"GEOGCRS[\\\"WGS 84\\\",\\n ENSEMBLE[\\\"World Geodetic System 1984 ensemble\\\",\\n MEMBER[\\\"World Geodetic System 1984 (Transit)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G730)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G873)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G1150)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G1674)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G1762)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G2139)\\\"],\\n ELLIPSOID[\\\"WGS 84\\\",6378137,298.257223563,\\n LENGTHUNIT[\\\"metre\\\",1]],\\n ENSEMBLEACCURACY[2.0]],\\n PRIMEM[\\\"Greenwich\\\",0,\\n ANGLEUNIT[\\\"degree\\\",0.0174532925199433]],\\n CS[ellipsoidal,2],\\n AXIS[\\\"geodetic latitude (Lat)\\\",north,\\n ORDER[1],\\n ANGLEUNIT[\\\"degree\\\",0.0174532925199433]],\\n AXIS[\\\"geodetic longitude (Lon)\\\",east,\\n ORDER[2],\\n ANGLEUNIT[\\\"degree\\\",0.0174532925199433]],\\n USAGE[\\n SCOPE[\\\"Horizontal component of 3D system.\\\"],\\n AREA[\\\"World.\\\"],\\n BBOX[-90,-180,90,180]],\\n ID[\\\"EPSG\\\",4326]]\"\n```\n\n\n:::\n:::\n\n\nAs with vector data, if desired you can transform the data to another CRS. 
However, unlike vector data, transforming the CRS of raster data is _very_ computationally intense. If at all possible you should avoid re-projecting rasters. If you must re-project, consider doing so in an environment with greater computing power than a typical laptop. Also, you should export a new raster in your preferred CRS after transforming so that you reduce the likelihood that you need to re-project again later in the lifecylce of your project.\n\nIn the interests of making this website reasonably quick to re-build, the following code chunk is not actually evaluated but is the correct syntax for this operation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Transform CRS\nnc_pixel_nad83 <- terra::project(x = nc_pixel, y = \"epsg:3083\")\n\n# Re-check CRS\nterra::crs(nc_pixel_nad83)\n```\n:::\n\n\n:::\n\n## Making Maps\n\nNow that we've covered the main types of spatial data as well as how to check the CRS (and transform if needed) we're ready to make maps! For consistency with other modules on data visualization, we'll use `ggplot2` to make our maps but note that base R also supports map making and there are many useful tutorials elsewhere on making maps in that framework.\n\nThe `maps` package includes some useful national and US state borders so we'll begin by preparing an object that combines both sets of borders.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load library\nlibrary(maps)\n\n# Make 'borders' object\nborders <- sf::st_as_sf(maps::map(database = \"world\", plot = F, fill = T)) %>%\n dplyr::bind_rows(sf::st_as_sf(maps::map(database = \"state\", plot = F, fill = T)))\n```\n:::\n\n\nNote that the simplest way of making a map that includes a raster is to coerce that raster into a dataframe. To do this we will translate each pixel's geographic coordinates into X and Y values.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnc_pixel_df <- as.data.frame(nc_pixel, xy = T) %>% \n # Rename the 'actual' data layer more clearly\n dplyr::rename(elevation_m = SRTMGL3_NC.003_SRTMGL3_DEM_doy2000042_aid0001)\n```\n:::\n\n\nWith the borders object and our modified raster in hand, we can now make a map that includes useful context for state/nation borders. Synthesis projects often cover a larger geographic extent than primary projects so this is particularly useful in ways it might not be for primary research.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load library\nlibrary(ggplot2)\n\n# Make map\nggplot(borders) +\n geom_sf(fill = \"gray95\") + # <1>\n coord_sf(xlim = c(-70, -90), ylim = c(30, 40), expand = F) + # <2>\n geom_tile(data = nc_pixel_df, aes(x = x, y = y, fill = elevation_m)) +\n labs(x = \"Longitude\", y = \"Latitude\")\n```\n:::\n\n1. This line is filling our nation polygons with a pale gray (helps to differentiate from ocean)\n2. Here we set the map extent so that we're only getting our region of interest\n\n

\n\"ggplot2-style\n

\n\nFrom here we can make additional `ggplot2`-style modifications as/if needed. This variant of map-making supports adding tabular data objects as well (though they would require separate geometries). Many of the LTER Network Office-funded groups that make maps include points for specific study locations along with a raster layer for an environmental / land cover characteristic that is particularly relevant to their research question and/or hypotheses.\n\n## Extracting Spatial Data\n\n\n\n\n\n## Additional Resources\n\n### Papers & Documents\n\n- \n\n### Workshops & Courses\n\n- The Carpentries' [Introduction to Geospatial Raster & Vector Data with R](https://datacarpentry.org/r-raster-vector-geospatial/)\n- The Carpentries' [Introduction to R for Geospatial Data](https://datacarpentry.org/r-intro-geospatial/index.html)\n- Arctic Data Center's [Spatial and Image Data Using GeoPandas](https://learning.nceas.ucsb.edu/2023-03-arctic/sections/geopandas.html) chapter of their Scalable Computing course\n- Jason Flower's (UC Santa Barbara) [Introduction to rasters with `terra`](https://jflowernet.github.io/intro-terra-ecodatascience/)\n- King, R. [Spatial Data Visualization](https://github.com/king0708/spatial-data-viz) workshop\n\n### Websites\n\n- NASA's Application for Extracting and Exploring Analysis Ready Samples [(AppEEARS) Portal](https://appeears.earthdatacloud.nasa.gov/)\n",
+    "markdown": "---\ntitle: \"Working with Spatial Data\"\ncode-annotations: hover\n---\n\n\n\n\n## Overview\n\nSynthesis projects often need spatial datasets. At its simplest, it can be helpful to have a map of the original project locations included in the synthesis dataset. In more complex instances, you may want to extract spatial data within a certain area around sampling locations. Regardless of 'why' you're using spatial data, it may come up during your primary or synthesis work and thus deserves consideration in this course's materials. There are _many_ modes of working with spatial data, and not all of these tools require coding literacy, but for consistency with the rest of the modules **this module will focus on _scripted_ approaches to interacting with spatial data**.\n\n## Learning Objectives\n\nAfter completing this topic you will be able to: \n\n- Define the difference between the two major types of spatial data\n- Manipulate spatial data with R\n- Create maps using spatial data\n- Integrate spatial data with tabular data\n\n## Needed Packages\n\nIf you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Note that these lines only need to be run once per computer\n## So you can skip this step if you've installed these before\ninstall.packages(\"tidyverse\")\ninstall.packages(\"sf\")\ninstall.packages(\"terra\")\ninstall.packages(\"maps\")\ninstall.packages(\"exactextractr\")\n```\n:::\n\n\n## Types of Spatial Data\n\nThere are two main types of spatial data: vector and raster. Both types (and the packages they require) are described in the tabs below.\n\n:::{.panel-tabset}\n### Vector Data\n\nVector data are stored as polygons. Essentially vector data are a set of points and--sometimes--the lines between them that define the edges of a shape. 
They may store additional data that is retained in a semi-tabular format that relates to the polygon(s) but isn't directly stored in them.\n\nCommon vector data types include shape files or GeoJSONs.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed library\nlibrary(sf)\n\n# Read in shapefile\nnc_poly <- sf::st_read(dsn = file.path(\"data\", \"nc_borders.shp\")) #<1>\n```\n:::\n\n1. Note that even though we're only specifying the \".shp\" file in this function you _must_ also have the associated files in that same folder. In this case that includes a \".dbf\", \".prj\", and \".shx\", though in other contexts you may have others.\n\nOnce you have read in the shapefile, you can check its structure as you would any other data object. Note that the object has both the 'data.frame' class and the 'sf' (\"simple features\") class. In this case, the shapefile relates to counties in North Carolina and some associated demographic data in those counties.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check structure\nstr(nc_poly)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nClasses 'sf' and 'data.frame':\t100 obs. of 15 variables:\n $ AREA : num 0.114 0.061 0.143 0.07 0.153 0.097 0.062 0.091 0.118 0.124 ...\n $ PERIMETER: num 1.44 1.23 1.63 2.97 2.21 ...\n $ CNTY_ : num 1825 1827 1828 1831 1832 ...\n $ CNTY_ID : num 1825 1827 1828 1831 1832 ...\n $ NAME : chr \"Ashe\" \"Alleghany\" \"Surry\" \"Currituck\" ...\n $ FIPS : chr \"37009\" \"37005\" \"37171\" \"37053\" ...\n $ FIPSNO : num 37009 37005 37171 37053 37131 ...\n $ CRESS_ID : int 5 3 86 27 66 46 15 37 93 85 ...\n $ BIR74 : num 1091 487 3188 508 1421 ...\n $ SID74 : num 1 0 5 1 9 7 0 0 4 1 ...\n $ NWBIR74 : num 10 10 208 123 1066 ...\n $ BIR79 : num 1364 542 3616 830 1606 ...\n $ SID79 : num 0 3 6 2 3 5 2 2 2 5 ...\n $ NWBIR79 : num 19 12 260 145 1197 ...\n $ geometry :sfc_MULTIPOLYGON of length 100; first list element: List of 1\n ..$ :List of 1\n .. ..$ : num [1:27, 1:2] -81.5 -81.5 -81.6 -81.6 -81.7 ...\n ..- attr(*, \"class\")= chr [1:3] \"XY\" \"MULTIPOLYGON\" \"sfg\"\n - attr(*, \"sf_column\")= chr \"geometry\"\n - attr(*, \"agr\")= Factor w/ 3 levels \"constant\",\"aggregate\",..: NA NA NA NA NA NA NA NA NA NA ...\n ..- attr(*, \"names\")= chr [1:14] \"AREA\" \"PERIMETER\" \"CNTY_\" \"CNTY_ID\" ...\n```\n\n\n:::\n:::\n\n\nIf desired, we could make a simple R base plot-style map. In this case we'll do it based on just the county areas so that the polygons are filled with a color corresponding to how large the county is.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Make a graph\nplot(nc_poly[\"AREA\"], axes = T)\n```\n\n::: {.cell-output-display}\n![](mod_spatial_files/figure-html/plot-vector-1.png){fig-align='center' width=672}\n:::\n:::\n\n\n### Raster Data\n\nRaster data are stored as values in pixels. 
The resolution (i.e., size of the pixels) may differ among rasters but in all cases the data are stored at a per-pixel level.\n\nCommon raster data types include GeoTIFFs (.tif) and NetCDF (.nc) files.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed library\nlibrary(terra)\n\n# Read in raster\nnc_pixel <- terra::rast(x = file.path(\"data\", \"nc_elevation.tif\"))\n```\n:::\n\n\nOnce you've read in the raster file you can check its structure as you would any other object, but the resulting output is much less informative than for other object classes.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check structure\nstr(nc_pixel)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nS4 class 'SpatRaster' [package \"terra\"]\n```\n\n\n:::\n:::\n\n\nRegardless, now that we have the raster loaded we can make a simple graph to check out what sort of data is stored in it. In this case, each pixel is 3 arcseconds on each side (~0.0008° latitude/longitude) and contains the elevation (in meters) of that pixel.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Make a graph\nterra::plot(nc_pixel)\n```\n\n::: {.cell-output-display}\n![](mod_spatial_files/figure-html/plot-raster-1.png){fig-align='center' width=672}\n:::\n:::\n\n\n:::\n\n## Coordinate Reference Systems\n\nA fundamental problem in spatial data is how to project data collected on a nearly spherical planet onto a two-dimensional plane. This has been solved--or at least clarified--by the use of Coordinate Reference Systems (a.k.a. \"CRS\"). All spatial data have a CRS that is explicitly identified in the data and/or the metadata because the data _are not interpretable_ without knowing which CRS is used.\n\n\nThe CRS defines the following information:\n\n1. **Datum** -- model for shape of the Earth including the starting coordinate pair and angular units that together define any particular point on the planet\n    - Note that there can be global datums that work for any region of the world and local datums that only work for a particular area\n2. **Projection** -- math for the transformation to get from a round planet to a flat map\n3. **Additional parameters** -- any other information necessary to support the projection\n    - E.g., the coordinates at the center of the map\n\nSome people use the analogy of peeling a citrus fruit and flattening the peel to describe the components of a CRS. The datum is the choice between a lemon or a grapefruit (i.e., the shape of the not-quite-spherical object) while the projection is the instructions for taking the complete peel and flattening it.\n\nYou can check and transform the CRS in any scripted language that allows the loading of spatial data though the specifics do differ between the types of spatial data we introduced earlier.\n\n:::{.panel-tabset}\n### Vector CRS\n\nFor vector data we can check the CRS with other functions from the `sf` library. 
It can be a little difficult to parse all of the information that returns but essentially it is most important that the CRS match that of any other spatial data with which we are working.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check CRS\nsf::st_crs(x = nc_poly)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nCoordinate Reference System:\n User input: WGS 84 \n wkt:\nGEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"latitude\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"longitude\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]]\n```\n\n\n:::\n:::\n\n\nOnce you know the CRS, you can transform the data to another CRS if desired. This is a relatively fast operation for vector data because we're transforming geometric data rather than potentially millions of pixels.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Transform CRS\nnc_poly_nad83 <- sf::st_transform(x = nc_poly, crs = 3083) # <1>\n\n# Re-check CRS\nsf::st_crs(nc_poly_nad83)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nCoordinate Reference System:\n User input: EPSG:3083 \n wkt:\nPROJCRS[\"NAD83 / Texas Centric Albers Equal Area\",\n BASEGEOGCRS[\"NAD83\",\n DATUM[\"North American Datum 1983\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4269]],\n CONVERSION[\"Texas Centric Albers Equal Area\",\n METHOD[\"Albers Equal Area\",\n ID[\"EPSG\",9822]],\n PARAMETER[\"Latitude of false origin\",18,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8821]],\n PARAMETER[\"Longitude of false origin\",-100,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8822]],\n PARAMETER[\"Latitude of 1st standard parallel\",27.5,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8823]],\n PARAMETER[\"Latitude of 2nd standard parallel\",35,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8824]],\n PARAMETER[\"Easting at false origin\",1500000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8826]],\n PARAMETER[\"Northing at false origin\",6000000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8827]]],\n CS[Cartesian,2],\n AXIS[\"easting (X)\",east,\n ORDER[1],\n LENGTHUNIT[\"metre\",1]],\n AXIS[\"northing (Y)\",north,\n ORDER[2],\n LENGTHUNIT[\"metre\",1]],\n USAGE[\n SCOPE[\"State-wide spatial data presentation requiring true area measurements.\"],\n AREA[\"United States (USA) - Texas.\"],\n BBOX[25.83,-106.66,36.5,-93.5]],\n ID[\"EPSG\",3083]]\n```\n\n\n:::\n:::\n\n1. In order to transform to a new CRS you'll need to identify the four-digit EPSG code for the desired CRS.\n\n### Raster CRS\n\nFor raster data we can check the CRS with other functions from the `terra` library. 
It can be a little difficult to parse all of the information that this returns, but essentially it is most important that the CRS match that of any other spatial data with which we are working.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check CRS\nterra::crs(nc_pixel)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"GEOGCRS[\\\"WGS 84\\\",\\n ENSEMBLE[\\\"World Geodetic System 1984 ensemble\\\",\\n MEMBER[\\\"World Geodetic System 1984 (Transit)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G730)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G873)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G1150)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G1674)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G1762)\\\"],\\n MEMBER[\\\"World Geodetic System 1984 (G2139)\\\"],\\n ELLIPSOID[\\\"WGS 84\\\",6378137,298.257223563,\\n LENGTHUNIT[\\\"metre\\\",1]],\\n ENSEMBLEACCURACY[2.0]],\\n PRIMEM[\\\"Greenwich\\\",0,\\n ANGLEUNIT[\\\"degree\\\",0.0174532925199433]],\\n CS[ellipsoidal,2],\\n AXIS[\\\"geodetic latitude (Lat)\\\",north,\\n ORDER[1],\\n ANGLEUNIT[\\\"degree\\\",0.0174532925199433]],\\n AXIS[\\\"geodetic longitude (Lon)\\\",east,\\n ORDER[2],\\n ANGLEUNIT[\\\"degree\\\",0.0174532925199433]],\\n USAGE[\\n SCOPE[\\\"Horizontal component of 3D system.\\\"],\\n AREA[\\\"World.\\\"],\\n BBOX[-90,-180,90,180]],\\n ID[\\\"EPSG\\\",4326]]\"\n```\n\n\n:::\n:::\n\n\nAs with vector data, if desired you can transform the data to another CRS. However, unlike vector data, transforming the CRS of raster data is _very_ computationally intensive. If at all possible you should avoid re-projecting rasters. If you must re-project, consider doing so in an environment with greater computing power than a typical laptop. Also, after transforming, export a new raster in your preferred CRS so that you reduce the likelihood of needing to re-project again later in the lifecycle of your project.\n\nIn the interests of making this website reasonably quick to re-build, the following code chunk is not actually evaluated but is the correct syntax for this operation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Transform CRS\nnc_pixel_nad83 <- terra::project(x = nc_pixel, y = \"epsg:3083\")\n\n# Re-check CRS\nterra::crs(nc_pixel_nad83)\n```\n:::\n
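\n\nLike the chunk above, the next one is also left unevaluated; it is just a sketch of that export step, and the output file name is a placeholder you would swap for your own.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Export the re-projected raster so the transformation only needs to happen once\n## Note that the file name/path here are placeholders\nterra::writeRaster(x = nc_pixel_nad83, filename = file.path(\"data\", \"nc_elevation_3083.tif\"))\n```\n:::\n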
\n\n:::\n\n## Making Maps\n\nNow that we've covered the main types of spatial data as well as how to check the CRS (and transform if needed) we're ready to make maps! For consistency with other modules on data visualization, we'll use `ggplot2` to make our maps but note that base R also supports map making and there are many useful tutorials elsewhere on making maps in that framework.\n\nThe `maps` package includes some useful national and US state borders so we'll begin by preparing an object that combines both sets of borders.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load library\nlibrary(maps)\n\n# Make 'borders' object\nborders <- sf::st_as_sf(maps::map(database = \"world\", plot = F, fill = T)) %>%\n  dplyr::bind_rows(sf::st_as_sf(maps::map(database = \"state\", plot = F, fill = T)))\n```\n:::\n\n\nNote that the simplest way of making a map that includes a raster is to coerce that raster into a dataframe. To do this we will translate each pixel's geographic coordinates into X and Y values.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnc_pixel_df <- as.data.frame(nc_pixel, xy = T) %>% \n  # Rename the 'actual' data layer more clearly\n  dplyr::rename(elevation_m = SRTMGL3_NC.003_SRTMGL3_DEM_doy2000042_aid0001)\n```\n:::\n\n\nWith the borders object and our modified raster in hand, we can now make a map that includes useful context for state/nation borders. Synthesis projects often cover a larger geographic extent than primary projects, so this is particularly useful in ways it might not be for primary research.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load library\nlibrary(ggplot2)\n\n# Make map\nggplot(borders) +\n  geom_sf(fill = \"gray95\") + # <1>\n  coord_sf(xlim = c(-70, -90), ylim = c(30, 40), expand = F) + # <2>\n  geom_tile(data = nc_pixel_df, aes(x = x, y = y, fill = elevation_m)) +\n  labs(x = \"Longitude\", y = \"Latitude\")\n```\n:::\n\n1. This line is filling our nation polygons with a pale gray (helps to differentiate from ocean)\n2. Here we set the map extent so that we're only getting our region of interest\n\n

\n\"ggplot2-style\n

\n\nFrom here we can make additional `ggplot2`-style modifications as/if needed. This variant of map-making supports adding tabular data objects as well (though they would require separate geometries). Many of the LTER Network Office-funded groups that make maps include points for specific study locations along with a raster layer for an environmental / land cover characteristic that is particularly relevant to their research question and/or hypotheses.\n\n## Extracting Spatial Data\n\nBy far the most common spatial operation that LNO-funded synthesis working groups want to perform is extraction of some spatial covariate data within their region of interest. \"Extraction\" here includes (1) the actual gathering of values from the desired area, (2) summarization of those values, and (3) attaching those summarized values to an existing tabular dataset for further analysis/visualization. As with any coding task, there are many ways of accomplishing this end, but we'll focus on one method in the following code chunks: extraction in R via the `exactextractr` package.\n\nThis package expects that you'll want to extract raster data within the borders described in some type of vector data. If you want the values in all the pixels of a GeoTIFF that fall inside the boundary defined by a shapefile, tools in this package will be helpful.\n\nWe'll begin by making a simpler version of our North Carolina vector data. This ensures that the extraction is as fast as possible for demonstrative purposes while still being replicable for you.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Simplify the vector data\nnc_poly_simp <- nc_poly %>% \n  dplyr::filter(NAME %in% c(\"Wake\", \"Swain\")) %>% \n  dplyr::select(NAME, AREA)\n\n# Check structure to demonstrate simplicity\ndplyr::glimpse(nc_poly_simp) # <1>\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 2\nColumns: 3\n$ NAME     <chr> \"Wake\", \"Swain\"\n$ AREA     <dbl> 0.219, 0.141\n$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-78.92082 3..., MULTIPOLYGON (((-83.3317 35..…\n```\n\n\n:::\n:::\n\n1. Note that even though we used `select` to keep only two columns, the geometry information is retained!\n\nNow let's use this simplified object and extract elevation for our counties of interest (normally we'd likely do this for all counties, but the process is the same).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed libraries\nlibrary(exactextractr)\nlibrary(purrr)\n\n# Perform extraction\nextracted_df <- exactextractr::exact_extract(x = nc_pixel, y = nc_poly_simp, # <1>\n                                             include_cols = c(\"NAME\", \"AREA\"), # <2>\n                                             progress = F) %>% # <3>\n  # Collapse to a dataframe\n  purrr::list_rbind(x = .) # <4>\n\n# Check structure\ndplyr::glimpse(extracted_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 521,671\nColumns: 4\n$ NAME              <chr> \"Wake\", \"Wake\", \"Wake\", \"Wake\", \"Wake\", \"Wake\", \"Wak…\n$ AREA              <dbl> 0.219, 0.219, 0.219, 0.219, 0.219, 0.219, 0.219, 0.2…\n$ value             <dbl> 95, 97, 100, 100, 100, 101, 103, 105, 110, 111, 111,…\n$ coverage_fraction <dbl> 0.027789708, 0.084839255, 0.141893625, 0.198947996, …\n```\n\n\n:::\n:::\n\n1. Note that functions like this one assume that both spatial data objects use the same CRS. We checked that earlier so we're good, but remember to include that check _every_ time you do something like this!\n2. All column names specified here from the vector data (see the `y` argument) will be retained in the output. Otherwise only the extracted value and coverage fraction are included.\n3. This argument controls whether a progress bar is included. _Extremely_ useful when you have many polygons / the extraction takes a long time!\n4. The default output of this function is a list with one dataframe of extracted values per polygon in your vector data, so we'll unlist to a dataframe for ease of future operations.\n\nIn the above output we can see that it has extracted the elevation of _every pixel_ within each of our counties of interest and provided us with the fraction of each pixel that is covered by the polygon (i.e., by the shapefile). We can now summarize this however we'd like and--eventually--join it back onto the county data via the column(s) we specified should be retained from the original vector data.
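\n\nFor instance, one reasonable summary would be a coverage-weighted mean elevation for each county, which we could then join back onto the vector data. The chunk below is just a sketch (the object names are arbitrary and `weighted.mean()` is only one sensible option); adapt the grouping column(s) and summary statistic to your own context.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Summarize to one value per county, weighting by how much of each pixel was covered\nelev_summary <- extracted_df %>%\n  dplyr::group_by(NAME) %>%\n  dplyr::summarize(elevation_mean_m = weighted.mean(x = value, w = coverage_fraction, na.rm = TRUE))\n\n# Join the summarized values back onto the county polygons\nnc_poly_elev <- nc_poly_simp %>%\n  dplyr::left_join(y = elev_summary, by = \"NAME\")\n```\n:::\n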
\n\n## Additional Resources\n\n### Papers & Documents\n\n- \n\n### Workshops & Courses\n\n- The Carpentries' [Introduction to Geospatial Raster & Vector Data with R](https://datacarpentry.org/r-raster-vector-geospatial/)\n- The Carpentries' [Introduction to R for Geospatial Data](https://datacarpentry.org/r-intro-geospatial/index.html)\n- Arctic Data Center's [Spatial and Image Data Using GeoPandas](https://learning.nceas.ucsb.edu/2023-03-arctic/sections/geopandas.html) chapter of their Scalable Computing course\n- Jason Flower's (UC Santa Barbara) [Introduction to rasters with `terra`](https://jflowernet.github.io/intro-terra-ecodatascience/)\n- King, R. [Spatial Data Visualization](https://github.com/king0708/spatial-data-viz) workshop\n\n### Websites\n\n- NASA's Application for Extracting and Exploring Analysis Ready Samples [(AppEEARS) Portal](https://appeears.earthdatacloud.nasa.gov/)\n",
  "supporting": [
    "mod_spatial_files"
  ],
diff --git a/mod_spatial.qmd b/mod_spatial.qmd
index 2209c0d..c246765 100644
--- a/mod_spatial.qmd
+++ b/mod_spatial.qmd
@@ -39,6 +39,7 @@ install.packages("tidyverse")
 install.packages("sf")
 install.packages("terra")
 install.packages("maps")
+install.packages("exactextractr")
 ```
 
 ## Types of Spatial Data
@@ -245,9 +246,48 @@ From here we can make additional `ggplot2`-style modifications as/if needed. Thi
 
 ## Extracting Spatial Data
 
+By far the most common spatial operation that LNO-funded synthesis working groups want to perform is extraction of some spatial covariate data within their region of interest. "Extraction" here includes (1) the actual gathering of values from the desired area, (2) summarization of those values, and (3) attaching those summarized values to an existing tabular dataset for further analysis/visualization. As with any coding task, there are many ways of accomplishing this end, but we'll focus on one method in the following code chunks: extraction in R via the `exactextractr` package.
+
+This package expects that you'll want to extract raster data within the borders described in some type of vector data. If you want the values in all the pixels of a GeoTIFF that fall inside the boundary defined by a shapefile, tools in this package will be helpful.
+
+We'll begin by making a simpler version of our North Carolina vector data. This ensures that the extraction is as fast as possible for demonstrative purposes while still being replicable for you.
+
+```{r extract-prep}
+# Simplify the vector data
+nc_poly_simp <- nc_poly %>% 
+  dplyr::filter(NAME %in% c("Wake", "Swain")) %>% 
+  dplyr::select(NAME, AREA)
+
+# Check structure to demonstrate simplicity
+dplyr::glimpse(nc_poly_simp) # <1>
+```
+1. Note that even though we used `select` to keep only two columns, the geometry information is retained!
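+
+Before running the extraction itself, it is worth re-confirming that the two layers really do use the same CRS. Here is one quick way of eyeballing that (the chunk name is arbitrary; both lines should return 4326 for these particular data):
+
+```{r extract-crs-check}
+# Compare the EPSG codes of the vector and raster layers
+sf::st_crs(nc_poly_simp)$epsg
+terra::crs(nc_pixel, describe = TRUE)$code
+```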
+
+Now let's use this simplified object and extract elevation for our counties of interest (normally we'd likely do this for all counties, but the process is the same).
+
+```{r extract-actual}
+#| message: false
+
+# Load needed libraries
+library(exactextractr)
+library(purrr)
+
+# Perform extraction
+extracted_df <- exactextractr::exact_extract(x = nc_pixel, y = nc_poly_simp, # <1>
+                                             include_cols = c("NAME", "AREA"), # <2>
+                                             progress = F) %>% # <3>
+  # Collapse to a dataframe
+  purrr::list_rbind(x = .) # <4>
+
+# Check structure
+dplyr::glimpse(extracted_df)
+```
+1. Note that functions like this one assume that both spatial data objects use the same CRS. We checked that earlier so we're good, but remember to include that check _every_ time you do something like this!
+2. All column names specified here from the vector data (see the `y` argument) will be retained in the output. Otherwise only the extracted value and coverage fraction are included.
+3. This argument controls whether a progress bar is included. _Extremely_ useful when you have many polygons / the extraction takes a long time!
+4. The default output of this function is a list with one dataframe of extracted values per polygon in your vector data, so we'll unlist to a dataframe for ease of future operations.
+
+In the above output we can see that it has extracted the elevation of _every pixel_ within each of our counties of interest and provided us with the fraction of each pixel that is covered by the polygon (i.e., by the shapefile). We can now summarize this however we'd like and--eventually--join it back onto the county data via the column(s) we specified should be retained from the original vector data.
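+
+For instance, one reasonable summary would be a coverage-weighted mean elevation for each county, which we could then join back onto the vector data. The chunk below is just a sketch (the object names are arbitrary and `weighted.mean()` is only one sensible option); adapt the grouping column(s) and summary statistic to your own context.
+
+```{r extract-summarize}
+# Summarize to one value per county, weighting by how much of each pixel was covered
+elev_summary <- extracted_df %>%
+  dplyr::group_by(NAME) %>%
+  dplyr::summarize(elevation_mean_m = weighted.mean(x = value, w = coverage_fraction, na.rm = TRUE))
+
+# Join the summarized values back onto the county polygons
+nc_poly_elev <- nc_poly_simp %>%
+  dplyr::left_join(y = elev_summary, by = "NAME")
+```
 
 ## Additional Resources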