Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add engine-choice as a standard functionality to all indicator calc functions #69

Closed
Jo-Schie opened this issue May 14, 2022 · 7 comments · Fixed by #151
Closed

Add engine-choice as a standard functionality to all indicator calc functions #69

Jo-Schie opened this issue May 14, 2022 · 7 comments · Fixed by #151

Comments

@Jo-Schie
Copy link
Member

@Ohm-Np masters thesis will show that choosing the right processing strategy for big geospatial data analysis in R depends amongst others on the spatial resolution of the data and the amount of features to be processed. Different packages, particularly terra and exactextract perform differently in different settings, and it is desirable to have the flexibility to choose the best function for a specific task (in addition to being able to parallelize the processing, as our package currently already allows).

So my suggestion is to create a standardized "engine chooser function" that allows people to choose e.g. between terra zonal, terrax extract and exact extract. Similarly, it could be possible to implement this for vector data, letting people choose between terra intersection and sf intersection. However, that would be probably more difficult since both work with different object classes. Maybe despite the knowledge that Om's thesis is creating around big data analysis, this could be a central contribution from his thesis to the package.

fyi: @goergen95

@Ohm-Np
Copy link
Contributor

Ohm-Np commented Jun 7, 2022

Here's a small reprex showing use of - zonal vs terra::extract - for computation of areal statistics:

## load sample datasets
library(terra)
rc <- rast("../../Om/test/classified_2000.tif")
roi <- vect("../../Om/test/roi_grids.gpkg")

## area raster
crop <- terra::crop(rc, roi[1, ])
mask <- terra::mask(crop, roi[1, ])
area <- terra::cellSize(mask, unit = "km")

## area of a single aoi
polygon_area <- terra::expanse(roi[1, ], "km")
polygon_area
# [1] 4.988084

Note The values 0 and 1 in column max indicates whether the polygon is affected or not by hurricane & raster resolution is ~111 m.

masked_roi

The result from patches_zonal indicates area with values 0 and 1, however, the sum of the both areas seem to exceed the area of polygon by ~0.5 sqkm which is much off considering the accuracy.

## zonal
patches_zonal <- terra::zonal(area, mask, "sum")
patches_zonal

#   max     area
# 1   0 1.189579
# 2   1 4.384729

In case of patches_terra we only get the overall area of the raster intersected by polygon, but we can not get areal statistics directly (for values 0 & 1) as in zonal.

## terra extract on area raster
patches_terra <- terra::extract(area, roi[1, ], sum)
patches_terra

# ID     area
# 1  1 4.962497

The probable workaround to get area from terra::extract is to divide the total area by the number of cells - and - multiply by the number of cells for the particular values.

## terra extract on mask layer
mask_terra <- terra::extract(mask, roi[1, ])
mask_terra <- aggregate(. ~ max, data = mask_terra, FUN = sum)
mask_terra

# max  ID
# 1   0  86
# 2   1 352

area_of_polygon = patches_terra[[2]]
ncells = mask_terra[[2]][1] + mask_terra[[2]][2]
area_per_cell <- area_of_polygon/ncells

area_values_0 <- area_per_cell * mask_terra[[2]][1]
area_values_0
# 0.9743715

area_values_1 <- area_per_cell * mask_terra[[2]][2]
area_values_1
# 3.988125

extract_area <-  area_values_0 + area_values_1
extract_area
# 4.962497

Result (area comparison)

zonal_area <-  sum(patches_zonal$area)
result <- data.frame(polygon_area = polygon_area,
                     zonal_area = zonal_area,
                     diff_zonal = zonal_area - polygon_area,
                     extract_area = extract_area,
                     diff_extract = extract_area - polygon_area)
result
polygon_area zonal_area diff_zonal extract_area diff_extract
4.988084 5.574308 0.5862243 4.962497 -0.0255875

So, I think the use of engine choices also depends on the complexity of the codes/routines.

PS: I am not sure if we have much easier way to compute areal stats using terra::extract.

@goergen95
Copy link
Member

It would be more helpful if examples like above include data we all have access to.

The current way how the different engines are implemented is shown with the example below. As we can see, there seems to be no difference in the area estimates between the two options using terra and only a slight difference using exactextractr. In my view it boils down to a performance question, that is which implementation is the fastest. Please also note that this is only an area estimation because the original data is unprojected. Again, this is a trade-off between accuracy and performance.

library(mapme.biodiversity)
library(sf)
#> Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
library(terra)
#> terra 1.5.30
library(exactextractr)
library(tibble)

clay <- system.file("res", "soilgrids", "clay_0-5cm_mean.tif", 
                        package = "mapme.biodiversity") |>
  rast()
aoi <- system.file("extdata", "sierra_de_neiba_478140_2.gpkg",
                   package = "mapme.biodiversity")

# slightly different area estimates between sf and terra for unprojected geometries
aoi_sf <- read_sf(aoi)
st_area(aoi_sf) 
#> 183491611 [m^2]

aoi_terra <- vect(aoi)
expanse(aoi_terra)
#> [1] 182921585

# retrieve areal statistics using terra::extract
clay_binary <- clay < 350
area <- cellSize(clay, unit = "m")
(result_extract <- extract(area * clay_binary, aoi_terra, sum))
#>   ID     area
#> 1  1 94329419

# retrive areal statistics via terra::zonal
aoi_zone <- rasterize(aoi_terra, clay)
(result_zonal <- zonal(area * clay_binary, aoi_zone, sum))
#>   layer     area
#> 1     1 94329419

# retrieve areal statistics via exactextractr:exact_extract
(result_exact <- exact_extract(area * clay_binary, aoi_sf, "sum"))
#> [1] 94455208


# calculate difference in percent
result_zonal$area / result_exact * 100
#> [1] 99.86683

Created on 2022-06-11 by the reprex package (v2.0.1)

@Jo-Schie
Copy link
Member Author

Thanks to both of you for your examples. @goergen95 : your example most probably does not show considerable differences because your AOI is rather big (183.492 sqkm) with a small raster as input (see also my issue about this). So it's not purely about performances when having edge cases (small AOIs or coarse raster resolutions). Not sure how exact extract does it though to get more proximate to the real area...

@Ohm-Np
Copy link
Contributor

Ohm-Np commented Aug 12, 2022

original extent how terra works how exactextractr works
crop_raster mask_raster mask_rasterExact

@Ohm-Np
Copy link
Contributor

Ohm-Np commented Aug 12, 2022

Status of engine choices availability in current raster calc. functions:

SN Indicator Engine-choice
1 drought_indicator Yes
2 elevation Yes
3 tri Yes
4 population_count Yes
5 precipitation_chirps Yes
6 precipitation_wc Yes
7 soilproperties Yes
8 temperature_max_wc Yes
9 temperature_min_wc Yes
10 traveltime Yes
11 Landcover No
12 treecover_area No
13 treecover_area_and_emissions No
14 treecoverloss_emissions No

@Jo-Schie
Copy link
Member Author

original extent how terra works how exactextractr works
crop_raster mask_raster mask_rasterExact

Is there a difference between terra extract and terra zonal?

@Ohm-Np
Copy link
Contributor

Ohm-Np commented Aug 15, 2022

Is there a difference between terra extract and terra zonal?

Both the terra extract and terra zonal follow the same approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants