From 6645404fe5de1c938f1051cbbc8c232373d4a56f Mon Sep 17 00:00:00 2001 From: Johannes Heisig Date: Wed, 10 Dec 2025 10:40:52 +0100 Subject: [PATCH 1/6] add qmd files 53 & 54 (r + rarr/gdal) --- 05_zarr_tools/53_eopf_r_rarr.qmd | 278 +++++++++++++++++++++++++++++++ 05_zarr_tools/54_eopf_r_gdal.qmd | 221 ++++++++++++++++++++++++ 2 files changed, 499 insertions(+) create mode 100644 05_zarr_tools/53_eopf_r_rarr.qmd create mode 100644 05_zarr_tools/54_eopf_r_gdal.qmd diff --git a/05_zarr_tools/53_eopf_r_rarr.qmd b/05_zarr_tools/53_eopf_r_rarr.qmd new file mode 100644 index 0000000..5cce9ba --- /dev/null +++ b/05_zarr_tools/53_eopf_r_rarr.qmd @@ -0,0 +1,278 @@ +--- +title: "Accessing EOPF Sentinel Zarr using `Rarr`" +format: html +execute: + cache: refresh +--- + +### Introduction + +This notebook demonstrates how to **retrieve** remotely stored Zarr data using the `Rarr` package in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata. For subsequent analysis and visualization we turn the zarray into `stars` objects which are more suitable for spatial data. + +::: {.callout-tip} +This notebook has a sibling, which demonstrates how to access the same data using the GDAL Zarr driver. You can find it [here](./remote_Zarr_via_GDAL.html). +::: + +### What we will learn + +- ✏️ How to edit URLs of Zarr stores to make them readable for GDAL +- 🔎 Which read-functions and arguments to use in `stars` and `terra` +- 🚧 Current limitations of these packages + +### Prerequisites + +To follow this notebook, you will need to load a STAC item in Zarr format from the [EOPF Zarr STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en). If you are new to STAC inside the R environment, we suggest to follow the [Access the EOPF Zarr STAC API with R tutorial](https://eopf-toolkit.github.io/eopf-101/51_eopf_stac_r.html). The item we are using for this example mirrors the one used in the [Sentinel-2 L1C MSI Zarr Product Exploration notebook](https://eopf-sample-service.github.io/eopf-sample-notebooks/sentinel-2-l1c-msi-zarr-product-exploration/#introduction) which is hosted on EODC's object storage and is accessible under the URL below. + +```{r} +zarr_url = "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458.zarr" +``` + +### Import packages + +```{r} +#| message: false +#| warning: false + +library(Rarr) # reading; BiocManager::install("Rarr") +library(stars) # spatial objects +library(dplyr) # table operations +library(jsonlite) # parsing metadata +library(mapview) # interactive maps +``` + +## Metadata exploration with Rarr + +::: {.callout-important} +In this notebook we show the current capabilities and limitations of `Rarr` in terms of reading Zarr array(s) and their metadata. Successful attempts are marked with a ✅, failed attempts with a ⛔. +::: + +Using the `Rarr::zarr_overview` function we can get meta data for all assets in the remote Zarr store as a table. Here list all arrays and their data type, compression, dimensions, etc. + +```{r} +zarr_content = zarr_overview(zarr_url, as_data_frame = T) |> + mutate(var = sub(zarr_url, "", path)) |> + select(var, everything(), -path) + +filter(zarr_content, grepl("reflectance", var)) +``` + + We can see that all Sentinel-2 bands are present, but split into directories (or groups) based on spatial resolution (10, 20, 60 meters), which have different dimensions (number and size of pixels). + +::: {.callout-tip collapse="true"} +## Expand for full Zarr metadata output + +```{r} +#| echo: false + +print(zarr_content) +``` +::: + +Now that we have explored the content, let's get some related metadata. `Rarr` provides the `read_zattrs` function for this purpose, which looks for the `.zattrs` file inside the Zarr store. + +⛔ Unfortunately we were not successful when applying it to our remote store. +```{r} +#| error: true + +read_zattrs(zarr_url) +read_zattrs(file.path(zarr_url, ".zattrs")) +``` + +✅ As a simple workaround we can read the JSON directly: +```{r} +zattrs = read_json(file.path(zarr_url, ".zattrs")) +names(zattrs) +``` + +::: {.callout-tip collapse="true"} +## Expand for full Zarr metadata output + +```{r} +#| echo: false + +print(zarr_content) +``` +::: + +The same information can be retrieved with `sf::gdal_utils('mdiminfo', ...)`. + +## Reading Zarr data with `Rarr` + +⛔ `Rarr`, like `stars` and `terra`, is not capable of reading multiple arrays from the remote store simultaneously. + +```{r} +#| error: true + +read_zarr_array(zarr_url) +``` + +✅ But if a single band array is selected we are able to access the desired information. + +```{r} +band_variable = "/measurements/reflectance/r60m/b01" + +zarray = read_zarr_array(paste0(zarr_url, band_variable)) +str(zarray) +``` + +However, the function returns a matrix (2D array) lacking any spatial metadata (e.g. Coordinate Reference System) or coordinates. + +::: {.callout-warning} +## Caution + +The raw array has flipped dimensions! But we can access X and Y coordinates for each pixel separately and handle the issue. +::: + +```{r} +zarray_60m_x = paste0(zarr_url, "/measurements/reflectance/r60m/x") |> + read_zarr_array() +zarray_60m_y = paste0(zarr_url, "/measurements/reflectance/r60m/y") |> + read_zarr_array() + +print(range(zarray_60m_x)) +print(range(zarray_60m_y)) +``` + +Let's construct a `stars` object from the components found in the Zarr store: +- data array +- X and Y coordinates +- CRS (from metadata) + +```{r} +# read CRS from attributes +zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` + +# band name +var = basename(band_variable) + +# read and transpose matrix before converting to stars +zz = st_as_stars(zarray |> t()) |> + st_set_dimensions(1, names = "X", values = zarray_60m_x) |> + st_set_dimensions(2, names = "Y", values = zarray_60m_y) |> + setNames(var) +st_crs(zz) = st_crs(zarr_crs) + +# display the object +zz + +# plot as a map +plot(zz, axes = TRUE) +``` + +The map shows Sentinel-2 band 1 (coastal aerosol band) reflectance at 60m spatial resolution for north-western Italy with the Alps in the West and the city of Torino in the North-East. + + +We can enclose these steps in a single function with some flexibility for variable names and resolutions. This allows us to retireve any band at any available resolution from the remote Zarr store. + +```{r} +st_read_zarray = function(path, var, res, ...){ + + # room for checks and input tests: + # stopifnot valid zarr url / variable / resolution ... + + # get metadata including CRS + zattrs = jsonlite::read_json(file.path(path, ".zattrs")) + zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` + + # zattrs$stac_discovery$properties$`eopf:resolutions` + res_char = switch(as.character(res), + "10" = "r10m", + "20" = "r20m", + "60" = "r60m", + stop("Resolution must be one of 10, 20, or 60.") + ) + + # ...: make use of index arguments of read_zarr_array if desired + zarray = file.path(path, "measurements/reflectance", res_char, var) |> + read_zarr_array(...) + zarray_x = file.path(path, "measurements/reflectance", res_char, "x") |> + read_zarr_array(...) + zarray_y = file.path(path, "measurements/reflectance", res_char, "y") |> + read_zarr_array(...) + + # transpose matrixe before converting to stars + z = zarray |> + t() |> + st_as_stars() |> + st_set_dimensions(1, names = "X", values = zarray_x) |> + st_set_dimensions(2, names = "Y", values = zarray_y) |> + setNames(var) + st_crs(z) = st_crs(zarr_crs) + return(z) +} +``` + + +## Read and combine multiple zarrays + +✅ Now we can read multiple bands, combine them to a single multi-band object, and visualize them in a static map... + +```{r} +system.time({ + b01 = st_read_zarray(zarr_url, "b01", 60) + b09 = st_read_zarray(zarr_url, "b09", 60) + (multi_band = c(b01, b09) |> merge()) +}) + +plot(multi_band, axes = TRUE) +``` + +... or by generating an interactive visualization: + +```{r} +#| echo: false +mapviewOptions(basemaps = c("OpenTopoMap","Esri.WorldImagery", + "CartoDB.Positron")) +``` + +```{r} +#| message: false +#| cache: true + +mapview(multi_band) +``` + + +## Benchmark + +Using `Rarr` to read remote Zarr stores is just one of several options. R packages `stars` and `terra` provide functionality, that uses GDAl's Zarr driver in the background. To see if there is a benefit to each of the methods, we can compare their execution time and memory allocation with a little benchmark. + +```{r} +# GDAL-compatible VSI URL +vsi_prefix = "ZARR:/vsicurl/" +vsi_url = paste0(vsi_prefix, dplyr::as_label(zarr_url)) + +# functions for reading a single Sentinel-2 band to memory +f_stars = function(){rs = read_stars(paste(vsi_url, band_variable, sep = ":"))} +f_mdim = function(){sm = read_mdim(vsi_url, band_variable, proxy = FALSE) |> + setNames(basename(band_variable))} +f_rast = function(){tr = terra::rast(vsi_url, band_variable) |> terra::toMemory()} +f_rarr = function(){rr = st_read_zarray(zarr_url, "b01", 60)} +``` + +```{r} +#| warning: false + +bench::mark(f_stars(), f_mdim(), f_rast(), f_rarr(), + check = F, iterations = 5) +``` + +The benchmark does not reveal a clear winner in terms of speed. However, `Rarr` seems to allocate more memory compared to `stars` and `terra`. + +## 💪 Now it is your turn + +- 🔭 **Task**: Supply more arguments to `st_read_zarray` and pass a subset to read only parts on an array. Visit `?Rarr::read_zarr_array` for more details. +- 💾 **Task**: Try writing zarrays to disk using `Rarr::write_zarr_array()`. +- 📖 Read the [Rarr documentation](https://huber-group-embl.github.io/Rarr/articles/Rarr.html). + +## Conclusion + +In this notebook we have demonstrated how to access remote Zarr stores in R using the `Rarr` package. We have seen that it is possible to read specific data arrays, but `read_zarr_array` lacks the ability of accessing anything other than the raw array. To combine it with a CRS or coordinates we need to leverage a spatial class like `stars` and manually set the relevant metadata. Further, the benchmark did not reveal a significant benefit regarding processing time or memory allocation over "traditional" GDAL-based data retrieval. + +## What's next? + +Further tests should evaluate wether data retrieved via `Rarr` versus GDAL are identical in terms of spatial extent and offset. + +Explore [this related notebook](./remote_Zarr_via_GDAL.html) which demonstrates how to access the same Zarr data using the GDAL Zarr driver via the `stars` and `terra` packages. + + diff --git a/05_zarr_tools/54_eopf_r_gdal.qmd b/05_zarr_tools/54_eopf_r_gdal.qmd new file mode 100644 index 0000000..6301f13 --- /dev/null +++ b/05_zarr_tools/54_eopf_r_gdal.qmd @@ -0,0 +1,221 @@ +--- +title: "Accessing EOPF Sentinel Zarr via GDAL in `stars` and `terra`" +format: html +execute: + cache: refresh +--- + +### Introduction + +This notebook demonstrates how to **retrieve** remotely stored Zarr data using the Zarr GDAL driver in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata using the `stars`, `sf` and `terra` packages as well as their current limitations. + +::: {.callout-tip} +This notebook has a sibling, which demonstrates how to access the same Zarr data using the `Rarr` package. You can find it [here](./remote_Zarr_via_Rarr.html). +::: + +### What we will learn + +- ✏️ How to edit URLs of Zarr stores to make them readable for GDAL +- 🔎 Which read-functions and arguments to use in `stars` and `terra` +- 🚧 Current limitations of these packages + +### Prerequisites + +To follow this notebook, you will need to load a STAC item in Zarr format from the [EOPF Zarr STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en). If you are new to STAC inside the R environment, we suggest to follow the [Access the EOPF Zarr STAC API with R tutorial](https://eopf-toolkit.github.io/eopf-101/51_eopf_stac_r.html). The item we are using for this example mirrors the one used in the [Sentinel-2 L1C MSI Zarr Product Exploration notebook](https://eopf-sample-service.github.io/eopf-sample-notebooks/sentinel-2-l1c-msi-zarr-product-exploration/#introduction) which is hosted on EODC's object storage and is accessible under the URL below. + +```{r} +zarr_url = "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458.zarr" +``` + +### Import packages + +```{r} +#| message: false +#| warning: false + +library(stars) # reading; imports sf too +library(terra) # reading +library(jsonlite) # parsing metadata +library(mapview) # interactive maps +``` + + +## GDAL's Zarr driver + +Common R-packages for raster data like `stars` and `terra` can leverage GDAL's raster drivers to read data directly from remote locations. This is done using GDAL's virtual file system (VSI) capabilities, specifically the `/vsicurl/` prefix for accessing files over HTTP(S). + +For Zarr data GDAL provides a dedicated Zarr driver that requires an additional `ZARR:`-prefix to the VSI URL. Finally, the complete prefix needs to be quoted: `"ZARR:/vsicurl/"`. + +```{r} +vsi_prefix = "ZARR:/vsicurl/" +vsi_url = paste0(vsi_prefix, dplyr::as_label(zarr_url)) # needs special quoting +print(vsi_url) +``` + +## Metadata exploration with GDAL + +With the VSI URL pointing to the remote Zarr store we can now use GDAL utilities to explore the metadata of the Zarr store. The `sf::gdal_utils` function provides an interface to GDAL command-line utilities. We use the `mdiminfo` command to retrieve metadata about the multidimensional arrays (zarrays) contained within the Zarr store. We only print the top-level names of the arrays here. + +```{r} +metadata = sf::gdal_utils("mdiminfo", source = vsi_url, quiet = T) |> + fromJSON() +names(metadata) +``` + +::: {.callout-tip collapse="true"} +## Expand for full GDAL MDIM metadata output +```{r} +#| echo: false + +print(metadata) +``` +::: + + +## Reading Zarr data with `stars` and `terra` + +::: {.callout-important} +In this section we show the current capabilities and limitations of `stars` and `terra` when it comes to reading Zarr array(s). We test the functions `read_stars` and `read_mdim` from `stars` and `rast` from the `terra`package. +Successful attempts are marked with a ✅, failed attempts with a ⛔. +::: + +First we define the path to a specific data array (band 1, 60 meter resolution) inside the Zarr store which can be understood as a single raster layer. + +```{r, error=T} +band_variable = "/measurements/reflectance/r60m/b01" +``` + +### `stars::read_stars()` + +⛔ Traditional approach for reading all bands/layers and specifying a driver. +```{r} +#| error: true + +r = read_stars(vsi_url, driver = "ZARR") +``` + +⛔ Traditional approach for reading all bands/layers and specifying a sub-dataset (integer). +```{r} +#| error: true + +r = read_stars(vsi_url, sub = 1) +``` + +⛔ Traditional approach for reading all bands/layers and specifying a sub-dataset (path). +```{r} +#| error: true + +r = read_stars(vsi_url, sub = band_variable) +``` + +✅ Constructing the full Zarray path from prefix, URL, and band variable. + +```{r} +(r = read_stars(paste(vsi_url, band_variable, sep = ":"))) +st_crs(r) # NA: empty +``` + +This method successfully reads the specified band from the remote Zarr store. + +:::{.callout-warning} +Note that the coordinate reference system (CRS) is not automatically recognized and needs to be set manually. We can look it up in the metadata where it is listed as one of the STAC properties. +::: + +```{r} +(crs = metadata$attributes$stac_discovery$properties$`proj:epsg`) +st_crs(r) = crs +``` + +Our data is in CRS _WGS 84 / UTM zone 32N (EPSG:32632)_. +We can now safely plot it as a static map... + +```{r} +system.time(plot(r, axes = TRUE)) +``` + +... or is an interactive visualization: + +```{r} +#| message: false +#| echo: false + +mapviewOptions(basemaps = c("OpenTopoMap","Esri.WorldImagery", + "CartoDB.Positron")) +``` + +```{r} +mapview(r) +``` + +--- + +### `stars::read_mdim()` + +⛔ Traditional approach for reading all bands/layers at once. "?" should return a list of array possible names. +```{r} +#| error: true + +m = read_mdim(vsi_url) +m = read_mdim(vsi_url, variable = "?") # query array names +``` + + +✅ Specifying the full Zarray path from prefix, URL, and band variable. Read as proxy object. +```{r} +system.time({ + (m = read_mdim(vsi_url, variable = band_variable, proxy = TRUE)) + }) # fast: only reads metadata + +st_crs(m) # NA: empty +system.time(plot(m, axes = T)) # slow: only here the full array is downloaded +``` + +As for `read_stars()`, the CRS is not automatically recognized and needs to be set manually. + +--- + +### `terra::rast()` + +✅ / ⛔ Traditional approach for reading all bands/layers at once. + +```{r, error=T} +(tr = terra::rast(vsi_url)) +tr$b02 + +names(tr) # has non-unique names, e.g. three times "b02" +``` + +`terra` reads some of the available bands, but fails to name them uniquely. It likely searches for band names like "b02" and finds the first group of data arrays by order, in this case the 10-meter bands (2, 3, 4, and 8). The Zarr store contains multiple array named by band such as "/measurements/reflectance/r10m/b02", their corresponding masks (e.g. "quality/mask/r10m/b02) and the detector footprint (e.g. "/conditions/mask/detector_footprint/r10m/b02"). All Zarrays share the same name, "b02", leading to non-unique layer names in the resulting `SpatRaster` object. When constructing unique names we can at least visualize the data. + +```{r} +# rename layers to (not meaningful) unique names +names(tr) = paste0(letters[1:nlyr(tr)], names(tr)) +names(tr) + +plot(tr) +``` + +✅ Specifying the full Zarray path from prefix, URL, and band variable as sub-dataset. Read as proxy object. +```{r} +system.time({ + tr = rast(vsi_url, subds = band_variable) + })# fast: only reads metadata + +crs(tr) # "": empty +system.time(plot(tr)) # slow: only here the full array is downloaded +``` + +As for the `stars`-functions, the CRS is not automatically recognized and needs to be set manually. + +## 💪 Now it is your turn + +- 🔍 **Task**: Explore the metadata of other Zarr stores using GDAL's `mdiminfo` command via `sf::gdal_utils()`. Store the JSON output in an R-object and find entries that store relevant metadata, such as the CRS, bounding box, or sensor-related information. +- 📖 Read the [GDAL Zarr driver documentation](https://gdal.org/en/stable/drivers/raster/zarr.html) to learn more about opening options. + +## Conclusion + +In this notebook we have demonstrated how to access remote Zarr stores using GDAL's Zarr driver in R with the `stars` and `terra` packages. We have seen that while it is possible to read specific data arrays, there are some limitations, such as the lack of automatic CRS recognition and challenges with reading multiple arrays simultaneously. + +## What's next? + +Explore [this related notebook](./remote_Zarr_via_Rarr.html) which demonstrates how to access the same Zarr data using the `Rarr` package and building a `stars` object from it. From eca5fe2c0354f1c9ff6ed071051f47d905e5e398 Mon Sep 17 00:00:00 2001 From: joheisig Date: Wed, 10 Dec 2025 10:59:07 +0100 Subject: [PATCH 2/6] match file naming convention --- 05_zarr_tools/54_zarr_R_rarr.qmd | 278 +++++++++++++++++++++++++++++++ 05_zarr_tools/55_zarr_R_gdal.qmd | 221 ++++++++++++++++++++++++ 2 files changed, 499 insertions(+) create mode 100644 05_zarr_tools/54_zarr_R_rarr.qmd create mode 100644 05_zarr_tools/55_zarr_R_gdal.qmd diff --git a/05_zarr_tools/54_zarr_R_rarr.qmd b/05_zarr_tools/54_zarr_R_rarr.qmd new file mode 100644 index 0000000..5cce9ba --- /dev/null +++ b/05_zarr_tools/54_zarr_R_rarr.qmd @@ -0,0 +1,278 @@ +--- +title: "Accessing EOPF Sentinel Zarr using `Rarr`" +format: html +execute: + cache: refresh +--- + +### Introduction + +This notebook demonstrates how to **retrieve** remotely stored Zarr data using the `Rarr` package in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata. For subsequent analysis and visualization we turn the zarray into `stars` objects which are more suitable for spatial data. + +::: {.callout-tip} +This notebook has a sibling, which demonstrates how to access the same data using the GDAL Zarr driver. You can find it [here](./remote_Zarr_via_GDAL.html). +::: + +### What we will learn + +- ✏️ How to edit URLs of Zarr stores to make them readable for GDAL +- 🔎 Which read-functions and arguments to use in `stars` and `terra` +- 🚧 Current limitations of these packages + +### Prerequisites + +To follow this notebook, you will need to load a STAC item in Zarr format from the [EOPF Zarr STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en). If you are new to STAC inside the R environment, we suggest to follow the [Access the EOPF Zarr STAC API with R tutorial](https://eopf-toolkit.github.io/eopf-101/51_eopf_stac_r.html). The item we are using for this example mirrors the one used in the [Sentinel-2 L1C MSI Zarr Product Exploration notebook](https://eopf-sample-service.github.io/eopf-sample-notebooks/sentinel-2-l1c-msi-zarr-product-exploration/#introduction) which is hosted on EODC's object storage and is accessible under the URL below. + +```{r} +zarr_url = "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458.zarr" +``` + +### Import packages + +```{r} +#| message: false +#| warning: false + +library(Rarr) # reading; BiocManager::install("Rarr") +library(stars) # spatial objects +library(dplyr) # table operations +library(jsonlite) # parsing metadata +library(mapview) # interactive maps +``` + +## Metadata exploration with Rarr + +::: {.callout-important} +In this notebook we show the current capabilities and limitations of `Rarr` in terms of reading Zarr array(s) and their metadata. Successful attempts are marked with a ✅, failed attempts with a ⛔. +::: + +Using the `Rarr::zarr_overview` function we can get meta data for all assets in the remote Zarr store as a table. Here list all arrays and their data type, compression, dimensions, etc. + +```{r} +zarr_content = zarr_overview(zarr_url, as_data_frame = T) |> + mutate(var = sub(zarr_url, "", path)) |> + select(var, everything(), -path) + +filter(zarr_content, grepl("reflectance", var)) +``` + + We can see that all Sentinel-2 bands are present, but split into directories (or groups) based on spatial resolution (10, 20, 60 meters), which have different dimensions (number and size of pixels). + +::: {.callout-tip collapse="true"} +## Expand for full Zarr metadata output + +```{r} +#| echo: false + +print(zarr_content) +``` +::: + +Now that we have explored the content, let's get some related metadata. `Rarr` provides the `read_zattrs` function for this purpose, which looks for the `.zattrs` file inside the Zarr store. + +⛔ Unfortunately we were not successful when applying it to our remote store. +```{r} +#| error: true + +read_zattrs(zarr_url) +read_zattrs(file.path(zarr_url, ".zattrs")) +``` + +✅ As a simple workaround we can read the JSON directly: +```{r} +zattrs = read_json(file.path(zarr_url, ".zattrs")) +names(zattrs) +``` + +::: {.callout-tip collapse="true"} +## Expand for full Zarr metadata output + +```{r} +#| echo: false + +print(zarr_content) +``` +::: + +The same information can be retrieved with `sf::gdal_utils('mdiminfo', ...)`. + +## Reading Zarr data with `Rarr` + +⛔ `Rarr`, like `stars` and `terra`, is not capable of reading multiple arrays from the remote store simultaneously. + +```{r} +#| error: true + +read_zarr_array(zarr_url) +``` + +✅ But if a single band array is selected we are able to access the desired information. + +```{r} +band_variable = "/measurements/reflectance/r60m/b01" + +zarray = read_zarr_array(paste0(zarr_url, band_variable)) +str(zarray) +``` + +However, the function returns a matrix (2D array) lacking any spatial metadata (e.g. Coordinate Reference System) or coordinates. + +::: {.callout-warning} +## Caution + +The raw array has flipped dimensions! But we can access X and Y coordinates for each pixel separately and handle the issue. +::: + +```{r} +zarray_60m_x = paste0(zarr_url, "/measurements/reflectance/r60m/x") |> + read_zarr_array() +zarray_60m_y = paste0(zarr_url, "/measurements/reflectance/r60m/y") |> + read_zarr_array() + +print(range(zarray_60m_x)) +print(range(zarray_60m_y)) +``` + +Let's construct a `stars` object from the components found in the Zarr store: +- data array +- X and Y coordinates +- CRS (from metadata) + +```{r} +# read CRS from attributes +zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` + +# band name +var = basename(band_variable) + +# read and transpose matrix before converting to stars +zz = st_as_stars(zarray |> t()) |> + st_set_dimensions(1, names = "X", values = zarray_60m_x) |> + st_set_dimensions(2, names = "Y", values = zarray_60m_y) |> + setNames(var) +st_crs(zz) = st_crs(zarr_crs) + +# display the object +zz + +# plot as a map +plot(zz, axes = TRUE) +``` + +The map shows Sentinel-2 band 1 (coastal aerosol band) reflectance at 60m spatial resolution for north-western Italy with the Alps in the West and the city of Torino in the North-East. + + +We can enclose these steps in a single function with some flexibility for variable names and resolutions. This allows us to retireve any band at any available resolution from the remote Zarr store. + +```{r} +st_read_zarray = function(path, var, res, ...){ + + # room for checks and input tests: + # stopifnot valid zarr url / variable / resolution ... + + # get metadata including CRS + zattrs = jsonlite::read_json(file.path(path, ".zattrs")) + zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` + + # zattrs$stac_discovery$properties$`eopf:resolutions` + res_char = switch(as.character(res), + "10" = "r10m", + "20" = "r20m", + "60" = "r60m", + stop("Resolution must be one of 10, 20, or 60.") + ) + + # ...: make use of index arguments of read_zarr_array if desired + zarray = file.path(path, "measurements/reflectance", res_char, var) |> + read_zarr_array(...) + zarray_x = file.path(path, "measurements/reflectance", res_char, "x") |> + read_zarr_array(...) + zarray_y = file.path(path, "measurements/reflectance", res_char, "y") |> + read_zarr_array(...) + + # transpose matrixe before converting to stars + z = zarray |> + t() |> + st_as_stars() |> + st_set_dimensions(1, names = "X", values = zarray_x) |> + st_set_dimensions(2, names = "Y", values = zarray_y) |> + setNames(var) + st_crs(z) = st_crs(zarr_crs) + return(z) +} +``` + + +## Read and combine multiple zarrays + +✅ Now we can read multiple bands, combine them to a single multi-band object, and visualize them in a static map... + +```{r} +system.time({ + b01 = st_read_zarray(zarr_url, "b01", 60) + b09 = st_read_zarray(zarr_url, "b09", 60) + (multi_band = c(b01, b09) |> merge()) +}) + +plot(multi_band, axes = TRUE) +``` + +... or by generating an interactive visualization: + +```{r} +#| echo: false +mapviewOptions(basemaps = c("OpenTopoMap","Esri.WorldImagery", + "CartoDB.Positron")) +``` + +```{r} +#| message: false +#| cache: true + +mapview(multi_band) +``` + + +## Benchmark + +Using `Rarr` to read remote Zarr stores is just one of several options. R packages `stars` and `terra` provide functionality, that uses GDAl's Zarr driver in the background. To see if there is a benefit to each of the methods, we can compare their execution time and memory allocation with a little benchmark. + +```{r} +# GDAL-compatible VSI URL +vsi_prefix = "ZARR:/vsicurl/" +vsi_url = paste0(vsi_prefix, dplyr::as_label(zarr_url)) + +# functions for reading a single Sentinel-2 band to memory +f_stars = function(){rs = read_stars(paste(vsi_url, band_variable, sep = ":"))} +f_mdim = function(){sm = read_mdim(vsi_url, band_variable, proxy = FALSE) |> + setNames(basename(band_variable))} +f_rast = function(){tr = terra::rast(vsi_url, band_variable) |> terra::toMemory()} +f_rarr = function(){rr = st_read_zarray(zarr_url, "b01", 60)} +``` + +```{r} +#| warning: false + +bench::mark(f_stars(), f_mdim(), f_rast(), f_rarr(), + check = F, iterations = 5) +``` + +The benchmark does not reveal a clear winner in terms of speed. However, `Rarr` seems to allocate more memory compared to `stars` and `terra`. + +## 💪 Now it is your turn + +- 🔭 **Task**: Supply more arguments to `st_read_zarray` and pass a subset to read only parts on an array. Visit `?Rarr::read_zarr_array` for more details. +- 💾 **Task**: Try writing zarrays to disk using `Rarr::write_zarr_array()`. +- 📖 Read the [Rarr documentation](https://huber-group-embl.github.io/Rarr/articles/Rarr.html). + +## Conclusion + +In this notebook we have demonstrated how to access remote Zarr stores in R using the `Rarr` package. We have seen that it is possible to read specific data arrays, but `read_zarr_array` lacks the ability of accessing anything other than the raw array. To combine it with a CRS or coordinates we need to leverage a spatial class like `stars` and manually set the relevant metadata. Further, the benchmark did not reveal a significant benefit regarding processing time or memory allocation over "traditional" GDAL-based data retrieval. + +## What's next? + +Further tests should evaluate wether data retrieved via `Rarr` versus GDAL are identical in terms of spatial extent and offset. + +Explore [this related notebook](./remote_Zarr_via_GDAL.html) which demonstrates how to access the same Zarr data using the GDAL Zarr driver via the `stars` and `terra` packages. + + diff --git a/05_zarr_tools/55_zarr_R_gdal.qmd b/05_zarr_tools/55_zarr_R_gdal.qmd new file mode 100644 index 0000000..6301f13 --- /dev/null +++ b/05_zarr_tools/55_zarr_R_gdal.qmd @@ -0,0 +1,221 @@ +--- +title: "Accessing EOPF Sentinel Zarr via GDAL in `stars` and `terra`" +format: html +execute: + cache: refresh +--- + +### Introduction + +This notebook demonstrates how to **retrieve** remotely stored Zarr data using the Zarr GDAL driver in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata using the `stars`, `sf` and `terra` packages as well as their current limitations. + +::: {.callout-tip} +This notebook has a sibling, which demonstrates how to access the same Zarr data using the `Rarr` package. You can find it [here](./remote_Zarr_via_Rarr.html). +::: + +### What we will learn + +- ✏️ How to edit URLs of Zarr stores to make them readable for GDAL +- 🔎 Which read-functions and arguments to use in `stars` and `terra` +- 🚧 Current limitations of these packages + +### Prerequisites + +To follow this notebook, you will need to load a STAC item in Zarr format from the [EOPF Zarr STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en). If you are new to STAC inside the R environment, we suggest to follow the [Access the EOPF Zarr STAC API with R tutorial](https://eopf-toolkit.github.io/eopf-101/51_eopf_stac_r.html). The item we are using for this example mirrors the one used in the [Sentinel-2 L1C MSI Zarr Product Exploration notebook](https://eopf-sample-service.github.io/eopf-sample-notebooks/sentinel-2-l1c-msi-zarr-product-exploration/#introduction) which is hosted on EODC's object storage and is accessible under the URL below. + +```{r} +zarr_url = "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458.zarr" +``` + +### Import packages + +```{r} +#| message: false +#| warning: false + +library(stars) # reading; imports sf too +library(terra) # reading +library(jsonlite) # parsing metadata +library(mapview) # interactive maps +``` + + +## GDAL's Zarr driver + +Common R-packages for raster data like `stars` and `terra` can leverage GDAL's raster drivers to read data directly from remote locations. This is done using GDAL's virtual file system (VSI) capabilities, specifically the `/vsicurl/` prefix for accessing files over HTTP(S). + +For Zarr data GDAL provides a dedicated Zarr driver that requires an additional `ZARR:`-prefix to the VSI URL. Finally, the complete prefix needs to be quoted: `"ZARR:/vsicurl/"`. + +```{r} +vsi_prefix = "ZARR:/vsicurl/" +vsi_url = paste0(vsi_prefix, dplyr::as_label(zarr_url)) # needs special quoting +print(vsi_url) +``` + +## Metadata exploration with GDAL + +With the VSI URL pointing to the remote Zarr store we can now use GDAL utilities to explore the metadata of the Zarr store. The `sf::gdal_utils` function provides an interface to GDAL command-line utilities. We use the `mdiminfo` command to retrieve metadata about the multidimensional arrays (zarrays) contained within the Zarr store. We only print the top-level names of the arrays here. + +```{r} +metadata = sf::gdal_utils("mdiminfo", source = vsi_url, quiet = T) |> + fromJSON() +names(metadata) +``` + +::: {.callout-tip collapse="true"} +## Expand for full GDAL MDIM metadata output +```{r} +#| echo: false + +print(metadata) +``` +::: + + +## Reading Zarr data with `stars` and `terra` + +::: {.callout-important} +In this section we show the current capabilities and limitations of `stars` and `terra` when it comes to reading Zarr array(s). We test the functions `read_stars` and `read_mdim` from `stars` and `rast` from the `terra`package. +Successful attempts are marked with a ✅, failed attempts with a ⛔. +::: + +First we define the path to a specific data array (band 1, 60 meter resolution) inside the Zarr store which can be understood as a single raster layer. + +```{r, error=T} +band_variable = "/measurements/reflectance/r60m/b01" +``` + +### `stars::read_stars()` + +⛔ Traditional approach for reading all bands/layers and specifying a driver. +```{r} +#| error: true + +r = read_stars(vsi_url, driver = "ZARR") +``` + +⛔ Traditional approach for reading all bands/layers and specifying a sub-dataset (integer). +```{r} +#| error: true + +r = read_stars(vsi_url, sub = 1) +``` + +⛔ Traditional approach for reading all bands/layers and specifying a sub-dataset (path). +```{r} +#| error: true + +r = read_stars(vsi_url, sub = band_variable) +``` + +✅ Constructing the full Zarray path from prefix, URL, and band variable. + +```{r} +(r = read_stars(paste(vsi_url, band_variable, sep = ":"))) +st_crs(r) # NA: empty +``` + +This method successfully reads the specified band from the remote Zarr store. + +:::{.callout-warning} +Note that the coordinate reference system (CRS) is not automatically recognized and needs to be set manually. We can look it up in the metadata where it is listed as one of the STAC properties. +::: + +```{r} +(crs = metadata$attributes$stac_discovery$properties$`proj:epsg`) +st_crs(r) = crs +``` + +Our data is in CRS _WGS 84 / UTM zone 32N (EPSG:32632)_. +We can now safely plot it as a static map... + +```{r} +system.time(plot(r, axes = TRUE)) +``` + +... or is an interactive visualization: + +```{r} +#| message: false +#| echo: false + +mapviewOptions(basemaps = c("OpenTopoMap","Esri.WorldImagery", + "CartoDB.Positron")) +``` + +```{r} +mapview(r) +``` + +--- + +### `stars::read_mdim()` + +⛔ Traditional approach for reading all bands/layers at once. "?" should return a list of array possible names. +```{r} +#| error: true + +m = read_mdim(vsi_url) +m = read_mdim(vsi_url, variable = "?") # query array names +``` + + +✅ Specifying the full Zarray path from prefix, URL, and band variable. Read as proxy object. +```{r} +system.time({ + (m = read_mdim(vsi_url, variable = band_variable, proxy = TRUE)) + }) # fast: only reads metadata + +st_crs(m) # NA: empty +system.time(plot(m, axes = T)) # slow: only here the full array is downloaded +``` + +As for `read_stars()`, the CRS is not automatically recognized and needs to be set manually. + +--- + +### `terra::rast()` + +✅ / ⛔ Traditional approach for reading all bands/layers at once. + +```{r, error=T} +(tr = terra::rast(vsi_url)) +tr$b02 + +names(tr) # has non-unique names, e.g. three times "b02" +``` + +`terra` reads some of the available bands, but fails to name them uniquely. It likely searches for band names like "b02" and finds the first group of data arrays by order, in this case the 10-meter bands (2, 3, 4, and 8). The Zarr store contains multiple array named by band such as "/measurements/reflectance/r10m/b02", their corresponding masks (e.g. "quality/mask/r10m/b02) and the detector footprint (e.g. "/conditions/mask/detector_footprint/r10m/b02"). All Zarrays share the same name, "b02", leading to non-unique layer names in the resulting `SpatRaster` object. When constructing unique names we can at least visualize the data. + +```{r} +# rename layers to (not meaningful) unique names +names(tr) = paste0(letters[1:nlyr(tr)], names(tr)) +names(tr) + +plot(tr) +``` + +✅ Specifying the full Zarray path from prefix, URL, and band variable as sub-dataset. Read as proxy object. +```{r} +system.time({ + tr = rast(vsi_url, subds = band_variable) + })# fast: only reads metadata + +crs(tr) # "": empty +system.time(plot(tr)) # slow: only here the full array is downloaded +``` + +As for the `stars`-functions, the CRS is not automatically recognized and needs to be set manually. + +## 💪 Now it is your turn + +- 🔍 **Task**: Explore the metadata of other Zarr stores using GDAL's `mdiminfo` command via `sf::gdal_utils()`. Store the JSON output in an R-object and find entries that store relevant metadata, such as the CRS, bounding box, or sensor-related information. +- 📖 Read the [GDAL Zarr driver documentation](https://gdal.org/en/stable/drivers/raster/zarr.html) to learn more about opening options. + +## Conclusion + +In this notebook we have demonstrated how to access remote Zarr stores using GDAL's Zarr driver in R with the `stars` and `terra` packages. We have seen that while it is possible to read specific data arrays, there are some limitations, such as the lack of automatic CRS recognition and challenges with reading multiple arrays simultaneously. + +## What's next? + +Explore [this related notebook](./remote_Zarr_via_Rarr.html) which demonstrates how to access the same Zarr data using the `Rarr` package and building a `stars` object from it. From 34b1828a286e7beed1fb889de92d794c10002056 Mon Sep 17 00:00:00 2001 From: joheisig Date: Wed, 10 Dec 2025 11:00:40 +0100 Subject: [PATCH 3/6] integrate with eopf github workflow --- 05_zarr_tools/53_eopf_r_rarr.qmd | 278 ------------------------------- 05_zarr_tools/54_eopf_r_gdal.qmd | 221 ------------------------ _quarto.yml | 6 +- deployment/Dockerfile_r | 2 + 4 files changed, 5 insertions(+), 502 deletions(-) delete mode 100644 05_zarr_tools/53_eopf_r_rarr.qmd delete mode 100644 05_zarr_tools/54_eopf_r_gdal.qmd diff --git a/05_zarr_tools/53_eopf_r_rarr.qmd b/05_zarr_tools/53_eopf_r_rarr.qmd deleted file mode 100644 index 5cce9ba..0000000 --- a/05_zarr_tools/53_eopf_r_rarr.qmd +++ /dev/null @@ -1,278 +0,0 @@ ---- -title: "Accessing EOPF Sentinel Zarr using `Rarr`" -format: html -execute: - cache: refresh ---- - -### Introduction - -This notebook demonstrates how to **retrieve** remotely stored Zarr data using the `Rarr` package in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata. For subsequent analysis and visualization we turn the zarray into `stars` objects which are more suitable for spatial data. - -::: {.callout-tip} -This notebook has a sibling, which demonstrates how to access the same data using the GDAL Zarr driver. You can find it [here](./remote_Zarr_via_GDAL.html). -::: - -### What we will learn - -- ✏️ How to edit URLs of Zarr stores to make them readable for GDAL -- 🔎 Which read-functions and arguments to use in `stars` and `terra` -- 🚧 Current limitations of these packages - -### Prerequisites - -To follow this notebook, you will need to load a STAC item in Zarr format from the [EOPF Zarr STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en). If you are new to STAC inside the R environment, we suggest to follow the [Access the EOPF Zarr STAC API with R tutorial](https://eopf-toolkit.github.io/eopf-101/51_eopf_stac_r.html). The item we are using for this example mirrors the one used in the [Sentinel-2 L1C MSI Zarr Product Exploration notebook](https://eopf-sample-service.github.io/eopf-sample-notebooks/sentinel-2-l1c-msi-zarr-product-exploration/#introduction) which is hosted on EODC's object storage and is accessible under the URL below. - -```{r} -zarr_url = "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458.zarr" -``` - -### Import packages - -```{r} -#| message: false -#| warning: false - -library(Rarr) # reading; BiocManager::install("Rarr") -library(stars) # spatial objects -library(dplyr) # table operations -library(jsonlite) # parsing metadata -library(mapview) # interactive maps -``` - -## Metadata exploration with Rarr - -::: {.callout-important} -In this notebook we show the current capabilities and limitations of `Rarr` in terms of reading Zarr array(s) and their metadata. Successful attempts are marked with a ✅, failed attempts with a ⛔. -::: - -Using the `Rarr::zarr_overview` function we can get meta data for all assets in the remote Zarr store as a table. Here list all arrays and their data type, compression, dimensions, etc. - -```{r} -zarr_content = zarr_overview(zarr_url, as_data_frame = T) |> - mutate(var = sub(zarr_url, "", path)) |> - select(var, everything(), -path) - -filter(zarr_content, grepl("reflectance", var)) -``` - - We can see that all Sentinel-2 bands are present, but split into directories (or groups) based on spatial resolution (10, 20, 60 meters), which have different dimensions (number and size of pixels). - -::: {.callout-tip collapse="true"} -## Expand for full Zarr metadata output - -```{r} -#| echo: false - -print(zarr_content) -``` -::: - -Now that we have explored the content, let's get some related metadata. `Rarr` provides the `read_zattrs` function for this purpose, which looks for the `.zattrs` file inside the Zarr store. - -⛔ Unfortunately we were not successful when applying it to our remote store. -```{r} -#| error: true - -read_zattrs(zarr_url) -read_zattrs(file.path(zarr_url, ".zattrs")) -``` - -✅ As a simple workaround we can read the JSON directly: -```{r} -zattrs = read_json(file.path(zarr_url, ".zattrs")) -names(zattrs) -``` - -::: {.callout-tip collapse="true"} -## Expand for full Zarr metadata output - -```{r} -#| echo: false - -print(zarr_content) -``` -::: - -The same information can be retrieved with `sf::gdal_utils('mdiminfo', ...)`. - -## Reading Zarr data with `Rarr` - -⛔ `Rarr`, like `stars` and `terra`, is not capable of reading multiple arrays from the remote store simultaneously. - -```{r} -#| error: true - -read_zarr_array(zarr_url) -``` - -✅ But if a single band array is selected we are able to access the desired information. - -```{r} -band_variable = "/measurements/reflectance/r60m/b01" - -zarray = read_zarr_array(paste0(zarr_url, band_variable)) -str(zarray) -``` - -However, the function returns a matrix (2D array) lacking any spatial metadata (e.g. Coordinate Reference System) or coordinates. - -::: {.callout-warning} -## Caution - -The raw array has flipped dimensions! But we can access X and Y coordinates for each pixel separately and handle the issue. -::: - -```{r} -zarray_60m_x = paste0(zarr_url, "/measurements/reflectance/r60m/x") |> - read_zarr_array() -zarray_60m_y = paste0(zarr_url, "/measurements/reflectance/r60m/y") |> - read_zarr_array() - -print(range(zarray_60m_x)) -print(range(zarray_60m_y)) -``` - -Let's construct a `stars` object from the components found in the Zarr store: -- data array -- X and Y coordinates -- CRS (from metadata) - -```{r} -# read CRS from attributes -zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` - -# band name -var = basename(band_variable) - -# read and transpose matrix before converting to stars -zz = st_as_stars(zarray |> t()) |> - st_set_dimensions(1, names = "X", values = zarray_60m_x) |> - st_set_dimensions(2, names = "Y", values = zarray_60m_y) |> - setNames(var) -st_crs(zz) = st_crs(zarr_crs) - -# display the object -zz - -# plot as a map -plot(zz, axes = TRUE) -``` - -The map shows Sentinel-2 band 1 (coastal aerosol band) reflectance at 60m spatial resolution for north-western Italy with the Alps in the West and the city of Torino in the North-East. - - -We can enclose these steps in a single function with some flexibility for variable names and resolutions. This allows us to retireve any band at any available resolution from the remote Zarr store. - -```{r} -st_read_zarray = function(path, var, res, ...){ - - # room for checks and input tests: - # stopifnot valid zarr url / variable / resolution ... - - # get metadata including CRS - zattrs = jsonlite::read_json(file.path(path, ".zattrs")) - zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` - - # zattrs$stac_discovery$properties$`eopf:resolutions` - res_char = switch(as.character(res), - "10" = "r10m", - "20" = "r20m", - "60" = "r60m", - stop("Resolution must be one of 10, 20, or 60.") - ) - - # ...: make use of index arguments of read_zarr_array if desired - zarray = file.path(path, "measurements/reflectance", res_char, var) |> - read_zarr_array(...) - zarray_x = file.path(path, "measurements/reflectance", res_char, "x") |> - read_zarr_array(...) - zarray_y = file.path(path, "measurements/reflectance", res_char, "y") |> - read_zarr_array(...) - - # transpose matrixe before converting to stars - z = zarray |> - t() |> - st_as_stars() |> - st_set_dimensions(1, names = "X", values = zarray_x) |> - st_set_dimensions(2, names = "Y", values = zarray_y) |> - setNames(var) - st_crs(z) = st_crs(zarr_crs) - return(z) -} -``` - - -## Read and combine multiple zarrays - -✅ Now we can read multiple bands, combine them to a single multi-band object, and visualize them in a static map... - -```{r} -system.time({ - b01 = st_read_zarray(zarr_url, "b01", 60) - b09 = st_read_zarray(zarr_url, "b09", 60) - (multi_band = c(b01, b09) |> merge()) -}) - -plot(multi_band, axes = TRUE) -``` - -... or by generating an interactive visualization: - -```{r} -#| echo: false -mapviewOptions(basemaps = c("OpenTopoMap","Esri.WorldImagery", - "CartoDB.Positron")) -``` - -```{r} -#| message: false -#| cache: true - -mapview(multi_band) -``` - - -## Benchmark - -Using `Rarr` to read remote Zarr stores is just one of several options. R packages `stars` and `terra` provide functionality, that uses GDAl's Zarr driver in the background. To see if there is a benefit to each of the methods, we can compare their execution time and memory allocation with a little benchmark. - -```{r} -# GDAL-compatible VSI URL -vsi_prefix = "ZARR:/vsicurl/" -vsi_url = paste0(vsi_prefix, dplyr::as_label(zarr_url)) - -# functions for reading a single Sentinel-2 band to memory -f_stars = function(){rs = read_stars(paste(vsi_url, band_variable, sep = ":"))} -f_mdim = function(){sm = read_mdim(vsi_url, band_variable, proxy = FALSE) |> - setNames(basename(band_variable))} -f_rast = function(){tr = terra::rast(vsi_url, band_variable) |> terra::toMemory()} -f_rarr = function(){rr = st_read_zarray(zarr_url, "b01", 60)} -``` - -```{r} -#| warning: false - -bench::mark(f_stars(), f_mdim(), f_rast(), f_rarr(), - check = F, iterations = 5) -``` - -The benchmark does not reveal a clear winner in terms of speed. However, `Rarr` seems to allocate more memory compared to `stars` and `terra`. - -## 💪 Now it is your turn - -- 🔭 **Task**: Supply more arguments to `st_read_zarray` and pass a subset to read only parts on an array. Visit `?Rarr::read_zarr_array` for more details. -- 💾 **Task**: Try writing zarrays to disk using `Rarr::write_zarr_array()`. -- 📖 Read the [Rarr documentation](https://huber-group-embl.github.io/Rarr/articles/Rarr.html). - -## Conclusion - -In this notebook we have demonstrated how to access remote Zarr stores in R using the `Rarr` package. We have seen that it is possible to read specific data arrays, but `read_zarr_array` lacks the ability of accessing anything other than the raw array. To combine it with a CRS or coordinates we need to leverage a spatial class like `stars` and manually set the relevant metadata. Further, the benchmark did not reveal a significant benefit regarding processing time or memory allocation over "traditional" GDAL-based data retrieval. - -## What's next? - -Further tests should evaluate wether data retrieved via `Rarr` versus GDAL are identical in terms of spatial extent and offset. - -Explore [this related notebook](./remote_Zarr_via_GDAL.html) which demonstrates how to access the same Zarr data using the GDAL Zarr driver via the `stars` and `terra` packages. - - diff --git a/05_zarr_tools/54_eopf_r_gdal.qmd b/05_zarr_tools/54_eopf_r_gdal.qmd deleted file mode 100644 index 6301f13..0000000 --- a/05_zarr_tools/54_eopf_r_gdal.qmd +++ /dev/null @@ -1,221 +0,0 @@ ---- -title: "Accessing EOPF Sentinel Zarr via GDAL in `stars` and `terra`" -format: html -execute: - cache: refresh ---- - -### Introduction - -This notebook demonstrates how to **retrieve** remotely stored Zarr data using the Zarr GDAL driver in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata using the `stars`, `sf` and `terra` packages as well as their current limitations. - -::: {.callout-tip} -This notebook has a sibling, which demonstrates how to access the same Zarr data using the `Rarr` package. You can find it [here](./remote_Zarr_via_Rarr.html). -::: - -### What we will learn - -- ✏️ How to edit URLs of Zarr stores to make them readable for GDAL -- 🔎 Which read-functions and arguments to use in `stars` and `terra` -- 🚧 Current limitations of these packages - -### Prerequisites - -To follow this notebook, you will need to load a STAC item in Zarr format from the [EOPF Zarr STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en). If you are new to STAC inside the R environment, we suggest to follow the [Access the EOPF Zarr STAC API with R tutorial](https://eopf-toolkit.github.io/eopf-101/51_eopf_stac_r.html). The item we are using for this example mirrors the one used in the [Sentinel-2 L1C MSI Zarr Product Exploration notebook](https://eopf-sample-service.github.io/eopf-sample-notebooks/sentinel-2-l1c-msi-zarr-product-exploration/#introduction) which is hosted on EODC's object storage and is accessible under the URL below. - -```{r} -zarr_url = "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458.zarr" -``` - -### Import packages - -```{r} -#| message: false -#| warning: false - -library(stars) # reading; imports sf too -library(terra) # reading -library(jsonlite) # parsing metadata -library(mapview) # interactive maps -``` - - -## GDAL's Zarr driver - -Common R-packages for raster data like `stars` and `terra` can leverage GDAL's raster drivers to read data directly from remote locations. This is done using GDAL's virtual file system (VSI) capabilities, specifically the `/vsicurl/` prefix for accessing files over HTTP(S). - -For Zarr data GDAL provides a dedicated Zarr driver that requires an additional `ZARR:`-prefix to the VSI URL. Finally, the complete prefix needs to be quoted: `"ZARR:/vsicurl/"`. - -```{r} -vsi_prefix = "ZARR:/vsicurl/" -vsi_url = paste0(vsi_prefix, dplyr::as_label(zarr_url)) # needs special quoting -print(vsi_url) -``` - -## Metadata exploration with GDAL - -With the VSI URL pointing to the remote Zarr store we can now use GDAL utilities to explore the metadata of the Zarr store. The `sf::gdal_utils` function provides an interface to GDAL command-line utilities. We use the `mdiminfo` command to retrieve metadata about the multidimensional arrays (zarrays) contained within the Zarr store. We only print the top-level names of the arrays here. - -```{r} -metadata = sf::gdal_utils("mdiminfo", source = vsi_url, quiet = T) |> - fromJSON() -names(metadata) -``` - -::: {.callout-tip collapse="true"} -## Expand for full GDAL MDIM metadata output -```{r} -#| echo: false - -print(metadata) -``` -::: - - -## Reading Zarr data with `stars` and `terra` - -::: {.callout-important} -In this section we show the current capabilities and limitations of `stars` and `terra` when it comes to reading Zarr array(s). We test the functions `read_stars` and `read_mdim` from `stars` and `rast` from the `terra`package. -Successful attempts are marked with a ✅, failed attempts with a ⛔. -::: - -First we define the path to a specific data array (band 1, 60 meter resolution) inside the Zarr store which can be understood as a single raster layer. - -```{r, error=T} -band_variable = "/measurements/reflectance/r60m/b01" -``` - -### `stars::read_stars()` - -⛔ Traditional approach for reading all bands/layers and specifying a driver. -```{r} -#| error: true - -r = read_stars(vsi_url, driver = "ZARR") -``` - -⛔ Traditional approach for reading all bands/layers and specifying a sub-dataset (integer). -```{r} -#| error: true - -r = read_stars(vsi_url, sub = 1) -``` - -⛔ Traditional approach for reading all bands/layers and specifying a sub-dataset (path). -```{r} -#| error: true - -r = read_stars(vsi_url, sub = band_variable) -``` - -✅ Constructing the full Zarray path from prefix, URL, and band variable. - -```{r} -(r = read_stars(paste(vsi_url, band_variable, sep = ":"))) -st_crs(r) # NA: empty -``` - -This method successfully reads the specified band from the remote Zarr store. - -:::{.callout-warning} -Note that the coordinate reference system (CRS) is not automatically recognized and needs to be set manually. We can look it up in the metadata where it is listed as one of the STAC properties. -::: - -```{r} -(crs = metadata$attributes$stac_discovery$properties$`proj:epsg`) -st_crs(r) = crs -``` - -Our data is in CRS _WGS 84 / UTM zone 32N (EPSG:32632)_. -We can now safely plot it as a static map... - -```{r} -system.time(plot(r, axes = TRUE)) -``` - -... or is an interactive visualization: - -```{r} -#| message: false -#| echo: false - -mapviewOptions(basemaps = c("OpenTopoMap","Esri.WorldImagery", - "CartoDB.Positron")) -``` - -```{r} -mapview(r) -``` - ---- - -### `stars::read_mdim()` - -⛔ Traditional approach for reading all bands/layers at once. "?" should return a list of array possible names. -```{r} -#| error: true - -m = read_mdim(vsi_url) -m = read_mdim(vsi_url, variable = "?") # query array names -``` - - -✅ Specifying the full Zarray path from prefix, URL, and band variable. Read as proxy object. -```{r} -system.time({ - (m = read_mdim(vsi_url, variable = band_variable, proxy = TRUE)) - }) # fast: only reads metadata - -st_crs(m) # NA: empty -system.time(plot(m, axes = T)) # slow: only here the full array is downloaded -``` - -As for `read_stars()`, the CRS is not automatically recognized and needs to be set manually. - ---- - -### `terra::rast()` - -✅ / ⛔ Traditional approach for reading all bands/layers at once. - -```{r, error=T} -(tr = terra::rast(vsi_url)) -tr$b02 - -names(tr) # has non-unique names, e.g. three times "b02" -``` - -`terra` reads some of the available bands, but fails to name them uniquely. It likely searches for band names like "b02" and finds the first group of data arrays by order, in this case the 10-meter bands (2, 3, 4, and 8). The Zarr store contains multiple array named by band such as "/measurements/reflectance/r10m/b02", their corresponding masks (e.g. "quality/mask/r10m/b02) and the detector footprint (e.g. "/conditions/mask/detector_footprint/r10m/b02"). All Zarrays share the same name, "b02", leading to non-unique layer names in the resulting `SpatRaster` object. When constructing unique names we can at least visualize the data. - -```{r} -# rename layers to (not meaningful) unique names -names(tr) = paste0(letters[1:nlyr(tr)], names(tr)) -names(tr) - -plot(tr) -``` - -✅ Specifying the full Zarray path from prefix, URL, and band variable as sub-dataset. Read as proxy object. -```{r} -system.time({ - tr = rast(vsi_url, subds = band_variable) - })# fast: only reads metadata - -crs(tr) # "": empty -system.time(plot(tr)) # slow: only here the full array is downloaded -``` - -As for the `stars`-functions, the CRS is not automatically recognized and needs to be set manually. - -## 💪 Now it is your turn - -- 🔍 **Task**: Explore the metadata of other Zarr stores using GDAL's `mdiminfo` command via `sf::gdal_utils()`. Store the JSON output in an R-object and find entries that store relevant metadata, such as the CRS, bounding box, or sensor-related information. -- 📖 Read the [GDAL Zarr driver documentation](https://gdal.org/en/stable/drivers/raster/zarr.html) to learn more about opening options. - -## Conclusion - -In this notebook we have demonstrated how to access remote Zarr stores using GDAL's Zarr driver in R with the `stars` and `terra` packages. We have seen that while it is possible to read specific data arrays, there are some limitations, such as the lack of automatic CRS recognition and challenges with reading multiple arrays simultaneously. - -## What's next? - -Explore [this related notebook](./remote_Zarr_via_Rarr.html) which demonstrates how to access the same Zarr data using the `Rarr` package and building a `stars` object from it. diff --git a/_quarto.yml b/_quarto.yml index 005ed03..cdb4643 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -47,8 +47,8 @@ book: - 05_zarr_tools/51_eopf_stac_r.qmd - 05_zarr_tools/52_eopf_stac_qgis.qmd # - 05_zarr_tools/53_zarr_R.qmd - # - 05_zarr_tools/54_zarr_R_rarr.qmd - # - 05_zarr_tools/55_zarr_R_gdal.qmd + - 05_zarr_tools/54_zarr_R_rarr.qmd + - 05_zarr_tools/55_zarr_R_gdal.qmd # - 05_zarr_tools/56_titiler.qmd - part: "**EOPF Zarr in Action**" chapters: @@ -75,4 +75,4 @@ format: theme: - cosmo - brand - include-in-header: "plausible.html" \ No newline at end of file + include-in-header: "plausible.html" diff --git a/deployment/Dockerfile_r b/deployment/Dockerfile_r index f34678e..22b0b03 100644 --- a/deployment/Dockerfile_r +++ b/deployment/Dockerfile_r @@ -6,6 +6,8 @@ RUN mamba install --yes \ 'r-tidyverse' \ 'r-stars' \ 'r-terra' \ + 'r-mapview' \ + 'r-jsonlite' \ 'r-httr2' \ 'r-fs' \ 'r-lobstr' \ From 821011c30d4c9d96597d4cd5833e05df6754cb5e Mon Sep 17 00:00:00 2001 From: joheisig Date: Tue, 16 Dec 2025 10:41:19 +0100 Subject: [PATCH 4/6] add paragraph on raw data scale and offset --- 05_zarr_tools/54_zarr_R_rarr.qmd | 182 +++++++++++++++++++++++++------ 1 file changed, 148 insertions(+), 34 deletions(-) diff --git a/05_zarr_tools/54_zarr_R_rarr.qmd b/05_zarr_tools/54_zarr_R_rarr.qmd index 5cce9ba..71da0cd 100644 --- a/05_zarr_tools/54_zarr_R_rarr.qmd +++ b/05_zarr_tools/54_zarr_R_rarr.qmd @@ -1,5 +1,5 @@ --- -title: "Accessing EOPF Sentinel Zarr using `Rarr`" +title: "Access EOPF Sentinel Zarr using `Rarr`" format: html execute: cache: refresh @@ -10,7 +10,7 @@ execute: This notebook demonstrates how to **retrieve** remotely stored Zarr data using the `Rarr` package in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata. For subsequent analysis and visualization we turn the zarray into `stars` objects which are more suitable for spatial data. ::: {.callout-tip} -This notebook has a sibling, which demonstrates how to access the same data using the GDAL Zarr driver. You can find it [here](./remote_Zarr_via_GDAL.html). +This notebook has a sibling, which demonstrates how to access the same data using the GDAL Zarr driver. You can find it [here](./55_zarr_R_gdal.qmd). ::: ### What we will learn @@ -98,7 +98,30 @@ The same information can be retrieved with `sf::gdal_utils('mdiminfo', ...)`. ## Reading Zarr data with `Rarr` -⛔ `Rarr`, like `stars` and `terra`, is not capable of reading multiple arrays from the remote store simultaneously. +Reading zarrays from a remote Zarr store with `Rarr` and turning them into useful spatial R objects is not as straightforward as with `GDAL`-based `stars` and `terra` functions. We will demonstrate the current capabilities and limitations of `Rarr` by using a reference object retrieved through the `GDAL` Zarr driver which is loaded in the background. + +```{r} +#| echo: false +#| warning: false + +# parse GDAL-compatible VSI URL +vsi_url = paste0("ZARR:/vsicurl/", dplyr::as_label(zarr_url)) + +# read from remote and set CRS +band_variable = "/measurements/reflectance/r60m/b01" +zz_gdal = read_stars(paste(vsi_url, band_variable, sep = ":")) +st_crs(zz_gdal) = st_crs(32632) +``` + +```{r} +# GDAL-based reference object +zz_gdal +``` + +We start with an naive attempt to read the entire Zarr store using `Rarr::read_zarr_array`. + + +⛔ Rarr, like `stars` and `terra`, is not capable of reading multiple arrays from the remote Zarr store simultaneously. ```{r} #| error: true @@ -115,57 +138,147 @@ zarray = read_zarr_array(paste0(zarr_url, band_variable)) str(zarray) ``` -However, the function returns a matrix (2D array) lacking any spatial metadata (e.g. Coordinate Reference System) or coordinates. +### Raster geometry + +The function returns a matrix (2D array) lacking any spatial metadata (e.g. Coordinate Reference System) or map coordinates. Let's plot it together with the reference object for comparison. + +```{r} +#| fig-subcap: +#| - "Rarr (raw zarray)" +#| - "GDAL (directly retrieved stars object)" +#| layout-ncol: 2 +#| fig-height: 7 + +image(zarray) +image(zz_gdal$b01) +``` + ::: {.callout-warning} ## Caution -The raw array has flipped dimensions! But we can access X and Y coordinates for each pixel separately and handle the issue. +The raw zarray has flipped X- and Y-dimensions! But we can access X and Y coordinates for each pixel separately and handle the issue by transposing them later (using `t()`). ::: ```{r} +# retrieve X and Y coordinates zarray_60m_x = paste0(zarr_url, "/measurements/reflectance/r60m/x") |> read_zarr_array() zarray_60m_y = paste0(zarr_url, "/measurements/reflectance/r60m/y") |> read_zarr_array() -print(range(zarray_60m_x)) -print(range(zarray_60m_y)) +range(zarray_60m_x) +range(zarray_60m_y) +``` + +Regardless of their ordering, are the coordinate values consistent with those retrieved via GDAL? + +```{r} +range(st_get_dimension_values(zz_gdal, "x")) +range(st_get_dimension_values(zz_gdal, "y")) ``` -Let's construct a `stars` object from the components found in the Zarr store: +✅ Yes! + +### Reflectance values + +Before building the spatial object we should also check if the actual data values are consistent between both methods. + +```{r} +head(t(zarray)[1,]) # flipped zarray +head(zz_gdal$b01[1,]) # reference arrray +``` + +::: {.callout-warning} +## Caution + +By comapring the first few values of each array we can observe 2 important differences: + + - `zarray` contains integer values while `zz_gdal` has floating point values scaled with factor 10,000. + - after rescaling, min and max for example still differ (0.1299 vs. 0.2299 / 1.6556 vs. 1.7556), indicating an offset of 0.1. + +💡 **Explanation:** *The data provider applies scale and offset to store more memory-efficient integer values. A scale factor of 10,000 preserves 4 decimals, while an offset of 0.1 avoids the multiplication of 0. When reading raw data like that we need to apply scale and offset manually to get the original reflectance values.* +::: + +Scale and offset parameters are important metadata, but currently not accessible via `Rarr`. At this point we can only find it by digging deep into the metadata retrieved by `GDAL`: + +```{r} +# GDAL-comaptible VSI URL +vsi_url = paste0("ZARR:/vsicurl/", dplyr::as_label(zarr_url)) + +# get entire metadata +info = sf::gdal_utils("mdiminfo", source = vsi_url, quiet = T) |> fromJSON() + +# access scale and offset for 60 meter reflectance bands +(scale = info$groups$measurements$groups$reflectance$groups$r60m$arrays$b01$scale) +(offset = info$groups$measurements$groups$reflectance$groups$r60m$arrays$b01$offset) +``` + +### Constructing a `stars` object from zarray components + +Let's combine all pieces of the puzzle to form a spatial object: + - data array +- scale and offset - X and Y coordinates -- CRS (from metadata) +- CRS ```{r} -# read CRS from attributes +# read CRS from Zarr attributes zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` # band name var = basename(band_variable) -# read and transpose matrix before converting to stars -zz = st_as_stars(zarray |> t()) |> - st_set_dimensions(1, names = "X", values = zarray_60m_x) |> - st_set_dimensions(2, names = "Y", values = zarray_60m_y) |> +# read and transpose data matrix +# apply scale and offset +# convert to stars +zz = st_as_stars(t(zarray) * scale + offset) |> + st_set_dimensions(1, names = "x", values = zarray_60m_x, is_raster = TRUE) |> + st_set_dimensions(2, names = "y", values = zarray_60m_y, is_raster = TRUE) |> setNames(var) + +# assign CRS st_crs(zz) = st_crs(zarr_crs) -# display the object +# print the object zz +``` -# plot as a map +We can compare the newly constructed object with reference read via GDAL. They should have identical geometry and values. + +```{r} +identical(zz[[1]] |> str(), + zz_gdal[[1]] |> str()) +``` + +Looks good! Let's visualize both objects. + +```{r} +#| fig-subcap: +#| - "Rarr (constructed stars object)" +#| layout-ncol: 1 + +# plot as maps plot(zz, axes = TRUE) ``` -The map shows Sentinel-2 band 1 (coastal aerosol band) reflectance at 60m spatial resolution for north-western Italy with the Alps in the West and the city of Torino in the North-East. +```{r} +#| fig-subcap: +#| - "GDAL (directly retrieved stars object)" +#| layout-ncol: 1 +plot(zz_gdal, axes = TRUE) +``` + +The identical maps show Sentinel-2 band 1 (coastal aerosol band) reflectance at 60m spatial resolution for north-western Italy with the Alps in the West and the city of Torino in the North-East. -We can enclose these steps in a single function with some flexibility for variable names and resolutions. This allows us to retireve any band at any available resolution from the remote Zarr store. +### Wrapping it all up in a function + +We can enclose these steps in a single function with some flexibility for variable names and resolutions. This allows us to retrieve any band at any available resolution from the remote Zarr store. Since scale and offset are not accessible via `Rarr`, we need to provide them as function arguments. ```{r} -st_read_zarray = function(path, var, res, ...){ +st_read_zarray = function(path, var, res, scale = 1, offset = 0, ...){ # room for checks and input tests: # stopifnot valid zarr url / variable / resolution ... @@ -174,6 +287,7 @@ st_read_zarray = function(path, var, res, ...){ zattrs = jsonlite::read_json(file.path(path, ".zattrs")) zarr_crs = zattrs$stac_discovery$properties$`proj:epsg` + # selecte the correct spatial resolution # zattrs$stac_discovery$properties$`eopf:resolutions` res_char = switch(as.character(res), "10" = "r10m", @@ -182,6 +296,7 @@ st_read_zarray = function(path, var, res, ...){ stop("Resolution must be one of 10, 20, or 60.") ) + # get 2D data array and 1D coordinate arrays # ...: make use of index arguments of read_zarr_array if desired zarray = file.path(path, "measurements/reflectance", res_char, var) |> read_zarr_array(...) @@ -189,13 +304,12 @@ st_read_zarray = function(path, var, res, ...){ read_zarr_array(...) zarray_y = file.path(path, "measurements/reflectance", res_char, "y") |> read_zarr_array(...) - - # transpose matrixe before converting to stars - z = zarray |> - t() |> - st_as_stars() |> - st_set_dimensions(1, names = "X", values = zarray_x) |> - st_set_dimensions(2, names = "Y", values = zarray_y) |> + + # transpose matrix before converting to stars, apply scale and offset + # assign x and y coordinates and CRS + z = st_as_stars(t(zarray) * scale + offset) |> + st_set_dimensions(1, names = "x", values = zarray_60m_x, is_raster = TRUE) |> + st_set_dimensions(2, names = "y", values = zarray_60m_y, is_raster = TRUE) |> setNames(var) st_crs(z) = st_crs(zarr_crs) return(z) @@ -209,15 +323,16 @@ st_read_zarray = function(path, var, res, ...){ ```{r} system.time({ - b01 = st_read_zarray(zarr_url, "b01", 60) - b09 = st_read_zarray(zarr_url, "b09", 60) - (multi_band = c(b01, b09) |> merge()) + # scale: 0.0001 = 1 / 10,000 + b01 = st_read_zarray(zarr_url, "b01", 60, scale=0.0001, offset = 0.1) + b09 = st_read_zarray(zarr_url, "b09", 60, scale=0.0001, offset = 0.1) + multi_band = c(b01, b09) |> merge(name="band") |> setNames("reflectance") }) plot(multi_band, axes = TRUE) ``` -... or by generating an interactive visualization: +... or by generating an interactive map: ```{r} #| echo: false @@ -252,12 +367,13 @@ f_rarr = function(){rr = st_read_zarray(zarr_url, "b01", 60)} ```{r} #| warning: false +#| cache: true bench::mark(f_stars(), f_mdim(), f_rast(), f_rarr(), check = F, iterations = 5) ``` -The benchmark does not reveal a clear winner in terms of speed. However, `Rarr` seems to allocate more memory compared to `stars` and `terra`. +The benchmark does not reveal a clear winner in terms of speed. However, `Rarr` seems not to be faster than `GDAL` but allocates more memory compared to `stars` and `terra`. ## 💪 Now it is your turn @@ -273,6 +389,4 @@ In this notebook we have demonstrated how to access remote Zarr stores in R usin Further tests should evaluate wether data retrieved via `Rarr` versus GDAL are identical in terms of spatial extent and offset. -Explore [this related notebook](./remote_Zarr_via_GDAL.html) which demonstrates how to access the same Zarr data using the GDAL Zarr driver via the `stars` and `terra` packages. - - +Explore [this related notebook](./55_zarr_R_gdal.qmd) which demonstrates how to access the same Zarr data using the GDAL Zarr driver via the `stars` and `terra` packages. \ No newline at end of file From abb6596649939e9b351a947764d93c60adcb9614 Mon Sep 17 00:00:00 2001 From: joheisig Date: Tue, 16 Dec 2025 10:41:54 +0100 Subject: [PATCH 5/6] minor edits --- 05_zarr_tools/55_zarr_R_gdal.qmd | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/05_zarr_tools/55_zarr_R_gdal.qmd b/05_zarr_tools/55_zarr_R_gdal.qmd index 6301f13..f512827 100644 --- a/05_zarr_tools/55_zarr_R_gdal.qmd +++ b/05_zarr_tools/55_zarr_R_gdal.qmd @@ -1,5 +1,5 @@ --- -title: "Accessing EOPF Sentinel Zarr via GDAL in `stars` and `terra`" +title: "Access EOPF Sentinel Zarr via GDAL in `stars` and `terra`" format: html execute: cache: refresh @@ -10,7 +10,7 @@ execute: This notebook demonstrates how to **retrieve** remotely stored Zarr data using the Zarr GDAL driver in R. We will explore how to **read and visualize** Zarr data (zarrays) and their metadata using the `stars`, `sf` and `terra` packages as well as their current limitations. ::: {.callout-tip} -This notebook has a sibling, which demonstrates how to access the same Zarr data using the `Rarr` package. You can find it [here](./remote_Zarr_via_Rarr.html). +This notebook has a sibling, which demonstrates how to access the same Zarr data using the `Rarr` package. You can find it [here](./54_zarr_R_rarr.qmd). ::: ### What we will learn @@ -108,7 +108,7 @@ r = read_stars(vsi_url, sub = 1) r = read_stars(vsi_url, sub = band_variable) ``` -✅ Constructing the full Zarray path from prefix, URL, and band variable. +✅ Constructing the full zarray path from prefix, URL, and band variable. ```{r} (r = read_stars(paste(vsi_url, band_variable, sep = ":"))) @@ -160,7 +160,7 @@ m = read_mdim(vsi_url, variable = "?") # query array names ``` -✅ Specifying the full Zarray path from prefix, URL, and band variable. Read as proxy object. +✅ Specifying the full zarray path from prefix, URL, and band variable. Read as a proxy object (just metadata; values are accessed lazily when required). ```{r} system.time({ (m = read_mdim(vsi_url, variable = band_variable, proxy = TRUE)) @@ -176,7 +176,7 @@ As for `read_stars()`, the CRS is not automatically recognized and needs to be s ### `terra::rast()` -✅ / ⛔ Traditional approach for reading all bands/layers at once. +⛔ Traditional approach for reading all bands/layers at once. ```{r, error=T} (tr = terra::rast(vsi_url)) @@ -185,7 +185,7 @@ tr$b02 names(tr) # has non-unique names, e.g. three times "b02" ``` -`terra` reads some of the available bands, but fails to name them uniquely. It likely searches for band names like "b02" and finds the first group of data arrays by order, in this case the 10-meter bands (2, 3, 4, and 8). The Zarr store contains multiple array named by band such as "/measurements/reflectance/r10m/b02", their corresponding masks (e.g. "quality/mask/r10m/b02) and the detector footprint (e.g. "/conditions/mask/detector_footprint/r10m/b02"). All Zarrays share the same name, "b02", leading to non-unique layer names in the resulting `SpatRaster` object. When constructing unique names we can at least visualize the data. +`terra` reads some of the available bands, but fails to name them uniquely. It likely searches for band names like "b02" and finds the first group of data arrays by order, in this case the 10-meter bands (2, 3, 4, and 8 aka R,G,B, and NIR). The Zarr store contains multiple arrays named after each band such as `/measurements/reflectance/r10m/b02`, a corresponding mask (e.g. `quality/mask/r10m/b02`) and the detector footprint (e.g. `/conditions/mask/detector_footprint/r10m/b02`). All of these zarrays share the same name, `b02`, leading to non-unique layer names in the resulting `SpatRaster` object. When constructing unique names we can at least visualize the data. ```{r} # rename layers to (not meaningful) unique names @@ -195,18 +195,21 @@ names(tr) plot(tr) ``` -✅ Specifying the full Zarray path from prefix, URL, and band variable as sub-dataset. Read as proxy object. +✅ Specifying the full zarray path from prefix, URL, and band variable as sub-dataset. Read as proxy object. ```{r} system.time({ tr = rast(vsi_url, subds = band_variable) })# fast: only reads metadata crs(tr) # "": empty -system.time(plot(tr)) # slow: only here the full array is downloaded ``` As for the `stars`-functions, the CRS is not automatically recognized and needs to be set manually. +```{r} +system.time(plot(tr)) # slow: only here the full array is downloaded +``` + ## 💪 Now it is your turn - 🔍 **Task**: Explore the metadata of other Zarr stores using GDAL's `mdiminfo` command via `sf::gdal_utils()`. Store the JSON output in an R-object and find entries that store relevant metadata, such as the CRS, bounding box, or sensor-related information. @@ -218,4 +221,4 @@ In this notebook we have demonstrated how to access remote Zarr stores using GDA ## What's next? -Explore [this related notebook](./remote_Zarr_via_Rarr.html) which demonstrates how to access the same Zarr data using the `Rarr` package and building a `stars` object from it. +Explore [this related notebook](./54_zarr_R_rarr.qmd) which demonstrates how to access the same Zarr data using the `Rarr` package and building a `stars` object from it. From d9102924898bbe38f5fbd7b98d7c56605e5fea20 Mon Sep 17 00:00:00 2001 From: joheisig Date: Fri, 23 Jan 2026 13:37:58 +0100 Subject: [PATCH 6/6] move R packages to environment.yaml --- deployment/environment.yaml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/deployment/environment.yaml b/deployment/environment.yaml index eb6e681..f35c985 100644 --- a/deployment/environment.yaml +++ b/deployment/environment.yaml @@ -11,5 +11,7 @@ dependencies: - r-fs=1.6.6 - r-lobstr=1.1.3 - r-biocmanager + - r-mapview=2.11.4 + - r-jsonlite=2.0.0 # - bioconductor-rarr==1.10.1 - Currently not supported in bioconda so installed manually in Dockerfile_r - sqlite=3.51.2