-
Notifications
You must be signed in to change notification settings - Fork 2
STAC Clients for R and Python
by Mika Dinnus and Lukas Räuschel
With R and Python, you can easily explore STACs, which provide access to Earth observation data. In this handout, we'll give a quick introduction on how to query STACs, get their metadata, and download the related datasets using both R and Python.
In R, the rstac package allows you to interact with STACs. It provides a range of functions for accessing, searching, and downloading Earth observation data through the STAC protocol.
To install the package you can use CRAN:
install.packages("rstac")
After installation, ensure that you import the package into your R session:
library(rstac)
To connect to a STAC, save the URL and use the following function:
stac_url <- "example.url"
stac_obj <- stac(stac_url)
Conformance classes define the specific API standards that a STAC server implements. These standards ensure interoperability, meaning that different STAC services and clients can communicate effectively by following the same set of rules. By checking conformance, you can verify which functionalities the STAC API supports.
stac_obj %>% conformance() %>% get_request()
Once connected, you can retrieve information about the STAC service:
stac_obj %>% get_request()
To get the available collections, use:
stac_obj %>% collections() %>% get_request()
To search for specific items within a collection, you can set up a search query:
query <- stac_search(
q = stac_obj, # STAC object
collections = "example-name", # Name of a collection, concatenation functions are also possible
datetime = "2021-01-01/2021-12-31", # Time interval for the search
limit = 10 # Limit the number of results returned
)
query %>% get_request()
In addition to basic search options, you can also filter results by intersections with a geometry or bounding box (bbox), and sort them by specific attributes ( e.g. datetime, id, ground sample distance (gsd), ...).
To download assets of an item, rstac provides a convenient function:
assets_download(
query, # STAC search query
asset_id = "example-asset-name", # Name of the specific asset to download
output_dir = "path-to-the-output-dir", # Directory where downloaded files will be saved
overwrite = TRUE # Overwrite existing files if they already exist
)
Now you have donwloaded your data and the files are ready to use.
Similarly to R, Python provides an extension called pystac to facilitate the use of STACs. The following sections will provide an overview of the functions and a quick start guide.
If you have not yet installed pystac on your computer, please do so.
pip install pystac
The first step is to import the pystac library to your project, along with any other libraries you may require to work with Collections, Items and Assets in a straightforward way.
import json
import requests
import pystac
from pystac import Catalog, get_stac_version
from pystac_client import Client
Furthermore some extensions of pystac are added. Their functionality will be explained when they are first used.
In Order to work with an existing STAC you need to obtain the STACs URL. Depending on the extensions that are used, different functions apply. Standard:
catalog = Catalog.from_file('LINK')
catalog.describe()
With the Client-Extension of pystac
catalog_url = 'LINK'
client = Client.open(catalog_url)
STACs provide a range of information about their collections, items and assets, with some fields being mandatory and others optional. As a result, certain parameters can be obtained with certainty:
print(f"ID: {catalog.id}")
print(f"Title: {catalog.title or 'N/A'}")
print(f"Description: {catalog.description or 'N/A'}")
This semantic is applicable to all objects within a STAC and can be utilised for the retrieval of metadata across all potential parameters:
print(item.geometry)
print(item.bbox)
print(asset.media_type)
print(asset.href)
...
get_collections() retrieves the collections in the given catalog.
get_collection(collection_id) returns a single collection from the catalog, identified by the ID.
get_items(id, recursive=Bool) returns all items within a specified catalog. In this context, the "id" parameter can be omitted, as the "recursive" parameter determines whether subcatalogs are traversed.
get_all_items() retrieves all items from given catalog and all subcatalogs
All these functions are used with "[Object].[Function]".
Additionally, the number of collections within a STAC can be quantified and displayed in a list or any other desired format. This approach combines existing Python syntax with the previously presented pystac functions.
collections = list(catalog.get_collections())
print(f"Number of collections: {len(collections)}")
print("Collections IDs:")
for collection in collections:
print(f"- {collection.id}")
items = list(catalog.get_all_items())
print(f"Number of items: {len(items)}")
for item in items:
print(f"- {item.id}")
In some cases, it is necessary to narrow down the information you wish to retrieve. In such instances, a search without downloading all data is a logical solution. In pystac, a search can be conducted through the Client Extension.
catalog_url = 'example_url'
client = Client.open(catalog_url)
search = client.search(
parameter = value,
limit = value
)
The search can be refined using existing parameters and their associated values. In some cases, it may be beneficial to limit the data search using the 'limit' parameter.
open(url) works with the client extension and integrates a stac
search(parameter = value, (..)) works with the client extension, searching for a STAC object
To download data from a STAC, the pystac library's requests extension should be used. This extension interacts with the API, allowing the data to be downloaded to the user's drive.
for asset_key in item.assets:
asset = item.assets[asset_key]
asset_url = asset.href
file_name = asset_key + '.' + asset.media_type.split('/')[-1]
response = requests.get(asset_url)
with open(file_name, 'wb') as f:
f.write(response.content)
print(f'{file_name} heruntergeladen.')
rstac and pystac both allow users to access SpatioTemporal Asset Catalogs (STAC) for geospatial data, but they cater to different programming languages. rstac is tailored for R users, integrating well with R's data analysis and visualization tools like sf and terra. In contrast, pystac is designed for Python users, working seamlessly with Python libraries like requests. Your choice between them depends on whether you prefer R or Python for your data analysis tasks.
https://stacspec.org/en/tutorials/ https://brazil-data-cube.github.io/rstac/ https://pystac.readthedocs.io/en/stable/
stac client for python:
import json # Zur Anzeige der Abfrageergebnisse import pystac import requests # Für Interaktion mit APIs
from pystac import Catalog, get_stac_version # Erweiterung von pystac zum Einbinden von bestehenden Catalogs from pystac_client import Client # Erweiterung von pystac u.a. zum suchen in STACs
root_catalog = Catalog.from_file('https://raw.githubusercontent.com/stac-utils/pystac/main/docs/example-catalog/catalog.json') root_catalog.describe() # Aufbau des Catalogs
print(f"ID: {root_catalog.id}") print(f"Title: {root_catalog.title or 'N/A'}") print(f"Description: {root_catalog.description or 'N/A'}")
collections = list(root_catalog.get_collections()) # get_collections() und weitere Func. im Handout erläutert print(f"Number of collections: {len(collections)}") # Anzahl der vorhandenen Collections print("Collections IDs:") for collection in collections: print(f"- {collection.id}")
items = list(root_catalog.get_all_items()) print(f"Number of items: {len(items)}") for item in items: print(f"- {item.id}")
item = root_catalog.get_item("LC80140332018166LGN00", recursive=True) # Einzelenes Item, weitere Benutzung im Folgenden
print(item.geometry) print(item.bbox) print(item.datetime) print(item.collection_id) item.get_collection() # Abfrage, zu welcher Collection das item gehört
print(item.common_metadata.instruments) print(item.common_metadata.platform) print(item.common_metadata.gsd)
for asset_key in item.assets: #.assets als Func zur Abfrage aller Assets eines Items asset = item.assets[asset_key] print('{}: {} ({})'.format(asset_key, asset.href, asset.media_type)) # asset-key,(..) werden in den String {},(..) eingesetzt
asset = item.assets['B3'] print(asset.to_dict()) # Ähnlich zur Abfrage mit .format
for asset_key in item.assets: asset = item.assets[asset_key] asset_url = asset.href file_name = asset_key + '.' + asset.media_type.split('/')[-1]
# Fragt die Daten von der API ab
response = requests.get(asset_url) # Nutzung der requests-Library
# Speichere die Datei
with open(file_name, 'wb') as f:
f.write(response.content)
print(f'{file_name} heruntergeladen.')
catalog_url = 'https://planetarycomputer.microsoft.com/api/stac/v1' client = Client.open(catalog_url) # Client interagiert mit API-Endpunkt (URL)
search = client.search( collections=['sentinel-2-l2a'], bbox=[-47.02148, -17.35063, -42.53906, -12.98314], datetime='2023-01-01/2023-01-31', limit = 10 )
items = list(search.items()) print(len(items)) print(items) item = items[5] print(f"Item ID: {item.id}") print(f"Item datetime: {item.datetime}")
for asset_key, asset in item.assets.items(): print(f"Asset Key: {asset_key}") print(f"Asset URL: {asset.href}") print(f"Asset Media Type: {asset.media_type}")
stac client for R:
install.packages("rstac") install.packages("sf") install.packages("terra") install.packages("tibble") library(terra) library(sf) library(tibble) library(rstac)
stac_url <- "https://planetarycomputer.microsoft.com/api/stac/v1"
s_obj <- stac(stac_url) str(s_obj)
get_request(s_obj)
s_obj %>% get_request()
conformance_classes <- s_obj %>% conformance() %>% get_request() conformance_classes
collections_query <- s_obj %>% collections()
collections_query %>% get_request()
stac_search( q = s_obj, collections = "usgs-lcmap-conus-v13", datetime = "2021-01-01/2021-12-31", limit = 10 ) %>% get_request()
ashe <- read_sf(system.file("shape/nc.shp", package = "sf"))[1, ] plot(st_geometry(ashe))
ashe_bbox <- ashe %>% st_transform(4326) %>% st_bbox() ashe_bbox
stac_query <- stac_search( q = s_obj, collections = "usgs-lcmap-conus-v13", bbox = ashe_bbox, datetime = "2021-01-01/2021-12-31", limit = 10 ) %>% get_request() stac_query
signed_stac_query <- items_sign( stac_query, sign_planetary_computer() # Authentifizierung beim Planetary Computer ) signed_stac_query
output_directory <- "C:/Users/lraeu/OneDrive/Desktop/Geosoftware II/geosoft2-2024/data" assets_download(signed_stac_query, "lcpri", output_dir = output_directory, overwrite = TRUE) output_file <- file.path("C:/Users/lraeu/OneDrive/Desktop/Geosoftware II/geosoft2-2024/data/lcmap/CU/V13/025011/2021/LCMAP_CU_025011_2021_20220721_V13_CCDC/LCMAP_CU_025011_2021_20220629_V13_LCPRI.tif") %>% rast() plot(output_file) rast("C:/Users/lraeu/OneDrive/Desktop/Geosoftware II/geosoft2-2024/data/B1.tiff")
ashe %>% st_transform(st_crs(output_file)) %>% st_geometry() %>% plot(add = TRUE, lwd = 3)