xpystac provides the glue that allows xarray.open_dataset
to accept pystac objects.
The goal is that as long as this library is in your env, you should never need to think about it.
- Open one asset: Reads data for an asset pointing to a COG, a zarr store, or a kerchunk reference file.
- Open one item: Reads data for all the assets in a particular item (commonly each COG represents a band).
- Open many items: Reads all the assets in all the items for a particular item collection iterable of items, or output of pystac_client.Client.search.
file format | one asset (item or collection-level) | one item | many items |
---|---|---|---|
COG | x | x | x |
zarr | x | ||
kerchunk | x | x* | x* |
* if stored in item alongside the datacube extension properties |
pip install xpystac
Read from a COG
import pystac
import xarray as xr
item = pystac.Item.from_file(
"https://raw.githubusercontent.com/stac-utils/pystac/v1.12.2/tests/data-files/examples/1.0.0/simple-item.json"
)
asset = item.assets["visual"]
xr.open_dataset(asset)
Here are a few examples from the Planetary Computer Docs which has some good examples of collection-level assets used to catalog zarr stores and kerchunk reference files.
import planetary_computer
import pystac_client
import xarray as xr
catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)
Read from a kerchunk reference file (ref):
collection = catalog.get_collection("nasa-nex-gddp-cmip6")
asset = collection.assets["ACCESS-CM2.historical"]
xr.open_dataset(asset, patch_url=planetary_computer.sign)
Read from a zarr file (ref)
collection = catalog.get_collection("daymet-daily-hi")
asset = collection.assets["zarr-abfs"]
xr.open_dataset(asset, patch_url=planetary_computer.sign)
Note that this zarr asset uses the xarray-assets extension to store open_kwargs
and storage_options
which xpystac can then pass along to xr.open_dataset
.
A single item containing many COGs:
import pystac
import xarray as xr
item = pystac.Item.from_file(
"https://earth-search.aws.element84.com/v1/collections/landsat-c2-l2/items/LC09_L2SR_081108_20250311_02_T2"
)
xr.open_dataset(item)
This takes advantage of a stacking library (either
odc-stac or stackstac - configurable via the stacking_library
option)
Read all the data from the search results for a collection of COGs:
import pystac_client
import xarray as xr
catalog = pystac_client.Client.open(
"https://earth-search.aws.element84.com/v1",
)
search = catalog.search(
intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
collections=['sentinel-2-l2a'],
datetime="2022-04-01/2022-05-01",
)
xr.open_dataset(search, engine="stac")
Read data from an item collection that uses the exploratory approach of storing kerchunked metadata within the datacube extension metadata:
import pystac
import xarray as xr
item_collection = pystac.ItemCollection.from_file(
"https://raw.githubusercontent.com/stac-utils/xpystac/main/tests/data/data-cube-kerchunk-item-collection.json"
)
xr.open_dataset(item_collection)
When you call xarray.open_dataset(object, engine="stac")
this library maps that open
call to the correct library.
Depending on the type
of object
that might be a stacking library (either
odc-stac or stackstac)
or back to xarray.open_dataset
itself but with the engine and other options pulled from the pystac object.
This work is inspired by https://github.com/TomAugspurger/staccontainers and the discussion in stac-utils/pystac#846