-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathSTAC.qmd
146 lines (119 loc) · 6.08 KB
/
STAC.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
execute:
eval: false
---
# Spatio-Temporal Asset Catalogs
There is currently no good way of making and editing STAC Metadata in R, so this will be a Python-centric tutorial for the time being. The RSTAC Package offers the ability to search and use STAC APIs and should work with static catalogs (such as being built in the below tutorial), in the near future. In the meantime, [pystac](https://pystac.readthedocs.io/en/latest/index.html) offers a fairly simple way of searching, developing, and editing STAC Metadata.
## Building STAC metadata with PySTAC
Load the required packages
```{python python.reticulate = FALSE}
import pystac
import datetime as dt
import shapely
#Other useful packages that help automate parts of this process:
# import riostac #this is a useful package
# import xarray stac
```
### STAC catalogs
Catalogs are the simplest of the possible STAC specifications to make as it does not have to have any specific spatio-temporal extent to describe them. The example below shows how to build a very basic STAC catalog using pystac.
```{python python.reticulate = FALSE}
productivity_catalog = pystac.Catalog(
id="crop_productivity_data",
description="Modelled data of crop productivity",
title="Crop Productivity"
)
```
### STAC collections
STAC collections are very similar to STAC catalogs, except they have extra metadata such as a spatio-temporal extent, keywords, custom fields, and STAC extensions which are explained in more detail in the STAC extension heading. They are a bit more difficult to build due to this extra metadata, but the general process is below.
```{python python.reticulate = FALSE}
# Build the temporal extent of the collection
# the strptime function takes a string time and the format it is in such as %Y-%M-%d
# STAC requires the time to be in UTC, so the replace function forces it into that format
# NOTE: that this method will default to YEAR-01-01 if the month and day aren't provided/NA
time = ['2000', '2030']
start = str(dt.datetime.strptime(time[0], '%Y').replace(tzinfo=datetime.timezone.utc))
end = str(dt.datetime.strptime(time[1], '%Y').replace(tzinfo=datetime.timezone.utc))
# Build the spatial extent of the collection
# The bounds can be calculated using the riostac package, rasterio, terra, etc.
bounds = {'xmin': -180, 'ymin': -90, 'xmax': 180, 'ymax': 90}
bbox = [bounds['xmin'], bounds['ymin'], bounds['xmax'], bounds['ymax']]
# And now put the whole catalog together
productivity_paper_collection = pystac.Collection(id = "paper1_data",
description = "data from paper by Todd",
keywords = "productivity", "maize", "rice", "treated", "adaptation"
license = "CC-BY-4.0",
extent = pystac.Extent(
spatial = pystac.SpatialExtent(bboxes=[bbox]),
temporal = pystac.TemporalExtent(
intervals=[
start,
end
]
)
)
)
```
### STAC items
```{python python.reticulate = FALSE}
# In this example the bbox of the item is the same as the collection,
# but this is not always the case
bounds = {'xmin': -180, 'ymin': -90, 'xmax': 180, 'ymax': 90}
bbox = [bounds['xmin'], bounds['ymin'], bounds['xmax'], bounds['ymax']]
footprint = shapely.Polygon([
[bounds['xmin'], bounds['ymin']],
[bounds['xmin'], bounds['ymax']],
[bounds['xmax'], bounds['ymax']],
[bounds['xmax'], bounds['ymin']]
])
# Now we set the date of the item
# we could use 'strptime()' agin if date is a charachter, or set it like this:
data_date = dt.datetime(2020, 2, 5).replace(tzinfo=dt.timezone.utc)
item = pystac.Item(id='maize_productivity.tif',
geometry=footprint,
bbox=bbox,
datetime=data_date, #Can be set to `None` if multiple dates
#start_datetime=start_date, #Optional if multiple dates
#end_datetime=end_date,
properties={
'extraMetadataField': 'value',
'extraMetadataField2': 'value2',
"unit": "kg/ha",
"ssp": "SSP2-4.5"
})
```
Most often there will be more than one item in a collection. It is often useful to wrap the above code into a for loop and append each item to a list of items.
### STAC assets
This is the final piece of the STAC spec. An asset holds the paths to the STAC Item, other data, derived datasets, and other useful assets. It offers a lot of flexibility, you can have a single asset for each date, a different asset for each model or crop, etc. depending on what fits best with the data.
```{python python.reticulate = FALSE}
asset = pystac.Asset(href="s3://bucket/maize_productivity.tif",
media_type=pystac.MediaType.COG)
item.add_asset("COG Image", asset)
# or for multiple
s3_uris = [
"s3://bucket/maize_productivity_2020-1.tif",
"s3://bucket/maize_productivity_2020-2.tif",
"s3://bucket/maize_productivity_2020-3.tif"
]
for uri in s3_uris:
asset = pystac.Asset(href=uri, media_type=pystac.MediaType.COG)
asset_time = uri.split("_")[-1] # get the filename
asset_name = f'maize productivity cog {asset_time}'
item.add_asset(asset_name, asset)
```
### STAC extensions
STAC extensions can be added to all of the STAC specifications - catalogs, collections, items, and assets. These extensions provide a more formal metadata framework for certain types/aspects of metadata (i.e. raster, projection, or data cube specific metadata).
```{python python.reticulate = FALSE}
proj = pystac.extensions.ProjectionExtension.ext(item, add_if_missing=True) #add to the item
proj.apply(epsg=4326)
```
### Combining them all into a final catalog
```{python python.reticulate = FALSE}
# add the item to the collection
productivity_paper_collection.add_item(item)
# use .add_items() if you have multiple items in a list
# Add the collection to the catalog
productivity_catalog.add_child(productivity_paper_collection)
# Now just save it
productivity_catalog.normalize_and_save("~/stac/", catalog_type=pystac.CatalogType.SELF_CONTAINED)
```
<!-- ## Update an existing catalog -->