The `titiler-cmr` API is deployed as a Lambda function in the SMCE VEDA AWS account. For small time series requests (<500 time points) you can expect a response from any of the endpoints within ~20 seconds. For larger time series requests, you run the risk of bumping into Lambda concurrency or timeout limits. This report shows some results from the `test_timeseries_benchmarks.py` script, which sends many requests with varying time series lengths as well as several other parameters that affect runtime.
```python
import benchmark_analysis as ba
```
The following tests use several datasets to evaluate the limits of the `/timeseries` endpoints for the `xarray` backend.
The `titiler-cmr` API can be deployed as a Lambda function in AWS. Since requests to the time series endpoints will make recursive requests to the Lambda function for the lower-level time step operations, there are some limits in place to avoid making large requests that are likely to overwhelm the API.

- The total size of a request can be quantified by the total pixel count (`x_pixels * y_pixels * n_time`), which is helpful for determining if a request will succeed.
- The `/timeseries/bbox` endpoint for generating GIFs for a bounding box will struggle on requests for a large AOI and/or a lengthy time series for high spatial resolution datasets. Based on a coarse evaluation of the API, requests are limited to 100,000,000 (`1e8`) total pixels. Requests that exceed this limit fail fast without firing hundreds of doomed Lambda invocations.
- The `/timeseries/statistics` endpoint can handle larger requests than the `/timeseries/bbox` endpoint. Based on a coarse evaluation of the API, requests are limited to 15,000,000,000 (`1.5e10`) total pixels, as long as the individual images read by the `/statistics` endpoint are smaller than 56,250,000 (`5.625e7`) pixels.

The time series API provides rapid access to time series analysis and visualization of collections in the CMR catalog, but there are some limitations to the API deployment that require some care when making large requests.
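As a rough pre-flight check, you can compute this product yourself before sending a request. Here is a minimal sketch (the limit values are the figures quoted above; the constant and function names are ours, not part of titiler-cmr):

```python
# Request size limits from the coarse API evaluation described above
BBOX_MAX_TOTAL_PIXELS = 1e8  # /timeseries/bbox
STATS_MAX_TOTAL_PIXELS = 1.5e10  # /timeseries/statistics


def total_pixels(x_pixels: int, y_pixels: int, n_time: int) -> int:
    """Total pixel count of a time series request."""
    return x_pixels * y_pixels * n_time


# A 500x500 pixel AOI with 180 daily time steps is well under the bbox limit
print(total_pixels(500, 500, 180) <= BBOX_MAX_TOTAL_PIXELS)  # True
```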
There are several factors that must be considered in order to make a successful time series request:

- the number of time points in the series
- the size of the bounding box
- the spatial resolution of the dataset
- the dimensions of the output images

These factors all influence the runtime and memory footprint of the initial `/timeseries` request, and requests that are too large in any of those dimensions can result in an API failure. Here are a few guidelines to help you craft successful time series requests.
Under the current deployment configuration, the `/timeseries/statistics` endpoint can process time series requests with up to ~1000 points. Requests that involve more than 1000 points are likely to fail.
for dataset, df in ba.dfs["statistics"].items():
- fig = ba.plot_error_rate_heatmap(
- df=df,
- x="num_timepoints",
- y="bbox_dims",
- z="error_rate",
- labels={"x": "number of time points", "y": "bbox dimensions", "color": "error rate"},
- title=f"{dataset}: error rate by bbox size and number of time points",
- )
- fig.show()
-
The top factor that determines if a request will succeed or fail is the number of points in the time series. In the default deployment, there is a hard cap of 995 time points in any time series request. This cap is in place because there is a concurrency limit of 1000 on the Lambda function that executes the API requests.
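To stay under that cap, you can estimate how many points a `datetime` range and `step` will expand to before sending the request. A rough sketch (this approximates, but is not, titiler-cmr's own expansion logic):

```python
from datetime import datetime, timedelta

MAX_TIME_POINTS = 995  # hard cap in the default deployment


def count_time_points(start: datetime, end: datetime, step: timedelta) -> int:
    """Approximate the number of points a start/end + step request expands to."""
    n, t = 0, start
    while t <= end:
        n += 1
        t += step
    return n


# ~3 years of daily points would exceed the cap; weekly steps would not
daily = count_time_points(datetime(2011, 1, 1), datetime(2013, 12, 31), timedelta(days=1))
weekly = count_time_points(datetime(2011, 1, 1), datetime(2013, 12, 31), timedelta(days=7))
print(daily, daily <= MAX_TIME_POINTS)    # 1096 False
print(weekly, weekly <= MAX_TIME_POINTS)  # 157 True
```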
In general, the size of the area you want to analyze will have minimal impact on the runtime! This is because `titiler.xarray` has to read the entire granule into memory before subsetting, so reducing the size of the AOI does not reduce the overall footprint of the computation.
for dataset, df in ba.dfs["statistics"].items():
- ba.plot_line_with_error_bars(
- df=df.sort_values(["bbox_size", "num_timepoints"]),
- color="bbox_dims",
- title=f"{dataset}: statistics runtime",
- ).show()
-
For datasets that use the `rasterio` backend, there will be very few limitations on maximum array size as long as the data are COGs and you specify a reasonable output image size (or use the `max_size` parameter) in your request.
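As an illustration, a `/timeseries/bbox` request for a hypothetical COG-backed collection might cap the per-step output size with `max_size`. The `concept_id` below is a placeholder rather than a real collection, and any dataset-specific band parameters are omitted:

```python
import httpx

response = httpx.get(
    "https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/-5,-5,0,0.gif",
    params={
        "concept_id": "CXXXXXXXXXX-PROVIDER",  # placeholder, not a real collection
        "datetime": "2011-01-01T00:00:01/2011-06-30T00:00:01",
        "step": "P1D",
        "backend": "rasterio",
        "max_size": 512,  # cap each time step's output image at 512x512
        "temporal_mode": "point",
    },
    timeout=None,
)
```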
For datasets without overviews/pyramids, `titiler-cmr` will need to read all of the bytes that overlap the request AOI even if the resulting image is going to be downsampled for a GIF. Therefore, if the area of interest for a `/timeseries/statistics` or `/timeseries/bbox` request would create a large array that is likely to exceed the capacity of the Lambda function, the request will fail fast.
The limits for the `xarray` backend are:

- `/timeseries/bbox`
  - maximum image size: `5.6e7` pixels (~7500x7500)
  - maximum total pixels (`x_pixels * y_pixels * n_time`): `1e8`
- `/timeseries/statistics`
  - maximum image size: `5.6e7` pixels (~7500x7500)
  - maximum total pixels: `1.5e10`

For low-resolution datasets (e.g. 28 km or 0.25 degree) you will not run into any issues (unless you request too many time points!) because a request for the full dataset will be reading arrays that are ~1440x720 pixels.
For higher-resolution datasets (e.g. 1 km or 0.01 degree), you will start to run into problems as the size of the raw arrays that `titiler-cmr` is processing increases (and as the number of discrete points or intervals increases).
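The back-of-the-envelope arithmetic looks like this, assuming a regular lat/lon grid (the resolutions are the ones cited above; the helper is illustrative only):

```python
def grid_pixels(bbox: tuple[float, float, float, float], degrees_per_pixel: float) -> tuple[int, int]:
    """Approximate (x, y) pixel dimensions of a bbox on a regular lat/lon grid."""
    west, south, east, north = bbox
    return (
        round((east - west) / degrees_per_pixel),
        round((north - south) / degrees_per_pixel),
    )


# 0.25-degree global dataset: the full extent is ~1440x720 pixels per time step
print(grid_pixels((-180, -90, 180, 90), 0.25))  # (1440, 720)

# 0.01-degree MUR-SST: a 5x5 degree box is 500x500 pixels per time step
print(grid_pixels((-5, -5, 0, 0), 0.01))  # (500, 500)
```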
Under the current deployment configuration, the `/timeseries/bbox` endpoint can reliably process time series requests with up to ~500 points. Requests that involve more than 500 points may fail if the area of interest is very large.
for dataset, df in ba.dfs["bbox"].items():
- for img_size in sorted(df["img_size"].unique()):
- img_size_df = df[df["img_size"] == img_size]
- img_dims = img_size_df["img_dims"].unique()[0]
- ba.plot_error_rate_heatmap(
- df=img_size_df,
- x="num_timepoints",
- y="bbox_dims",
- z="error_rate",
- labels={"x": "number of time points", "y": "bbox dimensions", "color": "error rate"},
- title=f"{dataset}: image size {img_dims}",
- ).show()
## Examples
The MUR-SST dataset is good for demonstrating the limits of the time series endpoints with the `xarray` backend. It has high resolution (1 km, 0.01 degree) daily global sea surface temperature observations! With this resolution it is easy to craft a request that will break the `/timeseries` endpoints. Here are some examples of how to manipulate the time series parameters to achieve success with the `/timeseries/bbox` endpoint.
```python
from datetime import datetime, timedelta

import httpx
```
-for dataset, df in ba.dfs["bbox"].items():
- for img_size in sorted(df["img_size"].unique()):
- img_size_df = df[df["img_size"] == img_size]
- img_dims = img_size_df["img_dims"].unique()[0]
- ba.plot_error_rate_heatmap(
- df=img_size_df,
- x="num_timepoints",
- y="bbox_dims",
- z="error_rate",
- labels={"x": "number of time points", "y": "bbox dimensions", "color": "error rate"},
- title=f"{dataset}: image size {img_dims}",
- ).show()
-
---------------------------------------------------------------------------- -NameError Traceback (most recent call last) -Cell In[4], line 1 -----> 1 for dataset, df in ba.dfs["bbox"].items(): - 2 for img_size in sorted(df["img_size"].unique()): - 3 img_size_df = df[df["img_size"] == img_size] - -NameError: name 'ba' is not defined-
Here is a request that will succeed (if the Lambda is warmed up):
- time series length: 180 points (180 days / `P1D`)
- total pixels: `500 * 500 * 180 = 4.5e7`

```python
bounds = (-5, -5, 0, 0)
bbox_str = ",".join(str(x) for x in bounds)

start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)
end_datetime = start_datetime + timedelta(days=180)

response = httpx.get(
    f"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif",
    params={
        "concept_id": "C1996881146-POCLOUD",
        "datetime": "/".join(dt.isoformat() for dt in [start_datetime, end_datetime]),
        "step": "P1D",
        "variable": "analysed_sst",
        "backend": "xarray",
        "rescale": "273,315",
        "colormap_name": "viridis",
        "temporal_mode": "point",
    },
    timeout=None,
)
```
That request is about half of the maximum request size for the `/timeseries/bbox` endpoint. We can push it to the limit by doubling the length of the time series:
- time series length: 360 points (360 days / `P1D`)
- total pixels: `500 * 500 * 360 = 9.0e7`

```python
bounds = (-5, -5, 0, 0)
bbox_str = ",".join(str(x) for x in bounds)

start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)
end_datetime = start_datetime + timedelta(days=360)

response = httpx.get(
    f"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif",
    params={
        "concept_id": "C1996881146-POCLOUD",
        "datetime": "/".join(dt.isoformat() for dt in [start_datetime, end_datetime]),
        "step": "P1D",
        "variable": "analysed_sst",
        "backend": "xarray",
        "rescale": "273,315",
        "colormap_name": "viridis",
        "temporal_mode": "point",
    },
    timeout=None,
)
```
If we increase the length of the time series such that the request exceeds the maximum size, the API will return an error:
- time series length: 540 points (540 days / `P1D`)
- total pixels: `500 * 500 * 540 = 1.35e8` (greater than the maximum of `1.0e8`!)

```python
bounds = (-5, -5, 0, 0)
bbox_str = ",".join(str(x) for x in bounds)

start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)
end_datetime = start_datetime + timedelta(days=540)

response = httpx.get(
    f"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif",
    params={
        "concept_id": "C1996881146-POCLOUD",
        "datetime": "/".join(dt.isoformat() for dt in [start_datetime, end_datetime]),
        "step": "P1D",
        "variable": "analysed_sst",
        "backend": "xarray",
        "rescale": "273,315",
        "colormap_name": "viridis",
        "temporal_mode": "point",
    },
    timeout=None,
)
```
We can get a successful response for the larger time window if we reduce the temporal resolution:
- time series length: 77 points (540 days / `P7D`)
- total pixels: `500 * 500 * 77 = 1.925e7`

```python
bounds = (-5, -5, 0, 0)
bbox_str = ",".join(str(x) for x in bounds)

start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)
end_datetime = start_datetime + timedelta(days=540)

response = httpx.get(
    f"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif",
    params={
        "concept_id": "C1996881146-POCLOUD",
        "datetime": "/".join(dt.isoformat() for dt in [start_datetime, end_datetime]),
        "step": "P7D",
        "variable": "analysed_sst",
        "backend": "xarray",
        "rescale": "273,315",
        "colormap_name": "viridis",
        "temporal_mode": "point",
    },
    timeout=None,
)
```
With the weekly temporal resolution we have some room to increase the size of the bounding box!
- time series length: 77 points (540 days / `P7D`)
- total pixels: `1000 * 1000 * 77 = 7.7e7`

```python
bounds = (-10, -10, 0, 0)
bbox_str = ",".join(str(x) for x in bounds)

start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)
end_datetime = start_datetime + timedelta(days=540)

response = httpx.get(
    f"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif",
    params={
        "concept_id": "C1996881146-POCLOUD",
        "datetime": "/".join(dt.isoformat() for dt in [start_datetime, end_datetime]),
        "step": "P7D",
        "variable": "analysed_sst",
        "backend": "xarray",
        "rescale": "273,315",
        "colormap_name": "viridis",
        "temporal_mode": "point",
    },
    timeout=None,
)
```
If we double the AOI size again, we will exceed the request size limit:
- time series length: 77 points (540 days / `P7D`)
- total pixels: `2000 * 2000 * 77 = 3.08e8` (greater than the maximum of `1e8`!)

```python
bounds = (-20, -20, 0, 0)
bbox_str = ",".join(str(x) for x in bounds)

start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)
end_datetime = start_datetime + timedelta(days=540)

response = httpx.get(
    f"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif",
    params={
        "concept_id": "C1996881146-POCLOUD",
        "datetime": "/".join(dt.isoformat() for dt in [start_datetime, end_datetime]),
        "step": "P7D",
        "variable": "analysed_sst",
        "backend": "xarray",
        "rescale": "273,315",
        "colormap_name": "viridis",
        "temporal_mode": "point",
    },
    timeout=None,
)
```
But if we reduce the temporal resolution from weekly to monthly, it will work!
- time series length: 18 points (540 days / `P1M`)
- total pixels: `2000 * 2000 * 18 = 7.2e7`

```python
bounds = (-20, -20, 0, 0)
bbox_str = ",".join(str(x) for x in bounds)

start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)
end_datetime = start_datetime + timedelta(days=540)

response = httpx.get(
    f"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif",
    params={
        "concept_id": "C1996881146-POCLOUD",
        "datetime": "/".join(dt.isoformat() for dt in [start_datetime, end_datetime]),
        "step": "P1M",
        "variable": "analysed_sst",
        "backend": "xarray",
        "rescale": "273,315",
        "colormap_name": "viridis",
        "temporal_mode": "point",
    },
    timeout=None,
)
```
However, there is a maximum image size that we can read with the `xarray` backend, so we cannot increase the bounding box indefinitely. The limit imposed on the API at this time is `5.6e7` pixels (7500 x 7500). In the case of MUR-SST (0.01 degree resolution), that is a bounding box of roughly 75 x 75 degrees, since 7500 pixels × 0.01 degrees/pixel = 75 degrees.
The size of the area of interest increases the response time, especially for requests for higher resolution images.
-for dataset, df in ba.dfs["bbox"].items():
- ba.plot_line_with_error_bars(
- df=df.sort_values(["bbox_size", "num_timepoints"]),
- color="bbox_dims",
- facet_row="img_dims",
- title=f"{dataset}: runtime by bbox size and image dimensions"
- ).show()
-
To reduce the size of a request, you can decrease the temporal resolution, for example from daily (`P1D`) to weekly (`P7D`) or greater (`P10D`).