New xagg prototype up (#1) that gets at some of what I've been thinking about for aggregations. The lambda takes 6 to 7 minutes and $2 to $3 to run a continental aggregation function on atl06 for an orbital cycle.
My hope is that the design pattern is somewhat portable and can be generalized. Here's the high-level overview of what happens:
- Spatio-temporal metadata query. The atl06 example hits NASA CMR, but this is broadly compatible with anything that conforms to a STAC API
- Definition of what to fetch (i.e., which columns); this is in a Python function right now, but would probably be abstracted to a YAML template. This is also where we might add in (for example) a reader specification (current example uses h5coro).
- Definition of what aggregation to do. This is defined in the same Python file, and could be a script, a function, or similar.
- Definition of input / output grid. Input grid defines the workers; output grid is self-explanatory. This is implemented using healpix for both, but could be generalized to arbitrary user-defined grids later on.
- What to write to. This is currently a stub (i.e., parquet files), but should be more grid-like (i.e., to zarr or similar with partial spatial chunked writes)
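To make the input grid / output grid split concrete, here is a minimal sketch of the sharding and aggregation steps. It is not the xagg code: `AggregationJob`, `shard_by_input_cell`, and `run_worker` are hypothetical names, the orders are arbitrary, and it assumes each observation already carries an output-grid cell index in HEALPix nested ordering (computed upstream by whatever HEALPix library is in use).

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical record: (output_cell, value). output_cell is assumed to be a
# HEALPix nested-scheme index at the job's output order.
Observation = tuple[int, float]


@dataclass
class AggregationJob:
    """Illustrative job description; these names are not taken from xagg."""
    input_order: int   # coarse grid: one worker per input cell
    output_order: int  # fine grid: one aggregated value per output cell
    aggregate: Callable[[list[float]], float] = mean


def parent_cell(cell: int, from_order: int, to_order: int) -> int:
    # In nested ordering each order splits a cell into 4 children, so
    # dropping 2 bits per order yields the enclosing coarser cell.
    return cell >> (2 * (from_order - to_order))


def shard_by_input_cell(job: AggregationJob, observations: list[Observation]):
    """Group observations by the input-grid cell (worker) that owns them."""
    shards = defaultdict(list)
    for cell, value in observations:
        worker = parent_cell(cell, job.output_order, job.input_order)
        shards[worker].append((cell, value))
    return dict(shards)


def run_worker(job: AggregationJob, shard: list[Observation]):
    """Aggregate one worker's observations per output-grid cell."""
    by_cell = defaultdict(list)
    for cell, value in shard:
        by_cell[cell].append(value)
    return {cell: job.aggregate(values) for cell, values in by_cell.items()}


job = AggregationJob(input_order=1, output_order=3)
observations = [(0, 1.0), (0, 3.0), (15, 10.0), (16, 5.0)]
shards = shard_by_input_cell(job, observations)
results = {worker: run_worker(job, shard) for worker, shard in shards.items()}
```

Each entry in `results` is what a single worker would hand to the writer; because no two workers share an output cell, the writes do not need to be coordinated.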
In principle, we could (for minimal work) swap out ICESat-2 for anything in the CMR that has a data model such that we're reading h5 files with per-observation lat/lon data; that's a huge number of science products already.
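As a rough illustration of why the swap is cheap on the metadata side, the sketch below builds a CMR granule search URL where the product is just the `short_name` parameter. The helper name `build_granule_query`, the date range, and the bounding box are made up for the example, and the prototype's actual query code may look quite different. The URL is only constructed here, not fetched.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format).
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_query(short_name, start, end, bbox, page_size=100):
    """Build a spatio-temporal granule search URL.

    bbox is (west, south, east, north) in degrees; start/end are ISO 8601
    timestamps. This helper is hypothetical and not part of xagg.
    """
    params = {
        "short_name": short_name,
        "temporal": f"{start},{end}",
        "bounding_box": ",".join(str(v) for v in bbox),
        "page_size": page_size,
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


# Arbitrary example window and region; only short_name changes per product.
url = build_granule_query(
    "ATL06",
    "2020-01-01T00:00:00Z",
    "2020-04-01T00:00:00Z",
    (-180, -90, 180, -60),
)
```

Pointing the same call at a different product would only change the first argument; the reader and column definitions are where the per-product work would remain.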
A few bigger-picture items:
- What do we write to? We currently use parquet/geoparquet, but Zarr seems like a promising candidate.
- What do we visualize with? I'd love to be able to view the output as it populates while data is being written.
  - We had looked at lonboard a while ago; it works with geoparquet and is client side, which is great
    - My understanding is that it's based on deck.gl, which doesn't support non-Mercator projections, which is a deal breaker for the polar regions
    - How feasible is upstreaming multiple projection types?
  - Other options are zarr viewers
    - Requires fixed array output size (honestly a reasonable and workable restriction)
    - Seem to have some support for healpix metadata (not required, but nice)
    - Some issues with only displaying global grid extent for the healpix example
- Can we generalize the pattern to a common interface for both local and cloud deployment?
  - What's the cloud backend (cubed?), and how do we handle authentication?
  - What's the local backend (vaex?)
  - How do we integrate with the existing ecosystem?
- Do we support custom input and output grids, or just custom output grids, or neither?
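On the fixed-array-size point for zarr viewers: a healpix output grid already has a fixed length of 12 * 4^order cells, and in nested ordering the children of one coarse cell are contiguous. The sketch below shows the arithmetic, assuming nested ordering (not confirmed for the prototype) and using arbitrary example orders; `npix` and `child_range` are illustrative helpers, not functions from any library.

```python
def npix(order: int) -> int:
    """Total number of HEALPix cells at a given order (nside = 2**order)."""
    return 12 * 4 ** order


def child_range(parent: int, parent_order: int, child_order: int) -> range:
    """Contiguous nested-scheme indices of a parent cell's descendants."""
    factor = 4 ** (child_order - parent_order)
    return range(parent * factor, (parent + 1) * factor)


# Arbitrary example orders: coarse input grid (workers), fine output grid.
input_order = 4
output_order = 10

total = npix(output_order)                          # fixed output length
chunk = child_range(7, input_order, output_order)   # slice owned by worker 7
```

Under those assumptions, an output array of length `total` could be allocated once with a chunk length of `len(chunk)`, so that each worker writes exactly one whole chunk. That would line up with the partial spatial chunked writes mentioned above.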