Skip to content

Comments

Fix hotspots dask performance: streaming 2-pass architecture#855

Merged
brendancol merged 1 commit intomasterfrom
fixes-460-hotspots-performance-with-dask-backed-xarray-dataarray
Feb 20, 2026
Merged

Fix hotspots dask performance: streaming 2-pass architecture#855
brendancol merged 1 commit intomasterfrom
fixes-460-hotspots-performance-with-dask-backed-xarray-dataarray

Conversation

@brendancol
Copy link
Contributor

@brendancol brendancol commented Feb 20, 2026

Summary

  • Rewrites _hotspots_dask_numpy to eliminate the global barrier that forced materialization of the full intermediate convolution output, making hotspots infeasible for datasets larger than RAM
  • Splits computation into two passes: (1) eagerly compute global mean/std as two scalars, (2) fuse convolution + z-score + classification into a single map_overlap call that streams chunk-by-chunk with O(chunk_size) memory
  • Re-enables the zero-std check that was commented out, removes redundant astype calls, and eliminates misuse of map_overlap for a pointwise function

Fixes #460

Impact at scale (e.g. 30 TB raster, 16 GB RAM)

Metric Before After
Source data reads 3x 2x
Intermediate materialization ~30 TB (spill or recompute) 0 (streaming)
Peak memory per worker O(dataset) O(chunk_size)
Feasible with 16 GB RAM? No Yes

Test plan

  • pytest xrspatial/tests/test_focal.py -k hotspots -v — all 3 tests pass (zero-std, numpy, dask_numpy)
  • Profile with large dask array to confirm no intermediate materialization

…mory

Rewrite _hotspots_dask_numpy to eliminate the global barrier that forced
materialization of the full intermediate convolution output. The old
implementation created a task graph where every chunk's z-score depended
on both the per-chunk convolution and global reductions over ALL chunks,
making it infeasible for datasets larger than RAM.

New approach:
- Pass 1: eagerly compute global_mean/global_std (two scalars, single
  co-scheduled graph traversal via da.compute)
- Pass 2: fuse convolution + z-score + classification into one
  map_overlap call, so each chunk streams through independently

Also fixes: redundant astype calls, pointwise map_overlap misuse,
separate nanmean/nanstd accumulation, and re-enables zero-std check.
@brendancol brendancol merged commit 356f73d into master Feb 20, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hotspots performance with Dask-backed xarray DataArray

1 participant