Multi-threaded raster strategy #161
…t error needs confirmation.
…proscribed by the docs.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #161      +/-   ##
==========================================
- Coverage   90.71%   90.47%   -0.24%
==========================================
  Files          85       85
  Lines        6664     6751      +87
  Branches      628      636       +8
==========================================
+ Hits         6045     6108      +63
- Misses        587      608      +21
- Partials       32       35       +3
Thanks for this contribution! I've tested it with some other data locally and am seeing similar performance gains. I have a few questions/notes:
Any details on this? Are you referring to weighted operations, or something else?
Any issues with calling this
I don't expect it to be a major bottleneck, but do you see a reason not to have a
Any chance you're using GEOS < 3.10?
It's essentially handling the locking of the raster dataset to prevent multiple threads from reading at once. But it still may provide benefits here.
Adds a new strategy to exactextract named raster-parallel.
The strategy utilises oneAPI TBB to set up a parallel pipeline for finding intersecting features, reading raster data, performing zonal stats and merging stats for final output. The number of 'tokens' (TBB terminology, essentially the maximum number of parallel tasks in flight) is controlled with the --tokens [number] command line argument.
Implementation
Prior to the parallel pathway, the logic is the same as raster-sequential: all features are read in and an STR tree is created for doing intersection.
For the parallel pipeline:
Finally, all features are written out with the same implementation as raster-sequential.
Parallel considerations
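To make the token idea concrete, here is a minimal sketch of a pipeline that caps the number of chunks in flight and merges partial statistics under a lock. It is not the code in this PR, which uses the TBB parallel pipeline in C++; it is a Python stand-in built on a semaphore and a thread pool. The names run_pipeline and max_tokens, and the representation of a raster chunk as a list of (zone, value) pairs, are assumptions made for illustration only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(chunks, max_tokens):
    """Process chunks with at most max_tokens in flight, then merge the stats.

    Each chunk is a list of (zone, value) pairs (a hypothetical stand-in for
    the pixels of one raster block that fall inside each feature).
    """
    tokens = threading.BoundedSemaphore(max_tokens)
    merge_lock = threading.Lock()
    counter_lock = threading.Lock()
    totals = {}  # zone -> [running sum, running count]
    in_flight = 0
    peak_in_flight = 0

    def process(chunk):
        nonlocal in_flight, peak_in_flight
        try:
            with counter_lock:
                in_flight += 1
                peak_in_flight = max(peak_in_flight, in_flight)
            # Parallel stage: partial sum and count per zone for this chunk.
            partial = {}
            for zone, value in chunk:
                acc = partial.setdefault(zone, [0.0, 0])
                acc[0] += value
                acc[1] += 1
            # Serialised stage: fold the partial stats into the shared totals.
            with merge_lock:
                for zone, (part_sum, part_count) in partial.items():
                    acc = totals.setdefault(zone, [0.0, 0])
                    acc[0] += part_sum
                    acc[1] += part_count
        finally:
            with counter_lock:
                in_flight -= 1
            tokens.release()  # hand the token back so another chunk can start

    with ThreadPoolExecutor(max_workers=max_tokens) as pool:
        futures = []
        for chunk in chunks:
            tokens.acquire()  # blocks while max_tokens chunks are still live
            futures.append(pool.submit(process, chunk))
        for future in futures:
            future.result()  # re-raise any exception from a worker

    means = {zone: s / n for zone, (s, n) in totals.items()}
    counts = {zone: n for zone, (s, n) in totals.items()}
    return means, counts, peak_in_flight


chunks = [[("a", 1.0), ("b", 2.0)], [("a", 3.0)], [("b", 4.0), ("b", 6.0)]]
means, counts, peak = run_pipeline(chunks, max_tokens=2)
```

Because the partial sums can be merged in any order, the totals may differ in the last bits between runs on real data, which is one reason to compare parallel output against a sequential baseline with a tolerance rather than exact equality.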
Performance
Performing mean and count on ~1.6M polygons of Western Australia cadastral boundaries against ~25m square pixels of Australian agricultural land use data (with national coverage) done on Ryzen 7700 (8c/16t) with 32GB RAM:
feature-sequential: elapsed time 7m 42s, maximum memory usage 2.224 GB
raster-sequential: elapsed time 2m 52s, maximum memory usage 4.639 GB
raster-parallel with 4 simultaneous tokens: elapsed time 52s, maximum memory usage 5.653 GB
raster-parallel with 8 simultaneous tokens: elapsed time 37s, maximum memory usage 5.823 GB
raster-parallel with 12 simultaneous tokens: elapsed time 36s, maximum memory usage 5.961 GB
Other notes
All results were tested against raster-sequential outputs to control for any parallel bugs. Nothing major was observed other than occasional floating point differences at the limit of precision. I did note that multiple raster input to raster-sequential doesn't look like it is working as intended, but that is out of scope for this PR.
I didn't make any changes to the Python bindings or libs at this stage. I note that the actions continually fail on them, but I'm not sure why.
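For anyone wanting to repeat the comparison against raster-sequential outputs, a check along these lines may be enough. This is a minimal sketch, not the script used for this PR: compare_stats is a hypothetical helper, the outputs are assumed to have already been loaded into dictionaries keyed by feature id, and the relative tolerance of 1e-9 is an arbitrary choice. The last lines show why a tolerance is needed at all: floating point addition is not associative, so summing the same values in a different grouping can change the final digit.

```python
import math


def compare_stats(reference, candidate, rel_tol=1e-9):
    """Return the feature ids that are missing on one side or differ beyond rel_tol."""
    mismatches = []
    for fid in sorted(set(reference) | set(candidate)):
        if fid not in reference or fid not in candidate:
            mismatches.append(fid)  # present in only one of the outputs
        elif not math.isclose(reference[fid], candidate[fid], rel_tol=rel_tol):
            mismatches.append(fid)  # differs by more than rounding noise
    return mismatches


# The same three values summed with two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

# The two sums are not bit-identical, but they agree within the tolerance.
exactly_equal = left_to_right == regrouped
within_tolerance = compare_stats({"1": left_to_right}, {"1": regrouped}) == []
```

An exact equality check would flag this feature as a mismatch, while the tolerant comparison only reports genuine differences such as a missing feature or a value that is clearly off.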
Welcome any comments, hope this is something that helps!