
Polygonize potential improvements #27

Closed
7 of 13 tasks
cholmes opened this issue Oct 22, 2024 · 19 comments · Fixed by #31, #51 or #50

Comments

@cholmes
Member

cholmes commented Oct 22, 2024

These should perhaps break out into their own issues, but a number of these could likely be done in one chunk of work for the 'polygonize' command proposed in #19 (these suggestions are currently based on #25 where it is its own command).

@m-mohr m-mohr linked a pull request Oct 25, 2024 that will close this issue
@cholmes
Member Author

cholmes commented Oct 28, 2024

@m-mohr - could you take on a chunk of these? Then we can make additional issues for the ones you don't do.

The top one to me is fiboa / GeoParquet output. We could also do JSON in fiboa, and then make the GeoPackage output 'fiboa-like' by default. And make it so that if someone says -o fields.parquet, it just automatically does fiboa and GeoParquet.

@m-mohr m-mohr reopened this Oct 28, 2024
@m-mohr
Collaborator

m-mohr commented Oct 29, 2024

@cholmes How long should this take following the README instructions?

The top one for sure should probably be the algorithm. This thing is so slow due to O(n³) complexity that I can't even get it to properly print the status to the terminal on my machine. It always stops printing the status at some point and the machine stays busy forever. That's also the most complex issue, but it doesn't make sense to do anything else before this works better.

@calebrob6
Collaborator

What takes O(n³) complexity?

@m-mohr
Collaborator

m-mohr commented Oct 29, 2024

The three for loops that are nested in the code :-)

            for y in range(0, input_height, polygonization_stride):
                for x in range(0, input_width, polygonization_stride):
                    ...
                    for geom, val in rasterio.features.shapes(mask_window, transform=new_transform):
                        ...

@calebrob6
Collaborator

calebrob6 commented Oct 29, 2024

Hmmm, these aren't all n -- the first two for loops iterate over chunks of the input raster, and the third, nested for loop iterates over the contiguous blocks of 1's in the chunk. The whole thing takes time that depends on the number of contiguous regions of 1's.

rasterio's shapes(..) says -- https://rasterio.readthedocs.io/en/latest/api/rasterio.features.html#rasterio.features.shapes

The amount of memory used by this algorithm is proportional to the number and complexity of polygons produced. This algorithm is most appropriate for simple thematic data. Data with high pixel-to-pixel variability, such as imagery, may produce one polygon per pixel and consume large amounts of memory.

Instead of running rasterio.features.shapes(...) on the entire input at one time (which just blocks for a long time), I split it up so that the user at least gets status updates.
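That chunking idea can be sketched roughly as follows (a hypothetical helper of my own, not the script's actual code; per window you would read just that block of the mask and call `rasterio.features.shapes` on it, as in the snippet above):

```python
def iter_windows(height, width, stride):
    """Yield (row_off, col_off, win_height, win_width) tiles that exactly
    cover a raster of the given size, clamping the last row/column."""
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            yield y, x, min(stride, height - y), min(stride, width - x)

# Example dimensions (a Sentinel-2 tile is 10980x10980 pixels); per window:
#   mask_window = mask[y:y + h, x:x + w]
#   for geom, val in rasterio.features.shapes(mask_window, transform=...):
#       ...
windows = list(iter_windows(10980, 10980, 2048))
for i, (y, x, h, w) in enumerate(windows, 1):
    pass  # polygonize this window, then report f"{i}/{len(windows)} windows done"
```

The point is that progress can be reported per window, instead of one long blocking call over the whole raster.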

@m-mohr
Collaborator

m-mohr commented Oct 29, 2024

You are right, my bad. Nevertheless, it's slow ;-) And for whatever reason, at some point the CLI progress bar stops updating for me (before 100%). So I'm trying to fix/optimize it - not sure yet how...

@m-mohr
Collaborator

m-mohr commented Oct 29, 2024

@calebrob6 I just looked at one of the polygonize outputs. Are these lines that split the boundaries a result of the optimizations for progress reporting? That doesn't seem ideal. They appear pretty regularly in the file.

[screenshot: polygon output with field boundaries split along regularly spaced lines]

@calebrob6
Collaborator

Yep -- I agree not ideal

@cholmes
Member Author

cholmes commented Oct 29, 2024

We talked about it in Slack and it's on the list above (just edited it to make it a bit clearer, it was a bit buried) - we should merge those back.

As for performance, we can remove the reprojection, as @calebrob6 pointed out that we shouldn't need it for area calculations if the projections are UTM, and at this point we can assume everything is Sentinel-2 input. The reprojection definitely made things slower.

But I didn't find it excessively slow, and it would always complete for me, so it sounds like something else may be going wrong.

@m-mohr
Collaborator

m-mohr commented Oct 30, 2024

I don't think you can easily merge them back. You can already see some examples above where it's not really clear whether they are separate fields or were split due to the window. Also, some shapes still look different even when merged...

I think we need to get rid of the windows and just work on the full thing, unfortunately. That will mean no progress bar for the first few minutes, but that's not a good enough reason to return suboptimal geometries, I think.

I have an alternative algorithm in mind; I'll try that out, and I'll also try to optimize the number of operations per shape. Working in #49

@calebrob6
Collaborator

I think the main optimization will be to sieve / preprocess the raster input (i.e. model output) that goes into the polygonization step. A 3x3 or 5x5 mode filter may help (but may also merge nearby fields that are separated by a single pixel).
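For a binary mask, a 3x3 mode (majority) filter can be sketched in plain NumPy (a hypothetical helper, not from this repo; `rasterio.features.sieve` is another option for dropping small regions):

```python
import numpy as np

def mode_filter3(mask):
    """3x3 majority filter for a binary {0,1} mask: a pixel becomes 1 iff
    at least 5 of the 9 pixels in its neighborhood are 1 (edge-replicated)."""
    padded = np.pad(mask.astype(np.uint8), 1, mode="edge")
    # Sum the 9 shifted copies to count the 1s in each 3x3 neighborhood.
    counts = sum(
        padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
        for dy in range(3) for dx in range(3)
    )
    return (counts >= 5).astype(np.uint8)

noisy = np.zeros((7, 7), dtype=np.uint8)
noisy[3, 3] = 1          # a lone pixel, likely noise
noisy[0:3, 0:3] = 1      # a 3x3 solid block
cleaned = mode_filter3(noisy)  # lone pixel removed; block interior survives
```

This also illustrates the caveat above: a one-pixel gap between two fields has a majority of 1s in its neighborhood and would get filled in, merging the fields.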

Another thought I had (and reason why I chunked it in the first place) was to run rasterio.features.shapes in parallel for different input chunks.

To partially offset the boundary issues we are seeing with the chunked approach, we could buffer each chunk by X pixels (i.e. run on overlapping windows) then merge the results intelligently before simplifying. I haven't thought too closely, but any polygons that cross the border between chunks (disregarding the buffer in this case) should be mergeable. You could do this very quickly, and it would offset having to do a large, expensive dissolve.

@calebrob6
Collaborator

calebrob6 commented Oct 30, 2024

[hand-drawn example: one field polygon split by a horizontal chunk boundary, with the overlapping top-chunk result in yellow and the bottom-chunk result in blue]

Here's a crudely drawn example. In this case we can see a horizontal line pretty clearly that indicates a boundary from polygonizing two chunks. In my proposed method, the "top chunk" in this example would extend, say, 100 pixels down into the bottom chunk and vice versa. I've drawn what a resulting polygon would look like in yellow for the top chunk and blue for the bottom chunk.

I'm claiming that any pair of overlapping polygons between the set from the top and the set from the bottom can be merged (dissolved) to fix the issue. You can also quickly tell which polygons meet this criterion, because they will cross the horizontal line that divides the unbuffered chunks (or the vertical line in other cases). If this works it'd actually be nice to put in gdal_polygonize :) (FYI, this script is basically a gdal_polygonize wrapper at the moment, so it might be worth benchmarking against!)

@m-mohr
Collaborator

m-mohr commented Oct 30, 2024

Let's take this in multiple steps.

I've started to speed up the code without removing the windowed approach in #50 - See the PR for details. This should help with further work steps as it's just speedier.

I've started to realize that the potential issues that I saw in the files were due to the simplification, but if we resolve the split before simplification, I guess it could be fine. I'm not sure about the buffer approach as it may lead to other issues down the line, e.g. merging fields that don't belong together.

@calebrob6
Collaborator

I'm not sure about the buffer approach as it may lead to other issues down the line, e.g. merging fields that don't belong together.

I don't think so -- you are not buffering any of the polygons themselves, just the windows over which you are running polygonization.

I've started to realize that the potential issues that I saw in the files were due to the simplification, but if we resolve the split before simplification, I guess it could be fine

Simplification will cause the fields that are split up by the windowing to not be touching anymore.

@m-mohr
Collaborator

m-mohr commented Oct 30, 2024

Oh, I didn't realize that you meant to buffer the chunks only. Makes more sense then.

Yes, that's what I meant to say with regards to simplification.

@cholmes What does the following mean?

Logic for warnings specific to when you want to polygonize

This was linked to pull requests Oct 30, 2024
@cholmes
Member Author

cholmes commented Oct 30, 2024

@cholmes What does the following mean?

Logic for warnings specific to when you want to polygonize

That was copied from a note @calebrob6 put in Slack. I think it likely comes sorta naturally - originally polygonize was a flag on top of inference, so if you had an error in polygonize it would just bubble up with the 'inference' ones. So I take this item to mean just 'raise warnings for users'. Not sure if @calebrob6 had some particular things he wanted to warn about, but we can likely just check it off.

@m-mohr m-mohr reopened this Oct 30, 2024
@cholmes
Member Author

cholmes commented Oct 30, 2024

Let's close this out when #51 merges, and just break the remaining check marks out into their own issues.

@m-mohr
Collaborator

m-mohr commented Oct 30, 2024

@cholmes I've opened separate issues for most items, but I'm not sure whether we really need

@cholmes
Member Author

cholmes commented Oct 30, 2024

Alternative polygonization algorithms -> Isn't fixing #54 enough? We only really need another algorithm if we can't solve #54, I think.

I think the idea was that there could be more specialized approaches that are optimized for field boundaries, or even for how this dataset is produced. Indeed, there are likely even approaches that get polygons directly from inference / use ML in generating the polygons. I don't think we need to do anything now, just file an issue saying we may explore others in the future. @calebrob6 may have some specific ideas here.

Option to reproject? Like to put the output into EPSG:4326 -> Is this really needed or can we just provide a snippet that uses ogr2ogr for CRS transformations?

Yeah, probably not really needed. Having an example and just calling out that the output is going to be in a UTM projection would be good though.
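For the docs, a snippet along these lines would probably do (file names are placeholders; ogr2ogr takes the destination before the source):

```shell
# Reproject the polygonize output from its UTM zone to EPSG:4326 using GDAL's ogr2ogr
ogr2ogr -t_srs EPSG:4326 fields_4326.gpkg fields.gpkg
```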
