Dissolve using dask-geopandas #313
Do you need dask-geopandas? If vanilla geopandas is enough, this will be much easier, and 200k polygons should be perfectly fine. You need to identify connected components and dissolve by a component label. That is tricky in a distributed setting, but in a single GeoDataFrame it is easy with the help of libpysal (or scipy only):

```python
from libpysal import graph

comp_label = graph.Graph.build_contiguity(gdf, rook=False).component_labels
gdf.dissolve(comp_label)
```

If you know that you have a correct polygonal coverage, you can use the much faster coverage union:

```python
gdf.dissolve(comp_label, method="coverage")
```
Thanks! Yes, I do need Dask, since I'll be processing millions of polygons. I added map_partitions to my function and it worked. However, the problem now is that converting the result to a GeoPandas DataFrame is taking a long time.
Re your runtime (I can't comment on what needs to be in a single partition): could you try creating a cluster before you call compute? That will properly parallelize things, and the dashboard URL is helpful for observing what's going on.
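The cluster-creation snippet appears to have been stripped from the page; a minimal sketch using `dask.distributed` (the `processes=False` setting is an assumption for a lightweight local setup, not from the thread):

```python
from dask.distributed import Client

# Create a local cluster before calling .compute(); the dashboard
# URL printed below lets you watch tasks run in parallel.
client = Client(processes=False)  # threads-only workers; the default spawns worker processes
print(client.dashboard_link)

# ... build the dask-geopandas pipeline here, then call .compute() ...

client.close()
```

With a plain `client = Client()` you get multiprocess workers, which usually helps more for CPU-bound geometry work.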
I have around 200k polygons in a shapefile, and I want to dissolve the polygons that are connected to each other. ArcGIS offers simple tools to achieve this, but I was wondering whether there are quicker ways to do it. I tried the following, but it took ages to execute.