Multithreading in overlay, dissolve. Gridlooper #188

Merged
merged 67 commits into from
Feb 9, 2024
17c774f
try to fix git deletion
mortewle Aug 22, 2023
b940718
merge
mortewle Aug 22, 2023
a707ace
after git fix
mortewle Aug 22, 2023
55e58b1
imports
mortewle Aug 22, 2023
e99333e
Merge branch 'main' of https://github.com/statisticsnorway/ssb-sgis i…
mortewle Jan 9, 2024
ddb9b9e
merge
mortewle Jan 9, 2024
9d3da70
dont use snap_polygons yet
mortewle Jan 9, 2024
36ea9b1
dont use snap_polygons yet
mortewle Jan 9, 2024
0704d39
dont allow scalar raster
mortewle Jan 9, 2024
4106110
first refactor
mortewle Jan 9, 2024
fc12045
merge origin
mortewle Jan 10, 2024
c74a897
move gradient to base raster class
mortewle Jan 11, 2024
cc7884e
concat argument to gridlooper
mortewle Jan 11, 2024
4d89f69
add decimals to measure control
mortewle Jan 19, 2024
58fb472
pytorch integration
mortewle Jan 21, 2024
3c362c6
move to_bbox
mortewle Jan 22, 2024
2f475e1
rename fs to filesystem
mortewle Jan 22, 2024
1220d50
fix show when kwargs
mortewle Jan 22, 2024
9cd3944
file_client arg
mortewle Jan 22, 2024
0fd3cf2
max rows per intersection
mortewle Jan 22, 2024
ea0353d
only change geom_type if same in df1 and df2
mortewle Jan 22, 2024
5624af7
update
mortewle Jan 22, 2024
52de78a
overlay for chunks
mortewle Jan 22, 2024
d18fcaa
add some cleaning
mortewle Jan 22, 2024
f043f96
add simplify to check correctness because of geos incorrectness
mortewle Jan 22, 2024
b093bb4
clean up mess
mortewle Jan 22, 2024
6c60581
print path
mortewle Jan 22, 2024
f8f3852
version
mortewle Jan 22, 2024
8cba10f
parallel clipping to municipality borders
mortewle Jan 29, 2024
22ea337
snapping working, but slow
mortewle Jan 30, 2024
83781c9
remove unnessecary deps
mortewle Feb 1, 2024
531e92b
torchgeo bbox
mortewle Feb 1, 2024
258be46
messy snap
mortewle Feb 1, 2024
cf2f56a
no biggie
mortewle Feb 1, 2024
274c454
sort indices
mortewle Feb 1, 2024
9005387
black
mortewle Feb 1, 2024
a975da3
fallback: make geometrycollection to single-typed
mortewle Feb 1, 2024
8c6f7a2
try_difference from overlay
mortewle Feb 1, 2024
8473f3b
dont allow multipart by default, return if not len, make_valid
mortewle Feb 1, 2024
e5edc58
parallel intersection chunked
mortewle Feb 1, 2024
87767c5
tests
mortewle Feb 1, 2024
7d2c0fb
raster stillbilde
mortewle Feb 1, 2024
a462df5
update docs
mortewle Feb 1, 2024
c4534b8
docs
mortewle Feb 1, 2024
4e4d439
del
mortewle Feb 1, 2024
d6fc218
get back changes from 2 commits back...
mortewle Feb 1, 2024
e10e88c
Merge branch 'main' of https://github.com/statisticsnorway/ssb-sgis i…
mortewle Feb 1, 2024
59c5535
version
mortewle Feb 1, 2024
e6c945a
update
mortewle Feb 1, 2024
f9eceba
raster
mortewle Feb 1, 2024
337f571
better error message
mortewle Feb 6, 2024
344a18a
multithreading in shapely functions
mortewle Feb 7, 2024
5a78586
dask
mortewle Feb 7, 2024
769ec75
not test cube
mortewle Feb 7, 2024
3eeb5d1
update
mortewle Feb 7, 2024
95fbea7
fix pytest errors
mortewle Feb 8, 2024
0b65e9e
pandas==2.0.3
mortewle Feb 8, 2024
12b87d9
sort values
mortewle Feb 8, 2024
730b6f2
not test raster
mortewle Feb 8, 2024
e9cf879
remove unreachable
mortewle Feb 8, 2024
ae9d4c0
fix raster
mortewle Feb 8, 2024
8e95283
remove equal blocks
mortewle Feb 8, 2024
a183b06
fix cleaning tests
mortewle Feb 8, 2024
db13e40
half fix cube tests
mortewle Feb 8, 2024
a048909
try new yml syntax
mortewle Feb 9, 2024
1a59d19
try new yml syntax
mortewle Feb 9, 2024
33881bd
try new yml syntax
mortewle Feb 9, 2024
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -49,7 +49,7 @@ jobs:
 - name: Run pytest with coverage
   if: ${{ (matrix.os != 'ubuntu-latest') }}
   run: |
-    poetry run pytest -k "not tests/test_raster.py" --verbose --durations=5
+    poetry run pytest -k "not raster and not cube" --verbose --durations=5

 - name: Run raster pytest with coverage
   if: ${{ (matrix.os == 'ubuntu-latest') }}
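Review note: the new `-k "not raster and not cube"` expression deselects tests by keyword rather than by file path. The sketch below only approximates that behaviour as case-insensitive substring matching on a test id; it is not pytest's actual implementation, and the test ids and the `is_selected` helper are made up for illustration.

```python
# Simplified stand-in for pytest's -k selection: case-insensitive substring
# matching against the test id. This is NOT pytest's real implementation.
def is_selected(test_id: str, excluded: tuple[str, ...]) -> bool:
    lowered = test_id.lower()
    return not any(word.lower() in lowered for word in excluded)


# Hypothetical test ids, made up for illustration.
test_ids = [
    "tests/test_raster.py::test_read",
    "tests/test_cube.py::test_merge",
    "tests/test_overlay.py::test_clean_overlay",
    "tests/test_buffer.py::test_rasterize_result",
]

# Mirrors the intent of: pytest -k "not raster and not cube"
selected = [t for t in test_ids if is_selected(t, ("raster", "cube"))]
print(selected)
```

Under this approximation, a test such as the hypothetical `test_rasterize_result` is also deselected because its name contains "raster", so keyword-based exclusion can be broader than excluding a single file.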
1 change: 0 additions & 1 deletion docs/reference/io/index.rst
@@ -8,4 +8,3 @@ Functions for reading and writing geodata.

    dapla
    read_parquet_url
-   write_municipality_data
7 changes: 0 additions & 7 deletions docs/reference/io/write_municipality_data.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/reference/raster/elevationraster.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/reference/raster/index.rst
@@ -7,4 +7,3 @@ Class for raster analysis from image files, arrays and GeoDataFrames.
    :maxdepth: 3

    raster
-   elevationraster
Binary file added future-0.18.3-py3-none-any.whl
Binary file not shown.
2,473 changes: 1,325 additions & 1,148 deletions poetry.lock

Large diffs are not rendered by default.

15 changes: 8 additions & 7 deletions pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "ssb-sgis"
-version = "0.3.11"
+version = "0.3.12"
 description = "GIS functions used at Statistics Norway."
 authors = ["Statistics Norway <ort@ssb.no>"]
 license = "MIT"
@@ -26,18 +26,19 @@ mapclassify = ">=2.5.0"
 matplotlib = ">=3.7.0"
 networkx = ">=3.0"
 numpy = ">=1.24.2"
-pandas = ">=1.5.3"
+pandas = "2.0.3"
 pyarrow = ">=11.0.0"
 requests = ">=2.28.2"
 scikit-learn = ">=1.2.1"
 shapely = ">=2.0.1"
 xyzservices = ">=2023.2.0"
 jenkspy = ">=0.3.2"
 ipython = ">=8.13.2"
-rtree = "^1.0.1"
-geocoder = "^1.38.1"
-rasterio = "^1.3.8"
+rtree = ">=1.0.1"
+geocoder = ">=1.38.1"
+rasterio = ">=1.3.8"
 pip = "23.2.1"
+dask = ">=2024.1.1"

 [tool.poetry.group.dev.dependencies]
 black = {extras = ["d", "jupyter"], version = ">=23.1.0"}
@@ -97,8 +98,8 @@ notebook_metadata_filter = "jupytext.text_representation,-jupytext.text_represen
 cell_metadata_filter = "-all"

 [tool.pytest.ini_options]
-pythonpath = [
-"src"
+pythonpath = [
+"src"
 ]

 [build-system]
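Review note: switching `rtree`, `geocoder` and `rasterio` from caret (`^`) to `>=` constraints removes the implicit upper bound, while `pandas` is now pinned to exactly 2.0.3. The sketch below illustrates the difference for plain `X.Y.Z` versions only; `parse`, `caret_allows` and `at_least_allows` are hypothetical helpers written for this note and are not part of Poetry.

```python
# Minimal illustration of Poetry's caret constraint versus a plain ">=".
# Assumption: versions are simple dotted integers such as "1.0.1".
def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def caret_allows(base: str, candidate: str) -> bool:
    """True if candidate satisfies ^base (bump the left-most non-zero part)."""
    lower = parse(base)
    # Raises StopIteration for an all-zero base; that case is out of scope here.
    idx = next(i for i, part in enumerate(lower) if part != 0)
    upper = lower[:idx] + (lower[idx] + 1,) + (0,) * (len(lower) - idx - 1)
    return lower <= parse(candidate) < upper


def at_least_allows(base: str, candidate: str) -> bool:
    """True if candidate satisfies >=base (no upper bound)."""
    return parse(candidate) >= parse(base)


print(caret_allows("1.0.1", "1.4.0"))     # inside ^1.0.1
print(caret_allows("1.0.1", "2.0.0"))     # outside ^1.0.1
print(at_least_allows("1.0.1", "2.0.0"))  # inside >=1.0.1
```

With `>=`, a future major release of these packages becomes installable without a change to `pyproject.toml`, which is more permissive than the previous caret constraints.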
25 changes: 20 additions & 5 deletions src/sgis/__init__.py
@@ -1,3 +1,11 @@
+config = {
+    "n_jobs": 1,
+}
+
+import sgis.raster.bands as bands
+import sgis.raster.indices as indices
+from sgis.raster.raster import Raster, get_shape_from_bounds, get_transform_from_bounds
+
 from .geopandas_tools.bounds import (
     Gridlooper,
     bounds_to_points,
@@ -8,7 +16,6 @@
     make_grid_from_bbox,
     make_ssb_grid,
     points_in_bounds,
-    to_bbox,
 )
 from .geopandas_tools.buffer_dissolve_explode import (
     buff,
@@ -22,12 +29,17 @@
from .geopandas_tools.cleaning import (
coverage_clean,
remove_spikes,
snap_polygons,
snap_to_mask,
split_and_eliminate_by_longest,
split_by_neighbors,
split_spiky_polygons,
)
from .geopandas_tools.conversion import (
coordinate_array,
from_4326,
to_4326,
to_bbox,
to_gdf,
to_geoseries,
to_shapely,
@@ -119,13 +131,16 @@
 )
 from .networkanalysis.traveling_salesman import traveling_salesman_problem
 from .parallel.parallel import Parallel
-from .raster.elevationraster import ElevationRaster
-from .raster.raster import Raster
-from .raster.sentinel import Sentinel2
+from .raster.cube import DataCube
+
+
+try:
+    import sgis.raster.torchgeo as torchgeo
+except ImportError:
+    pass


 try:
     from .io.dapla_functions import check_files, read_geopandas, write_geopandas
-    from .io.write_municipality_data import write_municipality_data
 except ImportError:
     pass
77 changes: 13 additions & 64 deletions src/sgis/geopandas_tools/bounds.py
@@ -13,7 +13,7 @@
 from shapely.geometry import Polygon

 from ..parallel.parallel import Parallel
-from .conversion import to_gdf
+from .conversion import to_bbox, to_gdf
 from .general import clean_clip, is_bbox_like

@@ -93,17 +93,20 @@ class Gridlooper:
     gridsize: int
     mask: GeoDataFrame | GeoSeries | Geometry
     gridbuffer: int = 0
-    parallelizer: Parallel | None = None
+    concat: bool = False
     clip: bool = True
     keep_geom_type: bool = True
     verbose: bool = False
+    parallelizer: Parallel | None = None

     def __post_init__(self):
         if not isinstance(self.mask, GeoDataFrame):
             self.mask = to_gdf(self.mask)

     def run(self, func: Callable, *args, **kwargs):
-        intersects_mask = lambda df: df.index.isin(df.sjoin(self.mask).index)
+        def intersects_mask(df):
+            return df.index.isin(df.sjoin(self.mask).index)
+
         grid: GeoSeries = (
             make_grid(self.mask, gridsize=self.gridsize).loc[intersects_mask].geometry
         )
@@ -123,15 +126,19 @@ def run(self, func: Callable, *args, **kwargs):
             )
             results = self.parallelizer.map(func_with_clip, buffered_grid)
             if not self.gridbuffer or not self.clip:
-                return results
+                return (
+                    results
+                    if not self.concat
+                    else pd.concat(results, ignore_index=True)
+                )
             out = []
             for cell_res, unbuffered in zip(results, grid, strict=True):
                 out.append(
                     _clip_back_to_unbuffered_grid(
                         cell_res, unbuffered, self.keep_geom_type
                     )
                 )
-            return out
+            return out if not self.concat else pd.concat(out, ignore_index=True)

         results = []
         for i, (unbuffered, buffered) in enumerate(zip(grid, buffered_grid)):
@@ -159,7 +166,7 @@ def run(self, func: Callable, *args, **kwargs):
             if self.verbose:
                 print(f"Done with {i+1} of {n} grid cells", end="\r")

-        return results
+        return results if not self.concat else pd.concat(results, ignore_index=True)


 def gridloop(
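Review note: the new `concat` field changes what `Gridlooper.run` returns, from a list with one result per grid cell to a single frame built with `pd.concat(..., ignore_index=True)`. The sketch below reproduces only that return logic with plain pandas DataFrames; `run_per_cell` and `make_frame` are hypothetical stand-ins, not sgis functions, and no geometry or clipping is involved.

```python
import pandas as pd


def run_per_cell(func, cells, concat: bool = False):
    """Call func once per cell; optionally concatenate the per-cell results."""
    results = [func(cell) for cell in cells]
    # Same conditional as in the diff: list by default, one frame if concat=True.
    return results if not concat else pd.concat(results, ignore_index=True)


def make_frame(cell_id: int) -> pd.DataFrame:
    # Stand-in for the real per-cell work (e.g. an overlay within one grid cell).
    return pd.DataFrame({"cell": [cell_id, cell_id], "value": [1, 2]})


as_list = run_per_cell(make_frame, [0, 1, 2])
as_frame = run_per_cell(make_frame, [0, 1, 2], concat=True)

print(len(as_list))    # number of per-cell frames
print(as_frame.shape)  # rows from all cells stacked, index reset
```

In this sketch, `concat=True` with zero cells would call `pd.concat([])`, which raises `ValueError`; whether `Gridlooper.run` can reach that state depends on how the grid is built from the mask.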
@@ -639,64 +646,6 @@ def bounds_to_points(
     return gdf


-def to_bbox(
-    obj: GeoDataFrame | GeoSeries | Geometry | Collection | Mapping,
-) -> tuple[float, float, float, float]:
-    """Returns 4-length tuple of bounds if possible, else raises ValueError.
-
-    Args:
-        obj: Object to be converted to bounding box. Can be geopandas or shapely
-            objects, iterables of exactly four numbers or dictionary like/class
-            with a the keys/attributes "minx", "miny", "maxx", "maxy" or
-            "xmin", "ymin", "xmax", "ymax".
-    """
-    if isinstance(obj, (GeoDataFrame, GeoSeries)):
-        return tuple(obj.total_bounds)
-    if isinstance(obj, Geometry):
-        return tuple(obj.bounds)
-    if (
-        hasattr(obj, "__iter__")
-        and len(obj) == 4
-        and all(isinstance(x, numbers.Number) for x in obj)
-    ):
-        return tuple(obj)
-
-    if is_dict_like(obj) and all(x in obj for x in ["minx", "miny", "maxx", "maxy"]):
-        try:
-            minx = np.min(obj["minx"])
-            miny = np.min(obj["miny"])
-            maxx = np.max(obj["maxx"])
-            maxy = np.max(obj["maxy"])
-        except TypeError:
-            minx = np.min(obj.minx)
-            miny = np.min(obj.miny)
-            maxx = np.max(obj.maxx)
-            maxy = np.max(obj.maxy)
-        return minx, miny, maxx, maxy
-    if is_dict_like(obj) and all(x in obj for x in ["xmin", "ymin", "xmax", "ymax"]):
-        try:
-            xmin = np.min(obj["xmin"])
-            ymin = np.min(obj["ymin"])
-            xmax = np.max(obj["xmax"])
-            ymax = np.max(obj["ymax"])
-        except TypeError:
-            xmin = np.min(obj.xmin)
-            ymin = np.min(obj.ymin)
-            xmax = np.max(obj.xmax)
-            ymax = np.max(obj.ymax)
-        return xmin, ymin, xmax, ymax
-    if is_dict_like(obj) and hasattr(obj, "geometry"):
-        try:
-            return tuple(GeoSeries(obj["geometry"]).total_bounds)
-        except Exception:
-            return tuple(GeoSeries(obj.geometry).total_bounds)
-    try:
-        of_length = f" of length {len(obj)}"
-    except TypeError:
-        of_length = ""
-    raise TypeError(f"Cannot convert type {obj.__class__.__name__}{of_length} to bbox")
-
-
 def get_total_bounds(
     *geometries: GeoDataFrame | GeoSeries | Geometry,
 ) -> tuple[float, float, float, float]:
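Review note: `to_bbox` is removed from `bounds.py` here and imported from `.conversion` instead. For readers unfamiliar with it, the sketch below mirrors two of its branches (four-number iterables and mappings with min/max keys) using only the standard library. `to_bbox_sketch` is a hypothetical simplification: it skips the geopandas and shapely branches and does not reduce array values with `np.min` / `np.max`.

```python
import numbers
from collections.abc import Mapping

KEY_SETS = (
    ("minx", "miny", "maxx", "maxy"),
    ("xmin", "ymin", "xmax", "ymax"),
)


def to_bbox_sketch(obj) -> tuple:
    """Simplified, stdlib-only stand-in for the bbox conversion in the diff."""
    if isinstance(obj, Mapping):
        # Accept either naming convention for the four bounds.
        for keys in KEY_SETS:
            if all(key in obj for key in keys):
                return tuple(obj[key] for key in keys)
    elif (
        hasattr(obj, "__iter__")
        and hasattr(obj, "__len__")
        and len(obj) == 4
        and all(isinstance(x, numbers.Number) for x in obj)
    ):
        return tuple(obj)
    raise TypeError(f"Cannot convert type {obj.__class__.__name__} to bbox")


print(to_bbox_sketch([0, 0, 10, 10]))
print(to_bbox_sketch({"xmin": 1, "ymin": 2, "xmax": 3, "ymax": 4}))
```

The removed docstring says the function raises `ValueError`, while the code raises `TypeError`; the sketch follows the code.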