-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use an exact join to initiate Package datasets #708
Milestone
Comments
This might be a blocker to create UgridDatasets: Deltares/xugrid#179, but we presumably can work around it. |
Just as an extra check to see if xr.merge causes any loss of performance in this case as reported in: #736 (comment) . Apparently with nicely overlapping coordinates, merging is not such a slow process and the changes made in this PR should only (very slightly) enhance performance. # %% import
import numpy as np
import xarray as xr
# %% functions
def make_da(nlay, nrow, ncol):
shape = nlay, nrow, ncol
dx = 10.0
dy = -10.0
xmin = 0.0
xmax = dx * ncol
ymin = 0.0
ymax = abs(dy) * nrow
dims = ("layer", "y", "x")
layer = np.arange(1, nlay + 1)
y = np.arange(ymax, ymin, dy) + 0.5 * dy
x = np.arange(xmin, xmax, dx) + 0.5 * dx
coords = {"layer": layer, "y": y, "x": x}
return xr.DataArray(
data=np.ones(shape, dtype=float),
dims=dims,
coords=coords,
)
def make_big_data_dict(nvar=4, nlay=3, nrow=1000, ncol=1000):
varnames = [f"var{i}" for i in np.arange(nvar)]
return {var: make_da(nlay, nrow, ncol) for var in varnames}
def ds_from_merge_exact(d):
return xr.merge([d], join="exact")
def ds_from_merge_default(d):
return xr.merge([d])
def ds_assigned(d):
ds = xr.Dataset()
for key, value in d.items():
ds[key] = value
return ds
# %% Create data dict
d = make_big_data_dict()
# %% Benchmark
# Benchmark in ipython/Jupyter
# %timeit ds_exact = ds_from_merge_exact(d)
## > 424 µs ± 3.53 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# %timeit ds_default = ds_from_merge_default(d)
## > 864 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# %timeit ds = ds_assigned(d)
## > 1.75 ms ± 42.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) |
github-merge-queue bot
pushed a commit
that referenced
this issue
Jan 30, 2024
Fix #708 The changeset is incomplete, as it is not rolled out for all packages yet. Therefore tests will fail, hence the draft status. This to make the review process more focused, reviewers are asked to provide feedback on the approach. Changes: - Variable names are inferred based on what is provided to locals(). This makes it possible to avoid having to manually assign variables in the __init__ of each specific package. - I had to add one function to work around this issue: Deltares/xugrid#179 - Grids are merged with an exact join, this to avoid that xarray joins variable coordinates in an unwanted way when dataarrays are inconsistent, which causes issues like #674 - Refactored the riv unit tests to use pytest cases, so that tests (without the xr.merge) are run for both unstructured and structured data. Update: - Variable names are not inferred with locals() anymore, but instead with a ``pkg_init`` decorator.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
We currently initialize the package dataset by first instantiating a
xr.Dataset
orxu.UgridDataset
and then updating this each time by assigning new DataArrays to it. This has the disadvantage that we do not have control over how coordinates should be joined, as the default value forjoin
is used, which is"outer"
:Call to merge_core, no option for join
Definition merge_core with default value
iMOD Python expects its dataset to have exactly joined coordinates. To have more control over this, we should consider initializing package datasets with
xr.merge
, which allows us to set thejoin
method toexact
. This will the following error:It's not entirely perfect, as the name of the erronous variable is not printed (we do have
idomain
, so we know what the coordinates should look like). But it ensures that the thing we implicitly expect as a dataset is created correctly. Otherwise we run into issues like:#674
UPDATE (03/01/2024):
As discussed with @Manangka
xr.merge
creates a Dataset and handles settings (scalar variables) gracefully. This should be the first thing done upon package initialization.imod.merge
function which typedispatches UgridDataArray to work around issue xugrid.merge does not support merging list of dicts wheares xarray.merge does xugrid#179. But better to resolve this inxugrid
upstream.to_dataset
might do this (but xarray also call this behind the scenes when assigning a DataArray to a dataset, so might not?)The text was updated successfully, but these errors were encountered: