Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use an exact join to initiate Package datasets #708

Closed
JoerivanEngelen opened this issue Dec 21, 2023 · 2 comments · Fixed by #722
Closed

Use an exact join to initiate Package datasets #708

JoerivanEngelen opened this issue Dec 21, 2023 · 2 comments · Fixed by #722
Assignees

Comments

@JoerivanEngelen
Copy link
Contributor

JoerivanEngelen commented Dec 21, 2023

We currently initialize the package dataset by first instantiating a xr.Dataset or xu.UgridDataset and then updating this each time by assigning new DataArrays to it. This has the disadvantage that we do not have control over how coordinates should be joined, as the default value for join is used, which is "outer" :

Call to merge_core, no option for join

Definition merge_core with default value

iMOD Python expects its dataset to have exactly joined coordinates. To have more control over this, we should consider initializing package datasets with xr.merge, which allows us to set the join method to exact. This will the following error:

ValueError: cannot align objects with join='exact' where index/labels/sizes 
are not equal along these coordinates (dimensions): 'layer' ('layer',)

It's not entirely perfect, as the name of the erronous variable is not printed (we do have idomain, so we know what the coordinates should look like). But it ensures that the thing we implicitly expect as a dataset is created correctly. Otherwise we run into issues like:

#674

UPDATE (03/01/2024):
As discussed with @Manangka

  • The call to xr.merge creates a Dataset and handles settings (scalar variables) gracefully. This should be the first thing done upon package initialization.
  • We can create a imod.merge function which typedispatches UgridDataArray to work around issue xugrid.merge does not support merging list of dicts wheares xarray.merge does xugrid#179. But better to resolve this in xugrid upstream.
  • We should avoid copies of DataArrays are made senselessly. to_dataset might do this (but xarray also call this behind the scenes when assigning a DataArray to a dataset, so might not?)
@JoerivanEngelen
Copy link
Contributor Author

JoerivanEngelen commented Dec 21, 2023

This might be a blocker to create UgridDatasets: Deltares/xugrid#179, but we presumably can work around it.

@JoerivanEngelen JoerivanEngelen moved this from 🤝 Accepted to 📝Refined in iMOD Suite Jan 3, 2024
@JoerivanEngelen JoerivanEngelen moved this from 📝Refined to 🏗 In Progress in iMOD Suite Jan 4, 2024
@JoerivanEngelen JoerivanEngelen self-assigned this Jan 4, 2024
@JoerivanEngelen JoerivanEngelen moved this from 🏗 In Progress to 🧐 In Review in iMOD Suite Jan 5, 2024
@JoerivanEngelen
Copy link
Contributor Author

JoerivanEngelen commented Jan 10, 2024

Just as an extra check to see if xr.merge causes any loss of performance in this case as reported in: #736 (comment) . Apparently with nicely overlapping coordinates, merging is not such a slow process and the changes made in this PR should only (very slightly) enhance performance.

# %% import
import numpy as np
import xarray as xr


# %% functions
def make_da(nlay, nrow, ncol):
    shape = nlay, nrow, ncol

    dx = 10.0
    dy = -10.0
    xmin = 0.0
    xmax = dx * ncol
    ymin = 0.0
    ymax = abs(dy) * nrow
    dims = ("layer", "y", "x")

    layer = np.arange(1, nlay + 1)
    y = np.arange(ymax, ymin, dy) + 0.5 * dy
    x = np.arange(xmin, xmax, dx) + 0.5 * dx
    coords = {"layer": layer, "y": y, "x": x}

    return xr.DataArray(
        data=np.ones(shape, dtype=float),
        dims=dims,
        coords=coords,
    )


def make_big_data_dict(nvar=4, nlay=3, nrow=1000, ncol=1000):
    varnames = [f"var{i}" for i in np.arange(nvar)]
    return {var: make_da(nlay, nrow, ncol) for var in varnames}


def ds_from_merge_exact(d):
    return xr.merge([d], join="exact")


def ds_from_merge_default(d):
    return xr.merge([d])


def ds_assigned(d):
    ds = xr.Dataset()
    for key, value in d.items():
        ds[key] = value

    return ds


# %% Create data dict

d = make_big_data_dict()

# %% Benchmark
# Benchmark in ipython/Jupyter

# %timeit ds_exact = ds_from_merge_exact(d)
## > 424 µs ± 3.53 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# %timeit ds_default = ds_from_merge_default(d)
## > 864 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# %timeit ds = ds_assigned(d)
## > 1.75 ms ± 42.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

github-merge-queue bot pushed a commit that referenced this issue Jan 30, 2024
Fix #708

The changeset is incomplete, as it is not rolled out for all packages
yet. Therefore tests will fail, hence the draft status. This to make the
review process more focused, reviewers are asked to provide feedback on
the approach.

Changes:
- Variable names are inferred based on what is provided to locals().
This makes it possible to avoid having to manually assign variables in
the __init__ of each specific package.
- I had to add one function to work around this issue:
Deltares/xugrid#179
- Grids are merged with an exact join, this to avoid that xarray joins
variable coordinates in an unwanted way when dataarrays are
inconsistent, which causes issues like
#674
- Refactored the riv unit tests to use pytest cases, so that tests
(without the xr.merge) are run for both unstructured and structured
data.

Update:
- Variable names are not inferred with locals() anymore, but instead
with a ``pkg_init`` decorator.
@github-project-automation github-project-automation bot moved this from 🧐 In Review to ✅ Done in iMOD Suite Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging a pull request may close this issue.

1 participant