Issue #708 Refactor dataset initialization #722

JoerivanEngelen · 2024-01-05T13:30:47Z

The changeset is incomplete, as it is not rolled out for all packages yet. Therefore tests will fail, hence the draft status. This to make the review process more focused, reviewers are asked to provide feedback on the approach.

Changes:

Variable names are inferred based on what is provided to locals(). This makes it possible to avoid having to manually assign variables in the init of each specific package.
I had to add one function to work around this issue: xugrid.merge does not support merging list of dicts wheares xarray.merge does xugrid#179
Grids are merged with an exact join, this to avoid that xarray joins variable coordinates in an unwanted way when dataarrays are inconsistent, which causes issues like The SpecificStorage class returns the wrong error message for a mismatch in dims #674
Refactored the riv unit tests to use pytest cases, so that tests (without the xr.merge) are run for both unstructured and structured data.

Update:

Variable names are not inferred with locals() anymore, but instead with a pkg_init decorator.

luitjansl

interesting! I see advantages to the approach. As put out in my comments, I would hope that it can be combined with a reduction of metadata in the source files.

imod/mf6/pkgbase.py

imod/mf6/riv.py

imod/mf6/pkgbase.py

JoerivanEngelen

Thanks! I for now implemented the wrapper idea, I think it nicely cleans up the code. My only slight doubt is the disadvantage which my initial implemention also had: it appears as arguments / keyword arguments are unused.

imod/mf6/pkgbase.py

imod/mf6/riv.py

JoerivanEngelen · 2024-01-10T14:02:40Z

Just as an extra check to see if xr.merge causes any loss of performance in this case as reported in: #736 (comment) . Apparently with nicely overlapping coordinates, merging is not such a slow process and the changes made in this PR should only (very slightly) enhance performance.

# %% import
import numpy as np
import xarray as xr


# %% functions
def make_da(nlay, nrow, ncol):
    shape = nlay, nrow, ncol

    dx = 10.0
    dy = -10.0
    xmin = 0.0
    xmax = dx * ncol
    ymin = 0.0
    ymax = abs(dy) * nrow
    dims = ("layer", "y", "x")

    layer = np.arange(1, nlay + 1)
    y = np.arange(ymax, ymin, dy) + 0.5 * dy
    x = np.arange(xmin, xmax, dx) + 0.5 * dx
    coords = {"layer": layer, "y": y, "x": x}

    return xr.DataArray(
        data=np.ones(shape, dtype=float),
        dims=dims,
        coords=coords,
    )


def make_big_data_dict(nvar=4, nlay=3, nrow=1000, ncol=1000):
    varnames = [f"var{i}" for i in np.arange(nvar)]
    return {var: make_da(nlay, nrow, ncol) for var in varnames}


def ds_from_merge_exact(d):
    return xr.merge([d], join="exact")


def ds_from_merge_default(d):
    return xr.merge([d])


def ds_assigned(d):
    ds = xr.Dataset()
    for key, value in d.items():
        ds[key] = value

    return ds


# %% Create data dict

d = make_big_data_dict()

# %% Benchmark
# Benchmark in ipython/Jupyter

# %timeit ds_exact = ds_from_merge_exact(d)
## > 424 µs ± 3.53 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# %timeit ds_default = ds_from_merge_default(d)
## > 864 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# %timeit ds = ds_assigned(d)
## > 1.75 ms ± 42.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

imod/typing/grid.py

imod/mf6/pkgbase.py

imod/mf6/riv.py

luitjansl

looks good, some questions and comments

imod/mf6/pkgbase.py

imod/mf6/boundary_condition.py

imod/mf6/riv.py

This reverts commit 29e4362.

This reverts commit 10c6991.

JoerivanEngelen · 2024-01-23T16:01:42Z

UPDATE:

Though the pkg_init decorator made everything look nice and clean, I reverted using it as it had three downsides:

Upon failure the decorator threw cryptic errors messages
It appears as if arguments are unused (also causing linting errors)
It complicates our codebase more, potentially scaring off potential new developers

Next to that, I had stackoverflow errors for certain tests, but those probably could be fixed with some work.

For now, I think we are better off forwarding arguments in dictionaries, as I couldn't really find a solution which would fix everything immediately.

Relevant stackoverflow issues:

Based on these stackoverflow answers, I see three potential solutions, all which I'm not fully happy with:

1. Explicitly

class River
    def __init__(
        self,
        stage,
        conductance,
        ...
    ):
        dict_dataset = {
            "stage": stage,
            "conductance": conductance,
            ...
        }
        super(type(self), self).__init__(dict_dataset)

This has the advantage that its explicit and easy to follow, but there is code duplication.

2. Implicitly

class River
    def __init__(
        self,
        stage,
        conductance,
        ...
    ):
        super(type(self), self).__init__(locals())

This can be made fancier and more precise by using the inspect module instead of locals(), similar to the pkg_init decorator. Regardless of inspect or locals(), there is no duplicate code, but it appears as the arguments are unused.

3. With kwargs

class River
    def __init__(
        self,
        **kwargs,
    ):
        super(type(self), self).__init__(kwargs)

This is the more classic pythonic approach, Matplotlib also applies this for some functions. However, there is no input checking anymore and it is hard to tell which arguments are expected in an IDE (e.g. autocomplete), which only can be inferred from the docstring. Also as @Manangka mentioned: We cannot do any type hinting in this case.

luitjansl

looks good

luitjansl · 2024-01-30T14:26:19Z

imod/mf6/boundary_condition.py

+            # Remove vars inplace
+            del self.dataset["concentration"]
+            del self.dataset["concentration_boundary_type"]
+        else:


if concentration is not present in allargs.keys(), should we do the expand_transient_auxiliary_variables?

No this was purely required if concentration was present. (At least that was the old behaviour). Changing that behaviour is out of scope here.

but now expand_transient_auxiliary_variables(self) is in the else branch. It will do it if concentration is not present in allargs.keys()

JoerivanEngelen added 5 commits January 4, 2024 16:03

start trying out some first ways to fetch variables from arguments

18ea908

Move logic to get variable names to separate method

0cdaf2b

Fix typing

55c4ddf

Refactor cases to add unstructured tests

55a55b7

Move merge at init to pkgbase

3263023

JoerivanEngelen requested review from luitjansl and Manangka January 5, 2024 13:30

Merge branch 'master' into 708_refactor_dataset_init

afae3a5

luitjansl reviewed Jan 5, 2024

View reviewed changes

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/riv.py Outdated Show resolved Hide resolved

Manangka requested changes Jan 8, 2024

View reviewed changes

imod/mf6/riv.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

JoerivanEngelen added 6 commits January 9, 2024 15:24

Stricter typechecking and improve docstring

1f0e11a

Move merging of dicts to typing.grid module

acb0937

Add first version with wrapped pkg_init

962b5d4

Clarify decorator and remove unused method

9d8e3c5

Move concentration checking to BoundaryCondition __init__

d3d52b8

explicitly name argument

c71081d

JoerivanEngelen commented Jan 10, 2024

View reviewed changes

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/riv.py Outdated Show resolved Hide resolved

Update changelog

bb68a38

Manangka requested changes Jan 11, 2024

View reviewed changes

imod/typing/grid.py Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/riv.py Show resolved Hide resolved

Manangka approved these changes Jan 11, 2024

View reviewed changes

luitjansl approved these changes Jan 16, 2024

View reviewed changes

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved

imod/mf6/boundary_condition.py Outdated Show resolved Hide resolved

imod/mf6/riv.py Outdated Show resolved Hide resolved

JoerivanEngelen added 6 commits January 17, 2024 13:26

Remove unused constant

5aa03e9

Start applying new decorator to packages

10c6991

Revert change

29e4362

Revert "Revert change"

3aefd98

This reverts commit 29e4362.

Revert "Start applying new decorator to packages"

b3b8772

This reverts commit 10c6991.

Change package init to just forwarding arguments as dictionary

6e9d78e

remove wrapper

6c5d89a

JoerivanEngelen added 12 commits January 25, 2024 16:54

Merge master into 708_refactor_dataset_init

e750f70

Properly align coords in fixture

4563b14

Fix erronous reference to self.dataset before its instantiation.

602480d

Move _check_options after self.dataset has been instantiatiated.

4465de5

Properly copy dx and dy coord before initiating dataset

a48b265

Add TODO

deb913a

Check if has structured grid instead of xr.DataArray

c73d5df

Work around xugrid issue 206

c7e671f

Ensure idomain and concentration are aligned

be7d308

Fix typo

d499ccf

Align timeseries with outer join

3c78419

align timeseries before initializing OutletWeir

b6e0112

JoerivanEngelen changed the title ~~Draft: Issue #708 Refactor dataset initialization~~ Issue #708 Refactor dataset initialization Jan 29, 2024

JoerivanEngelen added 6 commits January 30, 2024 14:22

Align times before inserting initializing class

178a142

merge master into 708_refactor_dataset_init

21fd00f

Update call to proper function

dae088a

Initialize gwfgwt following new method

5380e1f

Remove unused nr_indices variable

62333c9

run black

dabc7a5

Manangka approved these changes Jan 30, 2024

View reviewed changes

luitjansl approved these changes Jan 30, 2024

View reviewed changes

JoerivanEngelen added 2 commits January 30, 2024 16:06

Add type annotation to __init__ and make allargs required arg

c713a12

Update initialization of dummy object

ee7b515

JoerivanEngelen enabled auto-merge January 30, 2024 15:13

JoerivanEngelen added this pull request to the merge queue Jan 30, 2024

Merged via the queue into master with commit da5b8f1 Jan 30, 2024
3 checks passed

JoerivanEngelen deleted the 708_refactor_dataset_init branch January 30, 2024 18:09

Huite mentioned this pull request Feb 14, 2024

simulation.regrid_like seems excessively slow + a hypothesis #844

Open

luitjansl mentioned this pull request Feb 16, 2024

Issue #772 set repeat stress #850

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue #708 Refactor dataset initialization #722

Issue #708 Refactor dataset initialization #722

JoerivanEngelen commented Jan 5, 2024 •

edited

Loading

luitjansl left a comment

JoerivanEngelen left a comment •

edited

Loading

JoerivanEngelen commented Jan 10, 2024 •

edited

Loading

luitjansl left a comment

JoerivanEngelen commented Jan 23, 2024 •

edited

Loading

luitjansl left a comment

luitjansl Jan 30, 2024

JoerivanEngelen Jan 30, 2024 •

edited

Loading

luitjansl Jan 30, 2024

Issue #708 Refactor dataset initialization #722

Issue #708 Refactor dataset initialization #722

Conversation

JoerivanEngelen commented Jan 5, 2024 • edited Loading

luitjansl left a comment

Choose a reason for hiding this comment

JoerivanEngelen left a comment • edited Loading

Choose a reason for hiding this comment

JoerivanEngelen commented Jan 10, 2024 • edited Loading

luitjansl left a comment

Choose a reason for hiding this comment

JoerivanEngelen commented Jan 23, 2024 • edited Loading

luitjansl left a comment

Choose a reason for hiding this comment

luitjansl Jan 30, 2024

Choose a reason for hiding this comment

JoerivanEngelen Jan 30, 2024 • edited Loading

Choose a reason for hiding this comment

luitjansl Jan 30, 2024

Choose a reason for hiding this comment

JoerivanEngelen commented Jan 5, 2024 •

edited

Loading

JoerivanEngelen left a comment •

edited

Loading

JoerivanEngelen commented Jan 10, 2024 •

edited

Loading

JoerivanEngelen commented Jan 23, 2024 •

edited

Loading

JoerivanEngelen Jan 30, 2024 •

edited

Loading