Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #708 Refactor dataset initialization #722

Merged
merged 41 commits into from
Jan 30, 2024

Conversation

JoerivanEngelen
Copy link
Contributor

@JoerivanEngelen JoerivanEngelen commented Jan 5, 2024

Fix #708

The changeset is incomplete, as it is not rolled out for all packages yet. Therefore tests will fail, hence the draft status. This to make the review process more focused, reviewers are asked to provide feedback on the approach.

Changes:

Update:

  • Variable names are not inferred with locals() anymore, but instead with a pkg_init decorator.

Copy link
Contributor

@luitjansl luitjansl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting! I see advantages to the approach. As put out in my comments, I would hope that it can be combined with a reduction of metadata in the source files.

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/riv.py Outdated Show resolved Hide resolved
imod/mf6/riv.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@JoerivanEngelen JoerivanEngelen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I for now implemented the wrapper idea, I think it nicely cleans up the code. My only slight doubt is the disadvantage which my initial implemention also had: it appears as arguments / keyword arguments are unused.

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/riv.py Outdated Show resolved Hide resolved
@JoerivanEngelen
Copy link
Contributor Author

JoerivanEngelen commented Jan 10, 2024

Just as an extra check to see if xr.merge causes any loss of performance in this case as reported in: #736 (comment) . Apparently with nicely overlapping coordinates, merging is not such a slow process and the changes made in this PR should only (very slightly) enhance performance.

# %% import
import numpy as np
import xarray as xr


# %% functions
def make_da(nlay, nrow, ncol):
    shape = nlay, nrow, ncol

    dx = 10.0
    dy = -10.0
    xmin = 0.0
    xmax = dx * ncol
    ymin = 0.0
    ymax = abs(dy) * nrow
    dims = ("layer", "y", "x")

    layer = np.arange(1, nlay + 1)
    y = np.arange(ymax, ymin, dy) + 0.5 * dy
    x = np.arange(xmin, xmax, dx) + 0.5 * dx
    coords = {"layer": layer, "y": y, "x": x}

    return xr.DataArray(
        data=np.ones(shape, dtype=float),
        dims=dims,
        coords=coords,
    )


def make_big_data_dict(nvar=4, nlay=3, nrow=1000, ncol=1000):
    varnames = [f"var{i}" for i in np.arange(nvar)]
    return {var: make_da(nlay, nrow, ncol) for var in varnames}


def ds_from_merge_exact(d):
    return xr.merge([d], join="exact")


def ds_from_merge_default(d):
    return xr.merge([d])


def ds_assigned(d):
    ds = xr.Dataset()
    for key, value in d.items():
        ds[key] = value

    return ds


# %% Create data dict

d = make_big_data_dict()

# %% Benchmark
# Benchmark in ipython/Jupyter

# %timeit ds_exact = ds_from_merge_exact(d)
## > 424 µs ± 3.53 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# %timeit ds_default = ds_from_merge_default(d)
## > 864 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# %timeit ds = ds_assigned(d)
## > 1.75 ms ± 42.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

imod/typing/grid.py Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/riv.py Show resolved Hide resolved
Copy link
Contributor

@luitjansl luitjansl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, some questions and comments

imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/pkgbase.py Outdated Show resolved Hide resolved
imod/mf6/boundary_condition.py Outdated Show resolved Hide resolved
imod/mf6/riv.py Outdated Show resolved Hide resolved
@JoerivanEngelen
Copy link
Contributor Author

JoerivanEngelen commented Jan 23, 2024

UPDATE:

Though the pkg_init decorator made everything look nice and clean, I reverted using it as it had three downsides:

  • Upon failure the decorator threw cryptic errors messages
  • It appears as if arguments are unused (also causing linting errors)
  • It complicates our codebase more, potentially scaring off potential new developers

Next to that, I had stackoverflow errors for certain tests, but those probably could be fixed with some work.

For now, I think we are better off forwarding arguments in dictionaries, as I couldn't really find a solution which would fix everything immediately.

Relevant stackoverflow issues:

Based on these stackoverflow answers, I see three potential solutions, all which I'm not fully happy with:

1. Explicitly

class River
    def __init__(
        self,
        stage,
        conductance,
        ...
    ):
        dict_dataset = {
            "stage": stage,
            "conductance": conductance,
            ...
        }
        super(type(self), self).__init__(dict_dataset)

This has the advantage that its explicit and easy to follow, but there is code duplication.

2. Implicitly

class River
    def __init__(
        self,
        stage,
        conductance,
        ...
    ):
        super(type(self), self).__init__(locals())

This can be made fancier and more precise by using the inspect module instead of locals(), similar to the pkg_init decorator. Regardless of inspect or locals(), there is no duplicate code, but it appears as the arguments are unused.

3. With kwargs

class River
    def __init__(
        self,
        **kwargs,
    ):
        super(type(self), self).__init__(kwargs)

This is the more classic pythonic approach, Matplotlib also applies this for some functions. However, there is no input checking anymore and it is hard to tell which arguments are expected in an IDE (e.g. autocomplete), which only can be inferred from the docstring. Also as @Manangka mentioned: We cannot do any type hinting in this case.

@JoerivanEngelen JoerivanEngelen changed the title Draft: Issue #708 Refactor dataset initialization Issue #708 Refactor dataset initialization Jan 29, 2024
Copy link
Contributor

@luitjansl luitjansl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

# Remove vars inplace
del self.dataset["concentration"]
del self.dataset["concentration_boundary_type"]
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if concentration is not present in allargs.keys(), should we do the expand_transient_auxiliary_variables?

Copy link
Contributor Author

@JoerivanEngelen JoerivanEngelen Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this was purely required if concentration was present. (At least that was the old behaviour). Changing that behaviour is out of scope here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but now expand_transient_auxiliary_variables(self) is in the else branch. It will do it if concentration is not present in allargs.keys()

@JoerivanEngelen JoerivanEngelen added this pull request to the merge queue Jan 30, 2024
Merged via the queue into master with commit da5b8f1 Jan 30, 2024
3 checks passed
@JoerivanEngelen JoerivanEngelen deleted the 708_refactor_dataset_init branch January 30, 2024 18:09
@luitjansl luitjansl mentioned this pull request Feb 16, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use an exact join to initiate Package datasets
3 participants