Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix in Hazard.get_mdr and impact calculation in case of a single exposure point and MDR values of 0 #948

Merged
merged 3 commits into from
Oct 3, 2024

Conversation

bguillod
Copy link
Collaborator

@bguillod bguillod commented Sep 17, 2024

Changes proposed in this PR:

  • Fixes a bug in the impact calculation for cases with a single exposure point and a zero MDR value.

This PR fixes #933

PR Author Checklist

PR Reviewer Checklist

@bguillod
Copy link
Collaborator Author

bguillod commented Sep 17, 2024

@emanuel-schmid The Test Pipelines fail due to no internet connection - can I safely assume this is an issues on the github configuration side and unrelated to changes in this PR?

@emanuel-schmid
Copy link
Collaborator

@emanuel-schmid The Test Pipelines fail due to no internet connection - can I safely assume this is an issues on the github configuration side and unrelated to changes in this PR?

@bguillod I do assume it's unrelated to changes in this PR, but I couldn't blame the github configuration either. Atm, I suspect the climada data server had a glitch.

@bguillod
Copy link
Collaborator Author

Ah, it works now and tests pass, so yes it must have been something.
I've assigned @peanutfun to this PR, but actually I guess he's off for a while now, correct? Who should we best assign to this PR @emanuel-schmid (beside @aleeciu who should check it indeed solves all the related issues for his cases)?

@chahank
Copy link
Member

chahank commented Sep 20, 2024

Thanks for the fix!

Can you please make a quick check that your modification does not lead to a significant loss of performance for large datasets?

@bguillod
Copy link
Collaborator Author

bguillod commented Sep 23, 2024

@chahank According to @peanutfun it should rather lead to a performance improvement, however we can check this if you want. Do you have a suggested case which uses large datasets which I could run? It also depends what you mean by "large" as I don't have an Euler cluster at hand...

@chahank
Copy link
Member

chahank commented Sep 23, 2024

Yes, as @peanutfun said, it might lead to an improvement too. I think it is enough if you run it for a case of a few thousand TC events and make a small uncertainty analysis (you can just execute the tutorial with one of your large datasets). There is no need to make extended tests, I would just like to have a quick check to be sure that we are not missing something obvious. Thanks!

@bguillod
Copy link
Collaborator Author

Alright, so based on the uncertainty analysis tutorial I created the following test:

#Define the input variable functions
import numpy as np

from climada.entity import ImpactFunc, ImpactFuncSet, Exposures
from climada.util.constants import EXP_DEMO_H5, HAZ_DEMO_H5
from climada.hazard import Hazard
from functools import partial
import scipy as sp
from climada.engine.unsequa import InputVar
from climada.engine.unsequa import CalcImpact
import time

def impf_func(G=1, v_half=84.7, vmin=25.7, k=3, _id=1):

    def xhi(v, v_half, vmin):
        return max([(v - vmin), 0]) / (v_half - vmin)

    def sigmoid_func(v, G, v_half, vmin, k):
        return G * xhi(v, v_half, vmin)**k / (1 + xhi(v, v_half, vmin)**k)

    #In-function imports needed only for parallel computing on Windows
    import numpy as np
    from climada.entity import ImpactFunc, ImpactFuncSet
    intensity_unit = 'm/s'
    intensity = np.linspace(0, 150, num=100)
    mdd = np.repeat(1, len(intensity))
    paa = np.array([sigmoid_func(v, G, v_half, vmin, k) for v in intensity])
    imp_fun = ImpactFunc("TC", _id, intensity, mdd, paa, intensity_unit)
    imp_fun.check()
    impf_set = ImpactFuncSet([imp_fun])
    return impf_set

haz = Hazard.from_hdf5(HAZ_DEMO_H5)
exp_base = Exposures.from_hdf5(EXP_DEMO_H5)

def prep_full_calc(haz, exp_base, impf_func):
    #It is a good idea to assign the centroids to the base exposures in order to avoid repeating this
    # potentially costly operation for each sample.
    exp_base.assign_centroids(haz)
    def exp_base_func(x_exp, exp_base):
        exp = exp_base.copy()
        exp.gdf['value'] *= x_exp
        return exp
    exp_func = partial(exp_base_func, exp_base=exp_base)
    #Define the InputVars


    exp_distr = {"x_exp": sp.stats.beta(10, 1.1)} #This is not really a reasonable distribution but is used
                                                #here to show that you can use any scipy distribution.

    exp_iv = InputVar(exp_func, exp_distr)

    impf_distr = {
        "G": sp.stats.truncnorm(0.5, 1.5),
        "v_half": sp.stats.uniform(35, 65),
        "vmin": sp.stats.uniform(0, 15),
        "k": sp.stats.uniform(1, 4)
        }
    impf_iv = InputVar(impf_func, impf_distr)
    calc_imp = CalcImpact(exp_iv, impf_iv, haz)
    output_imp = calc_imp.make_sample(N=2**7, sampling_kwargs={'skip_values': 2**8})
    return calc_imp, output_imp

def run_unc_calc(calc_imp, output_imp):
    calc_imp.uncertainty(output_imp, rp = [50, 100, 250])

I then did a few tests on the develop branch and on the branch used in this PR by first running:

calc_imp, output_imp = prep_full_calc(haz, exp_base, impf_func)

and then by profiling using this (everything here was run in a jupyter notebook):

%timeit -r 20 run_unc_calc(calc_imp, output_imp)

The results are not very stable (i.e. they change a little bit each time you run it), so sometimes there is a small difference and overall it tends to be slightly longer in this branch compared to develop. For instance, in the final run I did the outcome was:

  1. In the develop branch: 3.19 s ± 13.7 ms per loop (mean ± std. dev. of 20 runs, 1 loop each).
  2. In this branch: 3.2 s ± 21.6 ms per loop (mean ± std. dev. of 20 runs, 1 loop each).

Since it already takes several seconds to run this calculation I did not go on with a larger event set. This seems quite acceptable to me as it fixes a bug. Let me know what you think @chahank.

@chahank
Copy link
Member

chahank commented Sep 25, 2024

Ok, thanks for checking, this looks good then.

@chahank
Copy link
Member

chahank commented Sep 27, 2024

@bguillod did you want to do anything more? Otherwise I would merge. Thanks!

@aleeciu
Copy link
Collaborator

aleeciu commented Sep 27, 2024

I checked the cases that allowed me to find the bug and this fix indeed solves the issue.. thanks @bguillod and @peanutfun

@bguillod
Copy link
Collaborator Author

Great, thanks @aleeciu for checking it resolves your case and @chahank for reviewing! All good from my perspective, so feel free to merge @chahank :-)

@bguillod bguillod merged commit dc7d424 into develop Oct 3, 2024
19 checks passed
@emanuel-schmid emanuel-schmid deleted the bugfix/imp-calc-mdr-zeros branch October 3, 2024 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants