-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New implementation of Knutson scaling incl. fix to negative freqs and double counting #734
Conversation
Thank you for the proposal. Note that we have already had more than one trial to improve on this. So, here are four questions (which you might have solved):
I think overall the question is whether it is useful to do small changes as proposed here, that in appearance fix some issues, but does not clearly improve the model, and even potentially make it worse. |
Thanks for tackling the issues related to future TC projections. I personally think this implementation represents a substantial improvements compared to the current code. It is not perfect, but this is to a wide extent due to the nature of the complex problem and I think the proposed solution is a good pragmatic way to solve the main issues. The implementation as you did here is in my view very much in line with the literature. One concern I would have, however, is related to how Knutson presents changes and then how you implemented them. Knutson gives changes for "intensity above cat X", so we have changes for categories (ordered from highest to lowest category range):
So for example storms of category 3 fall under both criteria (1) and (2) above. Which change should one apply?
That way you would be able to conserve the changes for all category ranges that Knutson is providing. If you can solve that then I think this will bring the implementation to a new level. |
Exactly this method was implemented in the past. But, it did lead to strong inconsistencies with the frequency distributions. A new implementation should be aware of this. This was the proposal of the time. Maybe only changing the frequencies would resolve the problem.
|
I agree with you @chahank that this back-then-proposed implementation you've copy-pasted above would do what I meant for frequencies. Can you please elaborate on what the inconsistencies were in the distribution? Is this discussed somewhere in an issue or an open PR or something?
It might be worth a try. Surely, applying both changing in frequency and intensity must mess up the distribution, but this does not mean that doing only frequency wouldn't. So first I'd like to understand better the inconsistencies you mentioned. Also, applying only statistically significant changes will mess up the distribution - which is why I generally strongly support the proposed addition in this PR to include also non-significant changes. |
Hi both, to me both the proposal by @bguillod and the code posted by @chahank make sense, but happy to hear what inconsistencies you encountered Chahan. I am just not fully sure about this @bguillod :
are you suggesting to calc yet other - cat-specific - frequencies other than those carried by the TropCyclone object which account for all events - regardless their cat - in a basin ? |
No, what I meant is essentially what is implement in the code shared by @chahank, I.e. summing up all frequencies of events by basin and category ranges. On a side-note: find the code extremely hard to understand though. We could definitely improve readability. |
The main inconsistency is that you get a lot of negative frequencies. And just putting them to zero is not satisfying. But, it was quite some time ago. Maybe it is worth trying it out again and maybe it can be made consistent. |
Do you remember for which basin and scenario & year this was the case? |
Unfortunately no, it was quite some time ago. The cleanest would be imho to redo the analysis with the latest version of the random walk algorithm. |
Is it still planed to finish this pull request? |
Yes, but I need to figure out how to address the points you raised above. |
Hi, I updated all tests following the guidelines as well as the advise of testing "meaningfull" data and not just prodiving the hardcoded value. I also used np.testing as much as I could following the advide of @peanutfun. From my side, I think I covered all raised points, please let me know. |
The only instances where I still check for hardcoded values occur when testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvements!
Some small typos were found.
Also, the tutorial file TropCyclone.ipynb
shows 5301 lines of code changed. This is very suboptimal. Please make sure that only those lines that you changed are changed in this file.
climada/hazard/test/test_tc_cc.py
Outdated
ind2 = target_year_ind + smoothing + 1 | ||
|
||
prediction = np.mean(tc_properties[:, ind1:ind2], 1) | ||
predicted_change = ((prediction - baseline) / baseline) * 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While not critical, using a variable name predicted_change
and one predicted_changes
is very confusing and quite impossible to read. For the future, maybe try using clear variable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated with (hopefully) more informative names: target_predicted_changes
and calculated_predicted_change
Note: Only frequency changes are applied as suggested by Jewson 2022 | ||
(https://doi.org/10.1007/s00477-021-02142-6). Applying only frequency anyway | ||
changes mean intensities and most importantly avoids possible inconsistencies | ||
(including possible counting) that may arise from the application of both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean 'double-counting' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - adjusted
(https://doi.org/10.1007/s00477-021-02142-6). Applying only frequency anyway | ||
changes mean intensities and most importantly avoids possible inconsistencies | ||
(including possible counting) that may arise from the application of both | ||
frequency and intensity changes, as the relatioship between these two is non |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relationship
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Oh, and please update the CHANGELOG, don't hide your great work! https://github.com/CLIMADA-project/climada_python/blob/develop/CHANGELOG.md |
sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5]) | ||
sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3]) | ||
sel_cat45 = np.isin(tc_cc.category, [4, 5]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, I started using this improvement/fix for the generation of future TCs and discovered that if TropCyclone.category == []
, this method does not complain at all and just return the exact same object.
This happened to me when I wrote/read hdf5 files using a mix of Hazard.write|from_hdf5
and TropCyclone.write|from_hdf5
instead of only TropCyclone.write|from_hdf5
, but this could probably happen in other case and is easily addressed by raising an Error if categories are not set, or at the very least a warning, to inform the user.
Here a suggestion (which can probably be improved):
sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5]) | |
sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3]) | |
sel_cat45 = np.isin(tc_cc.category, [4, 5]) | |
sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5]) | |
sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3]) | |
sel_cat45 = np.isin(tc_cc.category, [4, 5]) | |
if sel_cat05.size + sel_cat03.size + sel_cat45.size == 0: | |
raise ValueError("Cyclone categories are missing, cannot apply climate change scenario") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for spotting and also for having started using it. I decided to go for the warning.
Thanks a lot @chahank . I fixed the typos and the point raised by @spjuhel . Regarding the notebook, I don't know what's going on. I literally changed only one line. So even though it seems huge, it's literally almost the same as before. I would think we can now merge this one? |
It seems that changes to the changelog file were already pushed to develop. It was probably my fault. I can revert it if needed. |
Done. |
Everything used to work fine but now I suddenly have a failing test. @emanuel-schmid |
👀 Yes, that worries me too. It has nothing to do with the PR but seems to be a glitch of the climada data server, most likely because of high load. We have been spared from such failures at large in the past, but lately they accumulated. |
Thanks for all the updates. As you can see in the files changes, the notebook has 5,056 additions, 245 deletions. Please make sure that only the lines that need to be changed are changed and make a new commit with the correct changes in the notebook. |
A notebook is basically a json file with the content of the cells (Both input and output). They are potentially subject to a LOT of changes when simply opened with jupyter (as solely opening it changes its content already), even more so if they are run. This is a common hassle with notebook files, for which there is very limited practical solution. If you changed only one line, I suggest you go back to the previous version and change the line directly in the file on GitHub for instance. You can also commit only specific parts of a file: https://stackoverflow.com/questions/1085162/commit-only-part-of-a-files-changes-in-git |
I have now updated the tutorial withouth re-running it, which seems to have done the trick. |
I have solved the merge conflicts. For me it's ready to merge, and I think the changes requested by @chahank and @peanutfun have been addressed - can you please confirm by approving and/or by directly merging so we can close this one? |
In this PR I implement new scaling factors to simulate TC under future climate. The new implementation follows the same logic as the previous one but here new scaling factors are used. The new scaling factors are based on Knutson et al.(2020) and Jewson et al. (2021).
Important to note is that:
I created a jupyter notebook where I show how the results of such implementation lead to outputs which are consistent with the estimates from Knutson 2020 and never lead to negative frequencies. That said, since the test is done under a given baseline period, I cannot rule out the possibility that negative frequencies will never occurr, therefore I am anyway throwing an error when this occur.
Two remaining todos:
PR Author Checklist
develop
)PR Reviewer Checklist