Add expected value CLI and plugin #1719
Codecov Report
@@            Coverage Diff             @@
##           master    #1719      +/-   ##
==========================================
- Coverage   98.13%   98.01%   -0.12%
==========================================
  Files         111      113       +2
  Lines       10190    10332     +142
==========================================
+ Hits        10000    10127     +127
- Misses        190      205      +15
Great work Tom, it was amazing to see how easily we can switch from thresholds to percentiles.
Something for a reviewer to consider: this PR calculates the mean value of the ensemble members, and the metadata is updated to say exactly this. An "Expected Value" is a weighted mean, but this PR provides no mechanism for weighting the members. Should the CLI / plugin be renamed?
This is a good start to adding the capability to evaluate the mean from an ensemble forecast. There are a couple of points below that need some consideration, but hopefully these should be reasonably straightforward to address.
I think renaming the CLI to ensemble_mean or something similar would be warranted, given (as @MoseleyS highlights) it is the mean being calculated here and, strictly speaking, not the expectation value.
One thing worth considering is the way multiple ensemble dimensions are handled. Currently the case of two percentile dimensions is identified through a ValueError, but this case is treated as not being a percentile cube. I appreciate that taking the expected value is ambiguous without knowing which percentile dimension to perform the mean over, but I think this case should raise an exception to highlight this ambiguity. The issue becomes even more complicated when one factors in the possibility of mixed ensemble dimensions (for example, both threshold and realization dims present).
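To illustrate the behaviour being suggested, here is a minimal sketch in plain Python. It is not the code in this PR: the function name `single_percentile_dim` and the rule of matching dimension names on the substring "percentile" are assumptions made purely for illustration.

```python
def single_percentile_dim(dim_names):
    """Return the name of the only percentile dimension, or None if absent.

    Hypothetical helper: dimensions are plain strings here, and any name
    containing "percentile" is treated as a percentile dimension.
    """
    percentile_dims = [name for name in dim_names if "percentile" in name]
    if len(percentile_dims) > 1:
        # Ambiguous: there is no way to know which dimension to average over,
        # so fail loudly instead of treating the data as non-percentile.
        raise ValueError(
            f"Ambiguous percentile dimensions: {percentile_dims}"
        )
    return percentile_dims[0] if percentile_dims else None


one_dim = single_percentile_dim(["percentile", "latitude", "longitude"])
no_dim = single_percentile_dim(["realization", "latitude", "longitude"])
```

With this approach a caller can distinguish "not a percentile cube" (returns None) from "ambiguous percentile cube" (raises), which is the distinction requested above.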
Thanks for the updates, and for your consideration of the other points and raising the associated issues. This all looks good to me now.
Thanks Tom. Could you let me know where I can find your new acceptance test data, so I can take a quick look and have it ready to merge in when this PR gets merged?
* Skeleton for expected value
* Update style and copyright
* Add basic implementation
* Add tests for is_percentile
* Add expected value tests
* Fix imports and tests
* Add handling of threshold data via conversion to percentiles
* Update tests for threshold calculation
* Add acceptance tests
* Fix black making flake8 fail
* Changes from review comments
* Fix unused import
* Docstring fix
Co-authored-by: fionaRust <fiona.rust@metoffice.gov.uk>
Add a plugin and CLI to calculate the expected value from a probability distribution.
The expected value is the mean of random outcomes (e.g. ensemble members) and can be used to produce a deterministic "best guess" forecast from a probabilistic forecast as processed by IMPROVER. The expected value will often be similar to the 50th percentile, but may differ, for example when the distribution is positively or negatively skewed (asymmetrical).
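As a small illustration of how the mean and the 50th percentile can differ, the following standalone Python snippet compares the two for a positively skewed set of values. The numbers are invented for the example (they might represent rainfall amounts from seven ensemble members) and the snippet does not use IMPROVER.

```python
from statistics import mean, median

# Made-up, positively skewed ensemble: most members are near zero,
# one member has a large value.
members = [0.0, 0.0, 0.1, 0.2, 0.4, 0.9, 6.0]

ensemble_mean = mean(members)      # equally weighted mean, about 1.09
ensemble_median = median(members)  # 50th percentile of the members, 0.2

print(f"mean = {ensemble_mean:.2f}, median = {ensemble_median:.2f}")
```

Here the single large member pulls the mean well above the median, which is the kind of difference described above.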
The calculation of expected value for threshold probability data added here is a quick-to-implement method that uses existing IMPROVER functionality for conversion to percentiles. It has the correct input and output interfaces, but it has high memory usage and reduces the accuracy of the output data. I expect to add direct calculation from threshold data (via numerical integration over the threshold values) in a future pull request.
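For readers unfamiliar with the idea, a rough sketch of what numerical integration over threshold values could look like is given below. It is not the planned implementation: the function name `expected_value_from_thresholds` is hypothetical, the inputs are plain Python lists, and it assumes the probabilities are of exceeding each threshold, that all probability mass lies between the first and last threshold (probability 1 at the first, 0 at the last), and that mass is spread uniformly within each threshold interval.

```python
def expected_value_from_thresholds(thresholds, prob_above):
    """Approximate E[X] from P(X > threshold) at increasing thresholds.

    Hypothetical helper. Each interval between adjacent thresholds holds
    probability mass prob_above[i] - prob_above[i + 1]; that mass is placed
    at the interval midpoint (uniform-within-interval assumption).
    Any mass outside the threshold range is ignored.
    """
    if len(thresholds) != len(prob_above) or len(thresholds) < 2:
        raise ValueError("Need matching lists with at least two thresholds")
    total = 0.0
    for i in range(len(thresholds) - 1):
        mass = prob_above[i] - prob_above[i + 1]
        if mass < 0:
            raise ValueError("Exceedance probabilities must not increase")
        midpoint = 0.5 * (thresholds[i] + thresholds[i + 1])
        total += mass * midpoint
    return total


# Invented example values.
thresholds = [0.0, 1.0, 2.0, 4.0]
prob_above = [1.0, 0.5, 0.25, 0.0]
approx_ev = expected_value_from_thresholds(thresholds, prob_above)
```

For these inputs the interval masses are 0.5, 0.25 and 0.25 at midpoints 0.5, 1.5 and 3.0, giving an expected value of 1.375. A direct calculation of this kind avoids the intermediate percentile cube, which is where the memory cost of the current method arises.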
Testing: