Use the median of the delta instead of min for time freq inference #768
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##              main      #768   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           16        16
  Lines         1705      1706    +1
=========================================
+ Hits          1705      1706    +1
Thank you @chengzhuzhang for getting to this before me! Is this PR ready for review? If so, I will review it and merge it if it is good to go.
Pull Request Overview
This PR refactors the time frequency inference logic by replacing the minimum time delta with the median delta, making it more robust against small timestamp irregularities.
- Replaces min() with median() for calculating the time delta.
- Adds a comment to ensure that the time coordinates are sorted before processing.
- Updates conditional checks for "hour", "day", "month", and "year" frequencies.
Comments suppressed due to low confidence (1)
xcdat/temporal.py:2108
- The comment indicates time_coords should be sorted; to avoid unexpected behavior due to unsorted input, consider enforcing or explicitly sorting time_coords before computing diffs.
time_deltas = np.diff(time_coords.values).astype("timedelta64[ns]")
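The sorting concern raised above can be illustrated with a minimal standard-library sketch. It is not the code in xcdat/temporal.py: the helper name `median_delta` and the sample timestamps are hypothetical, and plain `datetime` objects stand in for the NumPy values used in the PR.

```python
from datetime import datetime, timedelta
from statistics import median


def median_delta(times):
    """Return the median step after sorting (hypothetical helper, not xcdat's API)."""
    ordered = sorted(times)  # enforce ascending order before differencing
    deltas = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(deltas)


# Hypothetical daily timestamps supplied out of order.
unsorted_times = [
    datetime(2000, 1, 3),
    datetime(2000, 1, 1),
    datetime(2000, 1, 2),
    datetime(2000, 1, 4),
]

# Differencing the raw order produces a negative step (Jan 1 minus Jan 3).
raw_deltas = [
    later - earlier for earlier, later in zip(unsorted_times, unsorted_times[1:])
]
smallest_raw_delta = min(raw_deltas)

# Sorting first gives the one-day step that the data actually has.
sorted_step = median_delta(unsorted_times)
```

Without sorting, the deltas mix negative and positive steps, so any statistic computed from them can misrepresent the spacing of the data.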
@@ -2087,7 +2087,7 @@ def _infer_freq(time_coords: xr.DataArray) -> Frequency:
     """Infers the time frequency from the coordinates.

     This method infers the time frequency from the coordinates by
-    calculating the minimum delta and comparing it against a set of
+    calculating the median delta and comparing it against a set of
     conditionals.

     The native ``xr.infer_freq()`` method does not work for all cases
Note, in the docstring I cited the native xarray.infer_freq() method not working for all cases, which is why I implemented _infer_freq():
The native ``xr.infer_freq()`` method does not work for all cases because the frequency can be irregular (e.g., different hour measurements), which ends up returning None.
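To make the point about irregular frequencies concrete, here is a rough standard-library sketch. The function `strict_delta` is a hypothetical stand-in for a strict inference that insists on one uniform step; it is not how xarray implements `infer_freq()`, and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def strict_delta(times):
    """Hypothetical strict inference: return the step only if every step is identical."""
    unique_deltas = {later - earlier for earlier, later in zip(times, times[1:])}
    return unique_deltas.pop() if len(unique_deltas) == 1 else None


# Hypothetical sub-daily measurements taken at irregular hours.
irregular_hours = [
    datetime(2000, 1, 1, 0),
    datetime(2000, 1, 1, 3),
    datetime(2000, 1, 1, 6),
    datetime(2000, 1, 1, 7),
    datetime(2000, 1, 1, 12),
]

# The steps are 3h, 3h, 1h and 5h, so the strict check gives up.
strict_result = strict_delta(irregular_hours)

# A median-based estimate still reports a sub-daily step.
median_step = median(
    later - earlier for earlier, later in zip(irregular_hours, irregular_hours[1:])
)
```

A delta-based approach trades exactness for coverage: it cannot name a precise offset, but it can still place the data in a coarse category such as "hour".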
Hi @chengzhuzhang, this PR looks good to me.
I added direct tests for the _infer_freq() function to ensure it is working as intended.
@tomvothecoder Thanks for the review! I think it is ready to merge.
Adds sparse as a dependency
Fixes formatting
Updates docstrings
[PR]: Enable `skipna` for spatial and temporal mean operations (#655)
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Removes pdb import
Fix incorrect dimension used for temporal weights generation (#749)
Adds mask creation to create_grid and fixes aligning grid dimension for xesmf
Lifting src mask generation
Adds create_nan_mask argument to regrid2
Adds docstring to create_mask
Add weight threshold option for spatial averaging (#672)
- Add parameter `min_weight` to `SpatialAccessor.average()`
Replace support section with endorsements (#757)
Drop Python 3.9 support and add compatibility for Python 3.13 (#721)
Fixes typings
Fixes spelling error
Fixes black formatting issue
Adds scipy dependency
Add `.zenodo.json` and `CITATION.cff` to cite core authors (#759)
Refactors create_nan_mask
Adds tests and fixes mask dimension ordering
Fixes variable name
Adds create_nan_mask support to xesmf
Chunk weights before broadcasting/masking in _group_average (#767)
Use the median of the delta instead of min for time freq inference (#768)
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Adds missing test
Description
Using the median of the deltas is more tolerant of small irregularities (e.g., small drift in timestamps).
min() can be too sensitive and lead to incorrect frequency inference (e.g., #760, which has been tested with this PR).
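The difference between the two statistics can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's implementation: `classify` and its thresholds are hypothetical (the actual conditionals in `_infer_freq()` are not shown here), and the timestamps are made-up monthly data containing one stray near-duplicate.

```python
from datetime import datetime, timedelta
from statistics import median


def classify(delta: timedelta) -> str:
    """Map a step to a coarse label using hypothetical thresholds (not xcdat's)."""
    if delta < timedelta(days=1):
        return "hour"
    if delta < timedelta(days=28):
        return "day"
    if delta < timedelta(days=365):
        return "month"
    return "year"


# Hypothetical monthly timestamps with one stray point an hour after its neighbor.
times = [
    datetime(2000, 1, 16),
    datetime(2000, 2, 15),
    datetime(2000, 2, 15, 1),  # irregular timestamp
    datetime(2000, 3, 16),
    datetime(2000, 4, 16),
    datetime(2000, 5, 16),
]
deltas = [later - earlier for earlier, later in zip(times, times[1:])]

# The single one-hour gap dominates the minimum, but not the median.
freq_from_min = classify(min(deltas))
freq_from_median = classify(median(deltas))
```

The minimum is determined entirely by the single smallest gap, whereas the median reflects the typical spacing, which is why one irregular timestamp changes the first result but not the second.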
Checklist
If applicable: