Use the median of the delta instead of min for time freq inference #768

Merged
merged 4 commits into from
May 28, 2025

Conversation

chengzhuzhang
Collaborator

Description

Using the median of the time deltas is more tolerant of small irregularities (e.g., small drift in timestamps).
min() can be too sensitive and lead to incorrect frequency inference (e.g., #760, which has been tested with this PR).
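
To illustrate the difference, here is a minimal sketch using only NumPy. The timestamps, the `classify_delta` helper, and its day thresholds are hypothetical and chosen for illustration; they are not taken from `xcdat/temporal.py`.

```python
import numpy as np

# Hypothetical monthly time axis with one near-duplicate timestamp
# (2000-03-17), standing in for a small irregularity in the data.
time_values = np.array(
    [
        "2000-01-16",
        "2000-02-15",
        "2000-03-16",
        "2000-03-17",
        "2000-04-16",
        "2000-05-16",
    ],
    dtype="datetime64[ns]",
)

# Consecutive differences, converted to floating-point days.
delta_days = np.diff(time_values) / np.timedelta64(1, "D")

min_delta_days = float(np.min(delta_days))
median_delta_days = float(np.median(delta_days))


def classify_delta(days: float) -> str:
    """Map a delta in days to a frequency label.

    The thresholds are assumptions made for this sketch only.
    """
    if days < 1:
        return "hour"
    if days < 21:
        return "day"
    if days < 300:
        return "month"
    return "year"


print(classify_delta(min_delta_days))     # single 1-day gap dominates
print(classify_delta(median_delta_days))  # typical ~30-day spacing wins
```

With this input, one stray 1-day gap is enough to pull the minimum down to a daily spacing, while the median still reflects the typical monthly spacing.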

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

codecov bot commented May 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (05ad2f2) to head (b3c08bb).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #768   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         1705      1706    +1     
=========================================
+ Hits          1705      1706    +1     

@tomvothecoder
Collaborator

Thank you @chengzhuzhang for getting to this before me! Is this PR ready for review? If so, I will review it and merge if it is good to go.

@tomvothecoder tomvothecoder added the type: enhancement New enhancement request label May 28, 2025
@tomvothecoder tomvothecoder moved this from Todo to In Progress in xCDAT Development May 28, 2025
@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors the time frequency inference logic by replacing the minimum time delta with the median delta, making it more robust against small timestamp irregularities.

  • Replaces min() with median() for calculating the time delta.
  • Adds a comment to ensure that the time coordinates are sorted before processing.
  • Updates conditional checks for "hour", "day", "month", and "year" frequencies.
Comments suppressed due to low confidence (1)

xcdat/temporal.py:2108

  • The comment indicates time_coords should be sorted; to avoid unexpected behavior due to unsorted input, consider enforcing or explicitly sorting time_coords before computing diffs.
time_deltas = np.diff(time_coords.values).astype("timedelta64[ns]")
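
As a rough sketch of the sorting concern raised above (the sample timestamps and variable names are hypothetical, and this is not the code in `xcdat/temporal.py`), unsorted input produces negative deltas, whereas sorting first keeps every delta positive:

```python
import numpy as np

# Hypothetical, deliberately unsorted time values.
unsorted_times = np.array(
    ["2000-03-01", "2000-01-01", "2000-02-01", "2000-04-01"],
    dtype="datetime64[ns]",
)

# Differences taken in the given order include a negative step.
raw_delta_days = np.diff(unsorted_times) / np.timedelta64(1, "D")

# Sorting before differencing yields only positive steps.
sorted_delta_days = np.diff(np.sort(unsorted_times)) / np.timedelta64(1, "D")

print(raw_delta_days)
print(sorted_delta_days)
```

Whether to sort inside the function or to require sorted input from callers is a design choice; the sketch only shows why the order matters for the deltas.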

@@ -2087,7 +2087,7 @@ def _infer_freq(time_coords: xr.DataArray) -> Frequency:
"""Infers the time frequency from the coordinates.

This method infers the time frequency from the coordinates by
- calculating the minimum delta and comparing it against a set of
+ calculating the median delta and comparing it against a set of
conditionals.

The native ``xr.infer_freq()`` method does not work for all cases
Collaborator

@tomvothecoder tomvothecoder May 28, 2025

Note, in the docstring I cited the native xarray.infer_freq() method not working for all cases, which is why I implemented _infer_freq():

The native ``xr.infer_freq()`` method does not work for all cases
because the frequency can be irregular (e.g., different hour
measurements), which ends up returning None.

Collaborator

@tomvothecoder tomvothecoder left a comment

Hi @chengzhuzhang, this PR looks good to me.

I added direct tests for the _infer_freq() function to ensure it works as intended.

@chengzhuzhang
Collaborator Author

@tomvothecoder Thanks for the review! I think it is ready to merge.
@pochedls thanks for the suggestion on this change!

@tomvothecoder tomvothecoder merged commit a5e62ac into main May 28, 2025
10 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in xCDAT Development May 28, 2025
@tomvothecoder tomvothecoder deleted the update_infer_time_freq branch May 28, 2025 17:38
tomvothecoder pushed a commit that referenced this pull request Jun 2, 2025
Adds sparse as a dependency

Fixes formatting

Updates docstrings

[PR]: Enable `skipna` for spatial and temporal mean operations (#655)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Removes pdb import

Fix incorrect dimension used for temporal weights generation (#749)

Adds mask creation to create_grid and fixes aligning grid dimension for xesmf

Lifting src mask generation

Adds create_nan_mask argument to regrid2

Adds docstring to create_mask

Add weight threshold option for spatial averaging (#672)

- Add parameter `min_weight` to `SpatialAccessor.average()`

Replace support section with endorsements (#757)

 Drop Python 3.9 support and add compatibility for Python 3.13 (#721)

Fixes typings

Fixes spelling error

Fixes black formatting issue

Adds scipy dependency

Add `.zenodo.json` and `CITATION.cff` to cite core authors (#759)

Refactors create_nan_mask

Adds tests and fixes mask dimension ordering

Fixes variable name

Adds create_nan_mask support to xesmf

Chunk weights before broadcasting/masking in _group_average (#767)

Use the median of the delta instead of min for time freq inference  (#768)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Adds missing test
Labels
type: enhancement New enhancement request
Projects
Status: Done
2 participants