Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1620436 Improved to_datetime to handle all local input cases and improved time series notebook tests #2184

Merged
merged 5 commits into from
Aug 29, 2024

Conversation

sfc-gh-azhan
Copy link
Collaborator

@sfc-gh-azhan sfc-gh-azhan commented Aug 28, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1620436

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

This pull request includes several changes to improve the handling of datetime conversions and related tests in the Snowpark Modin integration. The most important changes include enhancements to the to_datetime function, updates to internal utility messages, and adjustments to test cases for better accuracy and performance.

Improvements to to_datetime function:

  • Improved pd.to_datetime to handle all local input cases. (CHANGELOG.md)
  • Updated to_datetime to handle non-Modin objects using pandas.to_datetime and to return appropriate Modin objects. (src/snowflake/snowpark/modin/pandas/general.py) [1] [2]
  • Refined the handling of timezone-naive and timezone-aware inputs in to_datetime. (src/snowflake/snowpark/modin/pandas/general.py) [1] [2]

Updates to internal utilities:

  • Introduced AUTO_FORMAT_WARNING_MSG for consistent warning messages about Snowflake's automatic format detection. (src/snowflake/snowpark/modin/plugin/_internal/timestamp_utils.py)
  • Updated warning messages to reflect potential mismatches with pandas more clearly. (src/snowflake/snowpark/modin/plugin/utils/warning_message.py)

Adjustments to test cases:

  • Modified test cases to use pd.Index for datetime inputs to ensure consistency and accuracy in results. (tests/integ/modin/tools/test_to_datetime.py) [1] [2] [3]
  • Adjusted SQL query count expectations in tests to reflect the improved performance of to_datetime. (tests/integ/modin/tools/test_to_datetime.py) [1] [2] [3]

These changes enhance the robustness and performance of datetime handling in the Snowpark Modin integration, ensuring better alignment with pandas behavior and more accurate test coverage.

@sfc-gh-azhan sfc-gh-azhan added the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Aug 28, 2024
@sfc-gh-azhan sfc-gh-azhan changed the title Zhan ts notebook 1620436 SNOW-1620436 Improved to_datetime to handle all local input cases and improved time series notebook tests Aug 29, 2024
@sfc-gh-azhan sfc-gh-azhan marked this pull request as ready for review August 29, 2024 18:34
@sfc-gh-azhan sfc-gh-azhan requested a review from a team as a code owner August 29, 2024 18:34
@@ -1784,8 +1781,28 @@ def to_datetime(
# TODO: SNOW-1063345: Modin upgrade - modin.pandas functions in general.py
raise_if_native_pandas_objects(arg)

if arg is None:
return None # same as pandas
if not isinstance(arg, (DataFrame, Series, pd.Index)):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match with what we do today for DatetimeIndex and timedelta.

Copy link
Contributor

@sfc-gh-nkrishna sfc-gh-nkrishna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit, but LGTM!

tests/notebooks/modin/TimeSeriesTesting.ipynb Outdated Show resolved Hide resolved
tests/notebooks/modin/TimeSeriesTesting.ipynb Outdated Show resolved Hide resolved
@sfc-gh-helmeleegy
Copy link
Contributor

Are we sure we don't want to break up this PR?

@sfc-gh-azhan
Copy link
Collaborator Author

sfc-gh-azhan commented Aug 29, 2024

Are we sure we don't want to break up this PR?

It's only two changes on src:

  1. use pandas to_datetime for local inputs
  2. warning message change

Others are just test updated based on the above two.

I did include most of tests from the pandas time series doc in our notebook in this PR. But should be fine if something can be improved there, we can use followup PRs

@sfc-gh-azhan sfc-gh-azhan merged commit 1cd0149 into main Aug 29, 2024
35 checks passed
@sfc-gh-azhan sfc-gh-azhan deleted the zhan-ts-notebook-1620436 branch August 29, 2024 23:27
@github-actions github-actions bot locked and limited conversation to collaborators Aug 29, 2024
@sfc-gh-helmeleegy
Copy link
Contributor

Are we sure we don't want to break up this PR?

It's only two changes on src:

  1. use pandas to_datetime for local inputs
  2. warning message change

Others are just test updated based on the above two.

I did include most of tests from the pandas time series doc in our notebook in this PR. But should be fine if something can be improved there, we can use followup PRs

This makes sense.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants