SNOW-1646704, SNOW-1646706: Add support for Series.dt.tz_convert/tz_localize (#2261)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1646704, SNOW-1646706

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Add support for Series.dt.tz_convert/tz_localize.
sfc-gh-helmeleegy authored Sep 11, 2024
1 parent dce73a5 commit fe51d4d
Showing 8 changed files with 381 additions and 12 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -118,6 +118,7 @@
- Added support for string indexing with `Timedelta` objects.
- Added support for `Series.dt.total_seconds` method.
- Added support for `DataFrame.apply(axis=0)`.
- Added support for `Series.dt.tz_convert` and `Series.dt.tz_localize`.

#### Improvements

2 changes: 2 additions & 0 deletions docs/source/modin/series.rst
@@ -279,6 +279,8 @@ Series
Series.dt.seconds
Series.dt.microseconds
Series.dt.nanoseconds
Series.dt.tz_convert
Series.dt.tz_localize


.. rubric:: String accessor methods
5 changes: 3 additions & 2 deletions docs/source/modin/supported/series_dt_supported.rst
@@ -80,9 +80,10 @@ the method in the left column.
 +-----------------------------+---------------------------------+----------------------------------------------------+
 | ``to_pydatetime``           | N                               |                                                    |
 +-----------------------------+---------------------------------+----------------------------------------------------+
-| ``tz_localize``             | N                               |                                                    |
+| ``tz_localize``             | P                               | ``N`` if `ambiguous` or `nonexistent` are set to a |
+|                             |                                 | non-default value.                                 |
 +-----------------------------+---------------------------------+----------------------------------------------------+
-| ``tz_convert``              | N                               |                                                    |
+| ``tz_convert``              | Y                               |                                                    |
 +-----------------------------+---------------------------------+----------------------------------------------------+
 | ``normalize``               | Y                               |                                                    |
 +-----------------------------+---------------------------------+----------------------------------------------------+
65 changes: 65 additions & 0 deletions src/snowflake/snowpark/modin/plugin/_internal/timestamp_utils.py
@@ -22,9 +22,17 @@
cast,
convert_timezone,
date_part,
dayofmonth,
hour,
iff,
minute,
month,
second,
timestamp_tz_from_parts,
to_decimal,
to_timestamp_ntz,
trunc,
year,
)
from snowflake.snowpark.modin.plugin._internal.utils import pandas_lit
from snowflake.snowpark.modin.plugin.utils.error_message import ErrorMessage
@@ -467,3 +475,60 @@ def convert_dateoffset_to_interval(
)
interval_kwargs[new_param] = offset
return Interval(**interval_kwargs)


def tz_localize_column(column: Column, tz: Union[str, dt.tzinfo]) -> Column:
    """
    Localize a tz-naive datetime column to tz-aware.

    Args:
        column: the Snowpark datetime column
        tz: the time zone to localize to. A tz of None casts the column to
            TIMESTAMP_NTZ, which removes any timezone information.

    Returns:
        The column after tz localization
    """
    if tz is None:
        # If this column is already a TIMESTAMP_NTZ, this cast does nothing.
        # If the column is a TIMESTAMP_TZ, the cast drops the timezone and converts
        # to TIMESTAMP_NTZ.
        return to_timestamp_ntz(column)
    else:
        if isinstance(tz, dt.tzinfo):
            tz_name = tz.tzname(None)
        else:
            tz_name = tz
        return timestamp_tz_from_parts(
            year(column),
            month(column),
            dayofmonth(column),
            hour(column),
            minute(column),
            second(column),
            date_part("nanosecond", column),
            pandas_lit(tz_name),
        )
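
The idea behind `tz_localize_column` can be tried without a Snowflake session. The sketch below is a per-value analogue built on the standard library: it reads the wall-clock fields of a naive timestamp and reassembles them with a zone attached, so the displayed time does not move. The helper name `localize_from_parts` and the fixed-offset zone are assumptions made for illustration and are not part of Snowpark; the `tz=None` branch assumes the wall-clock time is kept, as the pandas docstring describes for `tz_localize(None)`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def localize_from_parts(ts: datetime, tz: Optional[timezone]) -> datetime:
    """Hypothetical stand-in for tz_localize_column, operating on one value."""
    if tz is None:
        # Assumed behavior: drop the zone and keep the wall-clock fields.
        return ts.replace(tzinfo=None)
    # Rebuild the timestamp field by field, then attach the zone.
    return datetime(
        ts.year, ts.month, ts.day,
        ts.hour, ts.minute, ts.second, ts.microsecond,
        tzinfo=tz,
    )


# Fixed -05:00 offset chosen for the example; it carries no DST rules.
eastern_standard = timezone(timedelta(hours=-5))
naive = datetime(2018, 3, 1, 9, 0, 0)
aware = localize_from_parts(naive, eastern_standard)
print(aware.isoformat())  # 2018-03-01T09:00:00-05:00
```

Localizing changes only the attached zone: `aware.hour` is still 9, and `localize_from_parts(aware, None)` returns the original naive value.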


def tz_convert_column(column: Column, tz: Union[str, dt.tzinfo]) -> Column:
    """
    Converts a datetime column to the specified timezone

    Args:
        column: the Snowpark datetime column
        tz: the target timezone. A tz of None converts to UTC.

    Returns:
        The column after conversion to the specified timezone
    """
    if tz is None:
        return convert_timezone(pandas_lit("UTC"), column)
    else:
        if isinstance(tz, dt.tzinfo):
            tz_name = tz.tzname(None)
        else:
            tz_name = tz
        return convert_timezone(pandas_lit(tz_name), column)
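
Conversion differs from localization in that the instant is preserved and only the wall-clock representation changes. The following standard-library sketch shows that contrast on a single value. `convert_instant` and the fixed-offset zones are hypothetical names chosen for the example. The `TypeError` for tz-naive input follows the pandas docstring rather than anything `tz_convert_column` itself checks.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def convert_instant(ts: datetime, tz: Optional[timezone]) -> datetime:
    """Hypothetical per-value analogue of tz_convert_column."""
    if ts.tzinfo is None:
        # pandas documents a TypeError for tz-naive input; the sketch does the same.
        raise TypeError("cannot convert a tz-naive timestamp; localize it first")
    # A tz of None is treated as UTC, like the tz=None branch of tz_convert_column.
    target = timezone.utc if tz is None else tz
    return ts.astimezone(target)


# Fixed offsets chosen for the example (summer offsets, no DST rules attached).
berlin_summer = timezone(timedelta(hours=2))
us_central_summer = timezone(timedelta(hours=-5))

start = datetime(2014, 8, 1, 9, 0, tzinfo=berlin_summer)
converted = convert_instant(start, us_central_summer)
print(converted.isoformat())                      # 2014-08-01T02:00:00-05:00
print(convert_instant(start, None).isoformat())   # 2014-08-01T07:00:00+00:00
```

Because aware datetimes compare by instant, `converted == start` holds even though the printed hours differ.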
@@ -279,6 +279,8 @@
     raise_if_to_datetime_not_supported,
     timedelta_freq_to_nanos,
     to_snowflake_timestamp_format,
+    tz_convert_column,
+    tz_localize_column,
 )
 from snowflake.snowpark.modin.plugin._internal.transpose_utils import (
     clean_up_transpose_result_index_and_labels,
@@ -16666,7 +16668,7 @@ def dt_tz_localize(
         tz: Union[str, tzinfo],
         ambiguous: str = "raise",
         nonexistent: str = "raise",
-    ) -> None:
+    ) -> "SnowflakeQueryCompiler":
         """
         Localize tz-naive to tz-aware.
         Args:
@@ -16678,11 +16680,22 @@ def dt_tz_localize(
             BaseQueryCompiler
                 New QueryCompiler containing values with localized time zone.
         """
-        ErrorMessage.not_implemented(
-            "Snowpark pandas doesn't yet support the method 'Series.dt.tz_localize'"
+        if not isinstance(ambiguous, str) or ambiguous != "raise":
+            ErrorMessage.parameter_not_implemented_error(
+                "ambiguous", "Series.dt.tz_localize"
+            )
+        if not isinstance(nonexistent, str) or nonexistent != "raise":
+            ErrorMessage.parameter_not_implemented_error(
+                "nonexistent", "Series.dt.tz_localize"
+            )
+
+        return SnowflakeQueryCompiler(
+            self._modin_frame.apply_snowpark_function_to_columns(
+                lambda column: tz_localize_column(column, tz)
+            )
         )

-    def dt_tz_convert(self, tz: Union[str, tzinfo]) -> None:
+    def dt_tz_convert(self, tz: Union[str, tzinfo]) -> "SnowflakeQueryCompiler":
         """
         Convert time-series data to the specified time zone.

@@ -16692,8 +16705,10 @@ def dt_tz_convert(self, tz: Union[str, tzinfo]) -> None:
         Returns:
             A new QueryCompiler containing values with converted time zone.
         """
-        ErrorMessage.not_implemented(
-            "Snowpark pandas doesn't yet support the method 'Series.dt.tz_convert'"
+        return SnowflakeQueryCompiler(
+            self._modin_frame.apply_snowpark_function_to_columns(
+                lambda column: tz_convert_column(column, tz)
+            )
         )

     def dt_ceil(
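
The argument check in `dt_tz_localize` is what makes the method only partially supported: any value other than the default `"raise"` for `ambiguous` or `nonexistent` is rejected before a query is built. A minimal standalone sketch of that guard follows. The function name `check_tz_localize_args` is hypothetical, and a plain `NotImplementedError` stands in for `ErrorMessage.parameter_not_implemented_error`, whose exact behavior is not shown in this commit.

```python
from typing import Any


def check_tz_localize_args(ambiguous: Any = "raise", nonexistent: Any = "raise") -> None:
    """Hypothetical guard: reject any non-default value for either parameter."""
    for name, value in (("ambiguous", ambiguous), ("nonexistent", nonexistent)):
        # The isinstance test runs first, so array-like arguments (for example a
        # boolean array passed as `ambiguous`) are rejected without evaluating
        # `!=` on them.
        if not isinstance(value, str) or value != "raise":
            raise NotImplementedError(
                f"parameter {name!r} of Series.dt.tz_localize is not supported"
            )


check_tz_localize_args()  # defaults pass silently

try:
    check_tz_localize_args(ambiguous="infer")
except NotImplementedError as exc:
    print(exc)
```

Testing the type before the value matters because pandas also accepts a boolean array for `ambiguous` and a timedelta for `nonexistent`, and comparing those to a string is not guaranteed to produce a single boolean.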
175 changes: 173 additions & 2 deletions src/snowflake/snowpark/modin/plugin/docstrings/series_utils.py
@@ -1858,10 +1858,181 @@ def to_pydatetime():
pass

def tz_localize():
pass
"""
Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index.
This method takes a time zone (tz) naive Datetime Array/Index object and makes this time zone aware. It does not move the time to another time zone.
This method can also be used to do the inverse – to create a time zone unaware object from an aware object. To that end, pass tz=None.
Parameters
----------
tz : str, pytz.timezone, dateutil.tz.tzfile, datetime.tzinfo or None
Time zone to convert timestamps to. Passing None will remove the time zone information preserving local time.
ambiguous : ‘infer’, ‘NaT’, bool array, default ‘raise’
When clocks moved backward due to DST, ambiguous times may arise. For example in Central European Time (UTC+01), when going from 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the ambiguous parameter dictates how ambiguous times should be handled.
- ‘infer’ will attempt to infer fall dst-transition hours based on order
- bool-ndarray where True signifies a DST time, False signifies a non-DST time (note that this flag is only applicable for ambiguous times)
- ‘NaT’ will return NaT where there are ambiguous times
- ‘raise’ will raise an AmbiguousTimeError if there are ambiguous times.
nonexistent : ‘shift_forward’, ‘shift_backward’, ‘NaT’, timedelta, default ‘raise’
A nonexistent time does not exist in a particular timezone where clocks moved forward due to DST.
- ‘shift_forward’ will shift the nonexistent time forward to the closest existing time
- ‘shift_backward’ will shift the nonexistent time backward to the closest existing time
- ‘NaT’ will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- ‘raise’ will raise a NonExistentTimeError if there are nonexistent times.
Returns
-------
Same type as self
Array/Index converted to the specified time zone.
Raises
------
TypeError
If the Datetime Array/Index is tz-aware and tz is not None.
See also
--------
DatetimeIndex.tz_convert
Convert tz-aware DatetimeIndex from one time zone to another.
Examples
--------
>>> tz_naive = pd.date_range('2018-03-01 09:00', periods=3)
>>> tz_naive
DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',
'2018-03-03 09:00:00'],
dtype='datetime64[ns]', freq=None)
Localize DatetimeIndex in US/Eastern time zone:
>>> tz_aware = tz_naive.tz_localize(tz='US/Eastern') # doctest: +SKIP
>>> tz_aware # doctest: +SKIP
DatetimeIndex(['2018-03-01 09:00:00-05:00',
'2018-03-02 09:00:00-05:00',
'2018-03-03 09:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq=None)
With tz=None, we can remove the time zone information while keeping the local time (not converted to UTC):
>>> tz_aware.tz_localize(None) # doctest: +SKIP
DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',
'2018-03-03 09:00:00'],
dtype='datetime64[ns]', freq=None)
Be careful with DST changes. When there is sequential data, pandas can infer the DST time:
>>> s = pd.to_datetime(pd.Series(['2018-10-28 01:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 03:00:00',
... '2018-10-28 03:30:00']))
>>> s.dt.tz_localize('CET', ambiguous='infer') # doctest: +SKIP
0 2018-10-28 01:30:00+02:00
1 2018-10-28 02:00:00+02:00
2 2018-10-28 02:30:00+02:00
3 2018-10-28 02:00:00+01:00
4 2018-10-28 02:30:00+01:00
5 2018-10-28 03:00:00+01:00
6 2018-10-28 03:30:00+01:00
dtype: datetime64[ns, CET]
In some cases, inferring the DST is impossible. In such cases, you can pass an ndarray to the ambiguous parameter to set the DST explicitly:
>>> s = pd.to_datetime(pd.Series(['2018-10-28 01:20:00',
... '2018-10-28 02:36:00',
... '2018-10-28 03:46:00']))
>>> s.dt.tz_localize('CET', ambiguous=np.array([True, True, False])) # doctest: +SKIP
0 2018-10-28 01:20:00+02:00
1 2018-10-28 02:36:00+02:00
2 2018-10-28 03:46:00+01:00
dtype: datetime64[ns, CET]
If the DST transition causes nonexistent times, you can shift these dates forward or backward with a timedelta object or ‘shift_forward’ or ‘shift_backward’.
>>> s = pd.to_datetime(pd.Series(['2015-03-29 02:30:00',
... '2015-03-29 03:30:00']))
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_forward') # doctest: +SKIP
0 2015-03-29 03:00:00+02:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_backward') # doctest: +SKIP
0 2015-03-29 01:59:59.999999999+01:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1h')) # doctest: +SKIP
0 2015-03-29 03:30:00+02:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
"""

def tz_convert():
pass
"""
Convert tz-aware Datetime Array/Index from one time zone to another.
Parameters
----------
tz : str, pytz.timezone, dateutil.tz.tzfile, datetime.tzinfo or None
Time zone for time. Corresponding timestamps would be converted to this time zone of the Datetime Array/Index. A tz of None will convert to UTC and remove the timezone information.
Returns
-------
Array or Index
Raises
------
TypeError
If Datetime Array/Index is tz-naive.
See also
--------
DatetimeIndex.tz
A timezone that has a variable offset from UTC.
DatetimeIndex.tz_localize
Localize tz-naive DatetimeIndex to a given time zone, or remove timezone from a tz-aware DatetimeIndex.
Examples
--------
With the tz parameter, we can change the DatetimeIndex to other time zones:
>>> dti = pd.date_range(start='2014-08-01 09:00',
... freq='h', periods=3, tz='Europe/Berlin') # doctest: +SKIP
>>> dti # doctest: +SKIP
DatetimeIndex(['2014-08-01 09:00:00+02:00',
'2014-08-01 10:00:00+02:00',
'2014-08-01 11:00:00+02:00'],
dtype='datetime64[ns, Europe/Berlin]', freq='h')
>>> dti.tz_convert('US/Central') # doctest: +SKIP
DatetimeIndex(['2014-08-01 02:00:00-05:00',
'2014-08-01 03:00:00-05:00',
'2014-08-01 04:00:00-05:00'],
dtype='datetime64[ns, US/Central]', freq='h')
With tz=None, we can remove the timezone (after converting to UTC if necessary):
>>> dti = pd.date_range(start='2014-08-01 09:00', freq='h',
... periods=3, tz='Europe/Berlin') # doctest: +SKIP
>>> dti # doctest: +SKIP
DatetimeIndex(['2014-08-01 09:00:00+02:00',
'2014-08-01 10:00:00+02:00',
'2014-08-01 11:00:00+02:00'],
dtype='datetime64[ns, Europe/Berlin]', freq='h')
>>> dti.tz_convert(None) # doctest: +SKIP
DatetimeIndex(['2014-08-01 07:00:00',
'2014-08-01 08:00:00',
'2014-08-01 09:00:00'],
dtype='datetime64[ns]', freq='h')
"""
# TODO (SNOW-1660843): Support tz in pd.date_range and unskip the doctests.

def normalize():
pass