feat: datetime selector #1822

base: main

Conversation
""" | ||
return Selector(lambda plx: plx.selectors.all()) | ||
|
||
|
||
def datetime( |
Finally, the datetime selector 😂
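For context, a minimal usage sketch of the selector this PR adds. The public entry point narwhals.selectors.datetime and its keyword arguments are assumptions based on the signature shown further down in this diff (mirroring Polars' cs.datetime):

# Minimal usage sketch; `ncs.datetime` and its keyword arguments are assumed
# from the signature in this diff, mirroring Polars' cs.datetime.
import pandas as pd

import narwhals as nw
import narwhals.selectors as ncs

df_native = pd.DataFrame(
    {
        "ts": pd.to_datetime(["2024-01-01", "2024-01-02"]),  # datetime64[ns]
        "value": [1, 2],
    }
)
df = nw.from_native(df_native)

# Keep only Datetime columns, optionally filtered by time unit and/or time zone.
df.select(ncs.datetime())                         # any Datetime column -> ["ts"]
df.select(ncs.datetime(time_unit="us"))           # none here: pandas defaults to "ns"
df.select(ncs.datetime(time_zone=["UTC", None]))  # UTC-aware or timezone-naive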
narwhals/utils.py (Outdated)
if "*" in time_zones: | ||
import zoneinfo | ||
|
||
time_zones.extend(list(zoneinfo.available_timezones())) | ||
time_zones.remove("*") |
Since we don't allow "*" to be passed to the Datetime constructor as a time zone, I am explicitly expanding it to the full list of available timezones.
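A standalone illustration of that expansion; the initial list contents here are just an example value:

# Standalone illustration of the "*" expansion above.
import zoneinfo

time_zones = ["Europe/Rome", "*"]
if "*" in time_zones:
    time_zones.extend(list(zoneinfo.available_timezones()))
    time_zones.remove("*")

# time_zones now holds "Europe/Rome" plus every IANA timezone known to zoneinfo,
# so a "match any timezone" request never reaches the Datetime constructor as "*".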
    self: Self,
    time_unit: TimeUnit | Collection[TimeUnit] | None,
    time_zone: str | timezone | Collection[str | timezone | None] | None,
) -> DaskSelector:  # pragma: no cover
For dask, the selector works, but the cast fails with:
TypeError: Cannot use .astype to convert from timezone-aware dtype to timezone-naive dtype. Use obj.tz_localize(None) or obj.tz_convert('UTC').tz_localize(None) instead.
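The workaround the error message suggests looks like this in plain pandas (dask.dataframe enforces the same rule for .astype); this is just a sketch of the message's suggestion, not the fix used in the PR:

# Sketch of the workaround suggested by the error message, shown with plain pandas.
import pandas as pd

s = pd.Series(pd.to_datetime(["2024-01-01 00:00:00"]).tz_localize("UTC"))

# s.astype("datetime64[ns]") raises the TypeError quoted above.
naive = s.dt.tz_localize(None)                           # drop the timezone directly
naive_utc = s.dt.tz_convert("UTC").dt.tz_localize(None)  # or convert to UTC first, then drop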
"pyspark" in str(constructor) | ||
or "duckdb" in str(constructor) | ||
or "dask" in str(constructor) | ||
or ("pyarrow_table" in str(constructor) and PYARROW_VERSION < (12,)) | ||
or ("pyarrow" in str(constructor) and is_windows()) | ||
or ("pandas" in str(constructor) and PANDAS_VERSION < (2,)) |
These are a lot of xfails, but let me go through them (a sketch of how this condition feeds the tests follows the list):

- pyspark and duckdb: do not implement selectors
- dask: see the comment above about the timezone cast
- pyarrow < 12: does not implement replace_time_zone
- pyarrow on Windows: fails to find the UTC timezone
- pandas < 2: does not support time_units != "ns"
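A hedged sketch of how such a condition is typically wired into a test; the constructor fixture and the applymarker pattern are assumptions based on common narwhals test conventions:

# Hedged sketch; `constructor` is an assumed fixture from the surrounding suite.
import pytest

def test_datetime_selector(constructor, request):
    if (
        "pyspark" in str(constructor)
        or "duckdb" in str(constructor)
        or "dask" in str(constructor)
    ):
        request.applymarker(pytest.mark.xfail)
    ...  # the actual assertions on the selected columns go here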
pyspark and duckdb: do not implement selectors
I think we do have these now, right?
Yes, but also no!
I can add the selector but we have very minimal support for datetime dtype in spark and duckdb (read as: no support for timezones)
(dtype == dtypes.Datetime)
and (dtype.time_unit in time_units)  # type: ignore[attr-defined]
and (
    dtype.time_zone in time_zones  # type: ignore[attr-defined]
    or ("*" in time_zones and dtype.time_zone is not None)  # type: ignore[attr-defined]
)
@MarcoGorelli this might be the trick you were looking for :)
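A small sketch of how that predicate behaves with public narwhals dtypes; the matches helper is hypothetical, the logic is the one from the diff above:

# Hypothetical `matches` helper re-stating the predicate above, to show how
# the "*" wildcard behaves.
import narwhals as nw

def matches(dtype, time_units, time_zones):
    return (
        dtype == nw.Datetime
        and dtype.time_unit in time_units
        and (
            dtype.time_zone in time_zones
            or ("*" in time_zones and dtype.time_zone is not None)
        )
    )

matches(nw.Datetime("us", "Europe/Rome"), {"us"}, {"*"})  # True: "*" matches any tz-aware dtype
matches(nw.Datetime("us", None), {"us"}, {"*"})           # False: naive datetimes need an explicit None
matches(nw.Datetime("ns"), {"ns"}, {None})                # True: naive, nanosecond precision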
yup, nice!
Nice, thanks @FBruzzesi! Sorry, I didn't quite get round to reviewing everything in time for today; we can always include it in next week's release. (In fact, that's one reason to have regular and short release cycles, so we don't feel the need to rush things in: if something doesn't make it, the next release is just next week.)
"pyspark" in str(constructor) | ||
or "duckdb" in str(constructor) | ||
or "dask" in str(constructor) | ||
or ("pyarrow_table" in str(constructor) and PYARROW_VERSION < (12,)) | ||
or ("pyarrow" in str(constructor) and is_windows()) | ||
or ("pandas" in str(constructor) and PANDAS_VERSION < (2,)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pyspark and duckdb: do not implement selectors
i think we do have these now, right?
(dtype == dtypes.Datetime) | ||
and (dtype.time_unit in time_units) # type: ignore[attr-defined] | ||
and ( | ||
dtype.time_zone in time_zones # type: ignore[attr-defined] | ||
or ("*" in time_zones and dtype.time_zone is not None) # type: ignore[attr-defined] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, nice!
{None}
if time_zone is None
else {str(time_zone)}
if isinstance(time_zone, (str, timezone))
Do we have any test for when it's an instance of timezone? Can we check that timezone.utc and ZoneInfo("UTC") both work?
Well.. ZoneInfo does not work for polars 😱
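For reference, both "UTC" spellings stringify the same, but ZoneInfo is not an instance of datetime.timezone, which is why it takes a different branch in the normalisation above:

# Both "UTC" flavours stringify the same, but only datetime.timezone passes the
# isinstance(time_zone, (str, timezone)) branch: ZoneInfo subclasses tzinfo, not timezone.
from datetime import timezone
from zoneinfo import ZoneInfo

str(timezone.utc)                      # "UTC"
str(ZoneInfo("UTC"))                   # "UTC"
isinstance(ZoneInfo("UTC"), timezone)  # False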
@MarcoGorelli you might have a better idea for how to test this. I checked how polars does it and it is definitely brute forcing 😂
I took the opportunity to:

- add a TimeUnit alias to use around the codebase

The actual changes are not too large.
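For reference, such an alias is typically just a Literal of the supported units; this is a sketch, and the exact definition in narwhals' typing module may differ:

# Sketch of what a TimeUnit alias typically looks like.
from typing import Literal

TimeUnit = Literal["ns", "us", "ms", "s"]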