
chore: rename time zone tests #1830

Merged: 4 commits into main on Jan 20, 2025
Conversation

FBruzzesi (Member):
What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

@FBruzzesi added the labels "enhancement" (New feature or request) and "pyspark" (Issue is related to pyspark backend) on Jan 19, 2025
Comment on lines 54 to 58

if "pyspark" in str(constructor):
    data_ = {
        col_name: [v.replace(tzinfo=timezone.utc) for v in col_values]
        for col_name, col_values in data.items()
    }
FBruzzesi (Member, Author):
This is most likely needed for everyone not based in a UTC time zone: since in pyspark_lazy_constructor we set .config("spark.sql.session.timeZone", "UTC"), the results would otherwise not match.

Context: it took me a bit to realize why I was getting hours 11 and 1 instead of 12 and 2.
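
For context, a minimal sketch of the kind of session setup being referenced (the shape is assumed; this is not the actual pyspark_lazy_constructor from the repo):

from pyspark.sql import SparkSession

# Hypothetical illustration: a session whose SQL time zone is pinned to UTC,
# so naive datetimes created in another local time zone come back shifted.
session = (
    SparkSession.builder.appName("tests")
    .config("spark.sql.session.timeZone", "UTC")
    .getOrCreate()
)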

MarcoGorelli (Member):

shall we just set tzinfo to timezone.utc when we define data=?

FBruzzesi (Member, Author):

I tried, and sadly all pandas-like backends raise warnings/errors.
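
A sketch of what that attempt would look like (reconstructed from the discussion; the values are illustrative, not the repo's test data):

from datetime import datetime, timezone

# Hypothetical: define the test data with tz-aware values up front...
data = {
    "a": [
        datetime(2001, 1, 1, 12, tzinfo=timezone.utc),
        datetime(2001, 1, 2, 2, tzinfo=timezone.utc),
    ]
}
# ...which reportedly triggers warnings/errors when pandas-like backends
# construct frames from tz-aware datetimes, hence the pyspark-only
# replace() branch in the snippet above.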

Comment on lines 127 to 132

- dates = {"a": [datetime(2001, 1, 1), None, datetime(2001, 1, 3)]}
+ dates = {"a": [datetime(2001, 1, 1), datetime(2001, 1, 3)]}
FBruzzesi (Member, Author):

The None is removed because pyspark otherwise ends up creating a column of struct type. We might need to address this in the constructor?

MarcoGorelli (Member):

πŸ€” yeah might be good to see if we can address the constructor so it doesn't go via pandas, that might in general be good going forwards

@MarcoGorelli (Member) left a comment:

awesome thanks!


@MarcoGorelli changed the title from "feat: add pyspark datetime namespace" to "chore: rename time zone tests" on Jan 20, 2025
@MarcoGorelli merged commit 79098f1 into main on Jan 20, 2025 (22 of 25 checks passed)
@MarcoGorelli deleted the feat/pyspark-dt-namespace branch on January 20, 2025 at 07:09