Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-582728: Invalid Pyarrow Timestamps Returned #1108

Closed
sfc-gh-wfateem opened this issue Apr 27, 2022 · 1 comment · Fixed by #1204
Closed

SNOW-582728: Invalid Pyarrow Timestamps Returned #1108

sfc-gh-wfateem opened this issue Apr 27, 2022 · 1 comment · Fixed by #1204

Comments

@sfc-gh-wfateem
Copy link
Collaborator

This was tested using Snowflake Python Connector version 2.7.6 on a Python 3.8 environment:

connection_params={
"user":"***",
"password": "***",
"account": "account",
"database": "***",
}
with snowflake.connector.connect(**connection_params) as connection, connection.cursor() as cursor:
r = cursor.execute(
"select to_timestamp('1400-01-01 01:02:03') as low_ts, to_timestamp('9999-01-01 01:02:03') as high_ts"
)
print(r.fetch_arrow_all())
print(r.fetchall())

Output:

pyarrow.Table
LOW_TS: timestamp[ns] not null
HIGH_TS: timestamp[ns] not null
----
LOW_TS: [[1984-07-22 00:36:36.709551616]]
HIGH_TS: [[1815-03-31 06:58:11.066277376]]
[(datetime.datetime(1400, 1, 1, 1, 2, 3), datetime.datetime(9999, 1, 1, 1, 2, 3))]

This is possible due to nano precision limits in Arrow data and is addressed by PR 1104

@github-actions github-actions bot changed the title Invalid Pyarrow Timestamps Returned SNOW-582728: Invalid Pyarrow Timestamps Returned Apr 27, 2022
@brandon-leapyear
Copy link

brandon-leapyear commented Jul 14, 2022

This is an old work account. Please reference @brandonchinn178 for all future communication


Are there any updates to this? I also see other related open issues:

Minimal repro:

python3 -m venv snowflake-test-venv
snowflake-test-venv/bin/pip install "snowflake-connector-python[pandas]"
snowflake-test-venv/bin/python
from snowflake.connector import connect
from snowflake.connector.pandas_tools import write_pandas
import pandas
from datetime import datetime, timedelta
conn = connect(...)
conn.cursor().execute('CREATE OR REPLACE TABLE "foo" ("a" TIMESTAMP_NTZ)')

test_dates = [
    datetime(1970, 1, 1),
    datetime(2000, 1, 1),
    datetime(2000, 1, 1) + timedelta(microseconds=1),
    datetime(2000, 1, 1) + timedelta(microseconds=10),
    datetime(2000, 1, 1) + timedelta(microseconds=61),
    datetime(2000, 1, 1) + timedelta(seconds=1),
    datetime(2000, 1, 1) + timedelta(days=1),
]
df = pandas.DataFrame({"a": test_dates})
write_pandas(conn, df, "foo")
results = conn.cursor().execute('SELECT "a"::string FROM "foo"').fetchall()
for date, (result,) in zip(test_dates, results):
    print(f"{date:%Y-%m-%dT%H:%M:%S.%f} => {result:>27}")

This outputs:

1970-01-01T00:00:00.000000 =>     1969-12-31 16:00:00.000
2000-01-01T00:00:00.000000 => 30001217-01-27 16:00:00.000
2000-01-01T00:00:00.000001 => 30001217-01-27 16:00:01.000
2000-01-01T00:00:00.000010 => 30001217-01-27 16:00:10.000
2000-01-01T00:00:00.000061 => 30001217-01-27 16:01:01.000
2000-01-01T00:00:01.000000 => 30001217-02-08 05:46:40.000
2000-01-02T00:00:00.000000 => 30003954-12-25 16:00:00.000

which shows that just plain datetime() objects get interpreted as seconds from epoch instead of microseconds from epoch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants