
SNOW-739316: Snowflake Batch Results (get_result_batches()) #1425

Closed
ashishgupta2014 opened this issue Feb 6, 2023 · 10 comments · Fixed by #1702

@ashishgupta2014

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    python-3.8.6

  2. What operating system and processor architecture are you using?

    Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.17

  3. What are the component versions in the environment (pip freeze)?
    aiofiles==0.8.0
    aiohttp==3.8.3
    aiosignal==1.3.1
    anyio==3.6.2
    APScheduler==3.8.1
    asgiref==3.6.0
    asn1crypto==1.5.1
    async-timeout==4.0.2
    asynctest==0.13.0
    attrs==22.2.0
    azure-core==1.26.2
    azure-storage-blob==12.9.0
    backports.zoneinfo==0.2.1
    certifi==2022.12.7
    cffi==1.15.1
    charset-normalizer==2.1.1
    click==8.1.3
    confluent-kafka==1.8.2
    coverage==7.0.5
    cryptography==38.0.4
    defusedxml==0.7.1
    ecs-logging==1.1.0
    elastic-apm==6.7.2
    elementpath==2.5.3
    et-xmlfile==1.1.0
    exceptiongroup==1.1.0
    fastapi==0.75.2
    fastapi-utils==0.2.1
    filelock==3.9.0
    frozenlist==1.3.3
    greenlet==2.0.1
    gunicorn==20.1.0
    h11==0.14.0
    idna==3.4
    iniconfig==2.0.0
    isodate==0.6.1
    Jinja2==3.1.2
    lxml==4.7.1
    MarkupSafe==2.1.1
    msrest==0.7.1
    multidict==6.0.4
    numpy==1.24.1
    oauthlib==3.2.2
    openpyxl==3.0.10
    oscrypto==1.3.0
    packaging==23.0
    pandas==1.5.2
    pluggy==1.0.0
    prometheus-client==0.13.1
    prometheus-fastapi-instrumentator==5.7.1
    psutil==5.9.4
    pyarrow==8.0.0
    pycparser==2.21
    pycryptodomex==3.16.0
    pydantic==1.8.2
    PyJWT==2.6.0
    pyOpenSSL==22.1.0
    pytest==7.2.1
    pytest-asyncio==0.20.3
    pytest-cov==4.0.0
    pytest-env==0.8.1
    pytest-mock==3.10.0
    python-dateutil==2.8.2
    pytz==2021.3
    pytz-deprecation-shim==0.1.0.post0
    PyYAML==5.4.1
    requests==2.28.2
    requests-oauthlib==1.3.1
    schemachange==3.4.2
    six==1.16.0
    sly==0.5
    sniffio==1.3.0
    snowflake-connector-python==2.9.0
    snowflake-sqlalchemy==1.4.4
    SQLAlchemy==1.4.46
    starlette==0.17.1
    testfixtures==6.18.5
    tomli==2.0.1
    treebuilder==0.13
    typing_extensions==4.4.0
    tzdata==2022.7
    tzlocal==4.2
    urllib3==1.26.14
    uvicorn==0.16.0
    yarl==1.8.2

  4. What did you do?

    It's very random in nature and difficult to reproduce. The general observation is that a single chunk can sometimes cause the issue.
    255005: 255005: Failed to read next arrow batch: b'Expected to be able to read 149072 bytes for message body, got 81028'

  5. What did you expect to see?

    It should fetch the batch and start processing the batch data; instead, it is not able to fetch the batch using the Snowflake connector library.

  6. Can you set logging to DEBUG and collect the logs?

255005: 255005: Failed to read next arrow batch: b'Expected to be able to read 149072 bytes for message body, got 81028'
File "/app/source/sql_to_xml_convertor.py", line 114, in sql_result_iterator
df = self.get_concatenated_dataframe(sub_array)
File "/app/source/sql_to_xml_convertor.py", line 63, in get_concatenated_dataframe
pyarrow_list = [arr.to_arrow() for arr in sub_array if arr.rowcount > 0]
File "/app/source/sql_to_xml_convertor.py", line 63, in
pyarrow_list = [arr.to_arrow() for arr in sub_array if arr.rowcount > 0]
File "/opt/venv/lib/python3.8/site-packages/snowflake/connector/result_batch.py", line 660, in to_arrow
val = next(self._get_arrow_iter(connection=connection), None)
File "/opt/venv/lib/python3.8/site-packages/snowflake/connector/result_batch.py", line 646, in _get_arrow_iter
return self._create_iter(iter_unit=IterUnit.TABLE_UNIT, connection=connection)
File "/opt/venv/lib/python3.8/site-packages/snowflake/connector/result_batch.py", line 637, in _create_iter
loaded_data = self._load(response, iter_unit)
File "/opt/venv/lib/python3.8/site-packages/snowflake/connector/result_batch.py", line 562, in _load
iter = PyArrowIterator(
File "src/snowflake/connector/arrow_iterator.pyx", line 138, in snowflake.connector.arrow_iterator.PyArrowIterator.cinit
File "/opt/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 282, in errorhandler_wrapper
raise Error.errorhandler_make_exception(

@github-actions github-actions bot changed the title Snowflake Batch Results (get_result_batches()) SNOW-739316: Snowflake Batch Results (get_result_batches()) Feb 6, 2023
@sfc-gh-mkeller
Collaborator

Hi @ashishgupta2014, would you be able to provide a minimal script showing how you use ResultBatches? From the stack trace you posted, it looks like you are using to_arrow.
We'd like to know where you got the ResultBatch from. Is it a ResultBatch from a previously executed query in the current process, or was it pickled and unpickled from another process?

The general observation is that a single chunk can sometimes cause the issue.

Did you mean that there was only 1 ResultBatch in the ResultSet, or that there's only one out of many batches that causes your issue?

It'd be really helpful in reproducing this if you could post the SQL that led to this (through Snowflake support, or here without sensitive information).
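
For reference, a minimal sketch of the two usage patterns being asked about (the connection parameters and the query are placeholders, not from the original report):

import pickle
import snowflake.connector

con = snowflake.connector.connect(account="...", user="...", password="...")
cur = con.cursor()
cur.execute("select * from some_table")  # hypothetical query
batches = cur.get_result_batches()

# Pattern 1: consume a batch in the same process.
table = batches[0].to_arrow()

# Pattern 2: pickle a batch, ship it to another process, and consume it there.
payload = pickle.dumps(batches[0])
restored = pickle.loads(payload)
table = restored.to_arrow()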

@sfc-gh-mkeller
Collaborator

The only workaround that I can think of with this much information is to execute a RESULT_SCAN on the previous query, like select * from table(result_scan('<query_id>')); right after you run into this Exception. Could you try giving that a go, please?
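
For example, a minimal sketch of that workaround in Python (cur is the cursor that ran the failing query; sfqid is the connector attribute holding its query ID):

# Capture the query ID right after execute(), before any batch download fails.
failed_query_id = cur.sfqid

# Re-read the cached result of that query via RESULT_SCAN.
cur.execute(f"select * from table(result_scan('{failed_query_id}'))")
rows = cur.fetchall()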

@ashishgupta2014
Author

ashishgupta2014 commented Feb 11, 2023

Hi @sfc-gh-mkeller

The observation is that the issue arises with both single and multiple batches.

Batch size varies from dataset to dataset. At a time we combine 15 batches into one and send it for processing in the next step.

I will share the result_scan output in my next comment. I hope this gives you a bit of a sense of what we are doing on our side.

Code snippet

import pandas
import pyarrow as pa

def raw_query_with_batch(self, query):
    """
    Run a query on a raw connection from the connection pool and
    return its result batches without fetching any rows.
    :param query: SQL text to execute
    :return: list of ResultBatch objects
    """
    connection = self._engine.raw_connection()
    cursor = connection.cursor()
    cursor.execute(query)
    batches = cursor.get_result_batches()
    return batches

def merge_batches(self, sub_array):
    """Download a group of batches and concatenate them into one DataFrame."""
    df = pandas.DataFrame()
    pyarrow_list = [arr.to_arrow() for arr in sub_array if arr.rowcount > 0]
    if len(pyarrow_list) > 0:
        df = pa.concat_tables(pyarrow_list).to_pandas()
    return df

# Driver code (runs inside the same class as the methods above).
batches = self.raw_query_with_batch(query)
batch_count = len(batches)
# At least one group, even if there are fewer than 15 batches in total.
number_of_iterations = max(batch_count // 15, 1)

start_position = 0
for batch in range(number_of_iterations):
    end_position = start_position + 15
    if batch == number_of_iterations - 1:
        # The last group also absorbs any remaining batches.
        sub_array = batches[start_position:]
    else:
        sub_array = batches[start_position:end_position]
    df = self.merge_batches(sub_array)
    start_position = end_position
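
As a side note on the design: a simpler way to slice the batches into groups of 15, avoiding the manual start/end bookkeeping, is plain range stepping (a sketch, not the author's code; unlike the original, any remainder becomes its own final group instead of being merged into the last one):

CHUNK_SIZE = 15
for i in range(0, len(batches), CHUNK_SIZE):
    sub_array = batches[i:i + CHUNK_SIZE]  # the last slice is simply shorter
    df = self.merge_batches(sub_array)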

@ashishgupta2014
Author

@sfc-gh-mkeller I tried to result_scan the query, but unfortunately the result had expired.

I am sharing the profiling of the query_id; it may help you.

[Four screenshots of the query profile, captured 2023-02-13, were attached here.]

@sfc-gh-achandrasekaran
Contributor

Closing based on conversations with Snowflake support. Please reopen as needed.

@dss010101

dss010101 commented May 1, 2023

What was the root cause and solution? I am seeing a similar error: after inserting rows, then selecting those rows in a different connection/session, I get the same error.

@sfc-gh-stan
Contributor

Hi @dss010101 , is this consistently reproducible for you?
From the investigation so far, we suspect this is a flaky failure in ArrowIterator, i.e. when the connector tries to parse the result chunks (from Snowflake) into rows in an Arrow table. As a workaround, could you please try executing a RESULT_SCAN on the query?

@dss010101

I get the following when trying to do a RESULT_SCAN on the query:
SQL execution internal error: Processing aborted due to error 300010:3813700090; incident 8145392.

@sfc-gh-stan
Contributor

sfc-gh-stan commented May 1, 2023

@dss010101 Could you please provide the query IDs, both for the original query and for the RESULT_SCAN? Just to clarify, I meant running RESULT_SCAN using the Python connector rather than from the browser.
[Update]: Could you also provide the Snowflake deployment you are running against, e.g. the URL?

timostrunk pushed a commit to timostrunk/snowflake-connector-python that referenced this issue Aug 18, 2023
timostrunk pushed a commit to timostrunk/snowflake-connector-python that referenced this issue Aug 18, 2023
timostrunk added a commit to timostrunk/snowflake-connector-python that referenced this issue Aug 18, 2023
@Krenciszek

Krenciszek commented Aug 5, 2024

I am pretty sure this is caused by result_batch ignoring the configured timeout settings and instead using a hardcoded value of 7 seconds:
(https://github.com/snowflakedb/snowflake-connector-python/blob/main/src/snowflake/connector/result_batch.py#L37)

A hotfix would be overwriting this value:

from snowflake.connector import result_batch
result_batch.DOWNLOAD_TIMEOUT = 60

Why are the default timeout settings not used here?
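
For completeness, a sketch of applying that override in context (assuming, as the hotfix above implies, that the module-level constant is read each time a chunk is downloaded, so the override must run before any batches are fetched; cursor is a live Snowflake cursor):

from snowflake.connector import result_batch

# Assumption: set this before any ResultBatch download is triggered.
result_batch.DOWNLOAD_TIMEOUT = 60  # seconds, instead of the hardcoded 7

batches = cursor.get_result_batches()
tables = [b.to_arrow() for b in batches if b.rowcount > 0]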
