
RecursionError when pickling Table object #2001

Open

adinamarca opened this issue Aug 23, 2024 · 2 comments

adinamarca commented Aug 23, 2024

Hello.

Environment details

  • OS type and version: macOS (latest) and Linux (Ubuntu 24.04, amd64)
  • Python version: 3.12.4
  • pip version: 24.0
  • google-cloud-bigquery version: tested from 3.18.0 through 3.29

Steps to reproduce

  1. Get results from any method that returns a RowIterator or _EmptyRowIterator.
  2. Serialize the results with pickle.dumps.
  3. Deserialize them with pickle.loads.

Code example

from os import environ
from pickle import dumps, loads

from google.cloud import bigquery

environ["GOOGLE_APPLICATION_CREDENTIALS"] = (
    "your_path"
)

def query_stackoverflow() -> None:
    client = bigquery.Client()
    results = client.query(
        """
        SELECT
          CONCAT(
            'https://stackoverflow.com/questions/',
            CAST(id as STRING)) as url,
          view_count
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        WHERE tags like '%google-bigquery%'
        ORDER BY view_count DESC
        LIMIT 10"""
    )
    results = results.result()  # RowIterator
    results = list(results)  # list of Row objects

    pickled = dumps(results)  # pickling itself succeeds
    results = loads(pickled)  # RecursionError is raised here

query_stackoverflow()

Stack trace

Traceback (most recent call last):
  File "/some_path/repo/some_file.py", line 33, in query_stackoverflow
    results = loads(pickled)
              ^^^^^^^^^^^^^^
  File "/some_path/.cache/pypoetry/virtualenvs/some_env3.12/lib/python3.12/site-packages/google/cloud/bigquery/table.py", line 1586, in __getattr__
    value = self._xxx_field_to_index.get(name)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/some_path/.cache/pypoetry/virtualenvs/some_env3.12/lib/python3.12/site-packages/google/cloud/bigquery/table.py", line 1586, in __getattr__
    value = self._xxx_field_to_index.get(name)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/some_path/.cache/pypoetry/virtualenvs/some_env3.12/lib/python3.12/site-packages/google/cloud/bigquery/table.py", line 1586, in __getattr__
    value = self._xxx_field_to_index.get(name)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded

I first hit this in an Airflow task, where the maximum-recursion-depth exception was raised, and I then reproduced it on my personal computer.

Checking the code, I see that Row.__getattr__ (Row is what Table results are made of) reads self._xxx_field_to_index. On a freshly unpickled instance that attribute has not been restored yet, so the read itself falls back into __getattr__, which reads it again, and so on until the recursion limit is hit (my guess).
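
Here is a minimal, BigQuery-free sketch of what I think is happening. FragileRow is a made-up stand-in, not the real class, but it fails with the same traceback shape:

from pickle import dumps, loads

class FragileRow:
    """Hypothetical stand-in for bigquery's Row: __getattr__ reads an
    instance attribute that pickle has not restored yet."""

    def __init__(self, values, field_to_index):
        self._values = values
        self._field_to_index = field_to_index

    def __getattr__(self, name):
        # Python calls this for every attribute that normal lookup misses.
        # While unpickling, _field_to_index itself is still missing, so
        # this line re-enters __getattr__ and never terminates.
        index = self._field_to_index.get(name)
        if index is None:
            raise AttributeError(name)
        return self._values[index]

row = FragileRow(("hello",), {"greeting": 0})
print(row.greeting)  # works: prints "hello"
loads(dumps(row))    # RecursionError: maximum recursion depth exceeded

dumps succeeds because the live object has its attributes; loads fails because pickle probes the half-built object for __setstate__ before restoring its state.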

Thanks!

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Aug 23, 2024
@shollyman shollyman assigned chalmerlowe and unassigned yirutang Aug 27, 2024
@bhogan-bdai

We're encountering this when saving a Row object using Metaflow + Argo, which pickles task state as described above and hits the same infinite recursion.

The framework (in this case Metaflow) determines how state is saved (pickling); we determine what is saved. Saving a Row result is performant and simple when doing fan-out processing, one branch per row. Frameworks such as Airflow, Metaflow, and Argo delineate steps across Kubernetes pods, which is why state is saved between steps.

With this limitation, we need to take the additional step of converting each Row to a dict, which doesn't have such pickling limitations (see the sketch below).
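
The workaround, sketched against the code example above (client and the SQL string, here called QUERY, are as in the issue body; plain_rows is just our name):

rows = client.query(QUERY).result()

# Row supports keys()/__getitem__, so dict() flattens it into plain
# Python types that pickle round-trips without touching __getattr__.
plain_rows = [dict(row) for row in rows]

restored = loads(dumps(plain_rows))  # no RecursionError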

What is the priority of this work? It limits the use of BigQuery in large data-processing applications built on popular frameworks.


adinamarca commented Feb 12, 2025

I updated the issue with a reproducible code example.

Hope this gets fixed. I've been hitting this issue since 3.18.0, and it is still present in 3.29.

The only way I found to make pickling work was to avoid having any Row objects in the data before attempting serialization.
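
In case it helps triage, a guard at the top of Row.__getattr__ would break the cycle. This is only a sketch of one possible fix, not what the library does today:

def __getattr__(self, name):
    # During unpickling the internal attributes are not populated yet,
    # so a lookup of either of them must fail fast instead of
    # re-entering __getattr__ via self._xxx_field_to_index below.
    if name in ("_xxx_values", "_xxx_field_to_index"):
        raise AttributeError(name)
    value = self._xxx_field_to_index.get(name)
    if value is None:
        raise AttributeError(f"no row field {name!r}")
    return self._xxx_values[value]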
