Environment details
OS type and version: macOS (latest) and Linux (Ubuntu 24.04, amd64)
Python version: 3.12.4
pip version: 24.0
google-cloud-bigquery version: tested from 3.18.0 up to 3.29
Steps to reproduce
Get results from any method that returns a RowIterator or _EmptyRowIterator.
Serialize the results with pickle.dumps.
Deserialize them with pickle.loads.
Code example
from os import environ
from pickle import dumps, loads

from google.cloud import bigquery

environ["GOOGLE_APPLICATION_CREDENTIALS"] = "your_path"

def query_stackoverflow() -> None:
    client = bigquery.Client()
    results = client.query(
        """
        SELECT CONCAT('https://stackoverflow.com/questions/', CAST(id AS STRING)) AS url,
               view_count
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        WHERE tags LIKE '%google-bigquery%'
        ORDER BY view_count DESC
        LIMIT 10
        """
    )
    results = results.result()
    results = list(results)
    pickled = dumps(results)  # pickling succeeds
    results = loads(pickled)  # RecursionError raised here

query_stackoverflow()
Stack trace
Traceback (most recent call last):
File "/some_path/repo/some_file.py", line 33, in query_stackoverflow
results = loads(pickled)
^^^^^^^^^^^^^^
File "/some_path/.cache/pypoetry/virtualenvs/some_env3.12/lib/python3.12/site-packages/google/cloud/bigquery/table.py", line 1586, in __getattr__
value = self._xxx_field_to_index.get(name)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/some_path/.cache/pypoetry/virtualenvs/some_env3.12/lib/python3.12/site-packages/google/cloud/bigquery/table.py", line 1586, in __getattr__
value = self._xxx_field_to_index.get(name)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/some_path/.cache/pypoetry/virtualenvs/some_env3.12/lib/python3.12/site-packages/google/cloud/bigquery/table.py", line 1586, in __getattr__
value = self._xxx_field_to_index.get(name)
^^^^^^^^^^^^^^^^^^^^^^^^
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded
I was using Airflow when a task raised a maximum-recursion-depth-exceeded exception; I also reproduced it on my personal computer.
Checking the code, I see that the Row object (which RowIterator yields) implements __getattr__ by calling self._xxx_field_to_index.get(name). My guess is that during unpickling the instance is rebuilt before _xxx_field_to_index is restored, so reading that attribute fails, which re-enters __getattr__, which reads it again, and so on until the recursion limit is exceeded.
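To illustrate the suspected mechanism, here is a minimal, self-contained sketch. The FakeRow class below is a simplified stand-in I wrote for this report, not the actual library code:

import pickle

class FakeRow:
    # Simplified stand-in for google.cloud.bigquery.table.Row.
    def __init__(self, values, field_to_index):
        self._xxx_values = values
        self._xxx_field_to_index = field_to_index

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails. pickle.loads
        # recreates the instance without running __init__ and probes it
        # for __setstate__ before restoring any state, so at that moment
        # self._xxx_field_to_index does not exist either; reading it
        # re-enters __getattr__, which reads it again, and so on.
        index = self._xxx_field_to_index.get(name)
        if index is None:
            raise AttributeError(name)
        return self._xxx_values[index]

row = FakeRow(("hi",), {"greeting": 0})
data = pickle.dumps(row)  # fine: attributes exist at dump time
pickle.loads(data)        # RecursionError, as in the trace above

A common fix for this pattern is to have __getattr__ raise AttributeError immediately for the internal names it depends on (e.g. anything starting with "_xxx_"), or to define explicit __getstate__/__setstate__ methods, so the pickle machinery's attribute probing cannot start the loop.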
Thanks!
We're encountering this when saving a Row object with Metaflow + Argo, which uses the pickling approach described above and hits the same infinite recursion.
The framework (in this case Metaflow) determines how state is saved (pickling); we determine what is saved. Saving a Row result is fast and simple when doing fan-out processing, one branch per row. Frameworks such as Airflow, Metaflow, and Argo split steps across Kubernetes pods, which is why state is saved between steps.
With this limitation, we have to take the extra step of converting each Row to a dict, which has no such pickling problem.
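For others hitting this, a rough sketch of that workaround (using Row's documented items() method; the query here is just illustrative):

from pickle import dumps, loads
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query("SELECT 1 AS x, 2 AS y").result()

# Plain dicts have no custom __getattr__, so they round-trip cleanly.
records = [dict(row.items()) for row in rows]
restored = loads(dumps(records))  # [{'x': 1, 'y': 2}]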
What is the priority of this work? It limits the use of BigQuery in large data-processing applications built on popular frameworks.