Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize NaN, Infinite, -Infinite causing error when saving as PostgreSQL JSON #7339

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yoshiokatsuneo
Copy link
Contributor

@yoshiokatsuneo yoshiokatsuneo commented Feb 19, 2025

What type of PR is this?

  • Bug Fix

Description

When running query that output NaN, Infinite, or -Infinite, redash executor crashes with the error "Out of range float values are not JSON compliant" like below. Redash does not output any rows when the error was occurred.

[2025-02-19 04:48:49,040][PID:593][ERROR][rq.worker] [Job 6cfa1d1c-2a32-48c6-8bf4-0cd8f8ba6f32]: exception raised while executing (redash.tasks.queries.execution.execute_query)
Traceback (most recent call last):
...
  File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
    self._result = self._execute()
  File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/app/redash/tasks/queries/execution.py", line 309, in execute_query
    ).run()
  File "/app/redash/tasks/queries/execution.py", line 253, in run
    updated_query_ids = models.Query.update_latest_result(query_result)
  File "/app/redash/models/__init__.py", line 738, in update_latest_result
    for q in queries:
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3534, in __iter__
    self.session._autoflush()
...
sqlalchemy.exc.StatementError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(builtins.ValueError) Out of range float values are not JSON compliant
[SQL: INSERT INTO query_results (org_id, data_source_id, query_hash, query, data, runtime, retrieved_at) VALUES (%(org_id)s, %(data_source_id)s, %(query_hash)s, %(query)s, %(data)s, %(runtime)s, %(retrieved_at)s) RETURNING query_results.id]
...

The error occurs because JSON(PostgreSQL JSON type) does not support NaN, Infinite, -Infinite.

This PR fixes the issue by sanitize NaN, Infinite, -Infinite value to None. This is the save behavior that JavaScript JSON.stringify() does.

We can reproduce the issue by running following query on BigQuery or PostgreSQL.

BigQuery:

select 1.0, cast('NaN' as float64), cast('Infinity' as float64), cast('-Infinity' as float64)

PostgreSQL:

select 1.0::float, 'NaN'::float, 'Infinity'::float, '-Infinity'::float

The issue was introduced by #6685 which removes simplejson. Before #6685 , NaN / Inf was converted to null by simplejson with ignore_nan=True. This PR change behavior for NaN/Inf as same as before #6685 .

How is this tested?

  • Unit tests (pytest, jest)
  • Manually

Unit test was added.
Also, we can confirm by running the query on the description.

Before this PR:

image

After this PR:

image

@gaecoli
Copy link
Member

gaecoli commented Feb 20, 2025

I think this operation is unnecessary, NaN is not a numeric type and should not be forced to be stored.

@gaecoli
Copy link
Member

gaecoli commented Feb 20, 2025

Each database or data engine handles it differently, for NaN and others situation, so I think this situation should not be handled. They should abide by their original system internal processing methods.

@yoshiokatsuneo
Copy link
Contributor Author

yoshiokatsuneo commented Feb 20, 2025

@gaecoli
Thank you for your comment.
And, I think this bug should be fix with following reasons:

  • The bug make redash crash. I think crashing redash is bug that should be fixed.
  • I think it is not good idea saying "crash is necessary" as keep as is when it can be fixed.
  • The bug was introduced by removed simplejson #6685 which changes from simplejson to python natvie json. Before the PR, NaN/Inf was converted to null by simplejson with ignore_nan option. This PR fix the issue by changing behavior just before removed simplejson #6685 .

https://simplejson.readthedocs.io/en/latest/

If ignore_nan is true (default: False), then out of range float values (nan, inf, -inf) will be serialized as null in compliance with the ECMA-262 specification. If true, this will override allow_nan.

  • NaN / Inf is not special value but quite common float value, as NaN / Inf is IEEE-754 standard.
  • NaN value is generated by many databases including PostgreSQL, BigQuery, Redshift, Oracle, MongoDB.
  • The problem potentially happens on any database as the issue happens when the query is saved to redash PostgreSQL database. Saving query to PostgreSQL happens for any data sources, as it happens not when the query runs but when query result was save.

@gaecoli
Copy link
Member

gaecoli commented Feb 21, 2025

@arikfr If u have free time, please review this PR! Thank you!

@cj-matteo-cafasso
Copy link

This PR addresses #6992

@gaecoli
Copy link
Member

gaecoli commented Feb 26, 2025

This PR addresses #6992

I looked into this issue in detail and it really should be fixed. @yoshiokatsuneo @cj-matteo-cafasso

Copy link
Member

@arikfr arikfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing this @yoshiokatsuneo !

The error comes from json_dumps passing allow_nan=False in json_dumps which was indeed introduced in the PR mentinoed.

How about we move this logic (of turning NaN/Inf to None) to json_dumps instead of calling it here?

@EugeneChung
Copy link

How about testing orjson? orjson.dumps() converts NaN float to null.
#6992 (comment)

@arikfr
Copy link
Member

arikfr commented Feb 26, 2025

@EugeneChung would you like to open a PR that switches to orjson?

@yoshiokatsuneo
Copy link
Contributor Author

@arikfr

The error comes from json_dumps passing allow_nan=False in json_dumps which was indeed introduced in the PR mentinoed.
How about we move this logic (of turning NaN/Inf to None) to json_dumps instead of calling it here?

Thank you for your comment !
I just tried to create another PR below that changes json_dumps (instead of QueryExecutor.run in this PR).
How about ?

#7348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants