
Numeric overflow handling in Table #7529

Closed
radeusgd opened this issue Aug 9, 2023 · 1 comment · Fixed by #7557
Labels
-libs Libraries: New libraries to be implemented l-derived-columns

Comments


radeusgd commented Aug 9, 2023

Our backends handle integer overflow (e.g. Long.MAX_VALUE + 1) in a completely inconsistent manner:

  • A raw operation in Enso falls back to BigIntegers and yields the exact result 9223372036854775808.
  • The in-memory Table currently overflows, wrapping around to -9223372036854775808.
  • SQLite 'truncates' and returns Long.MAX_VALUE again, i.e. 9223372036854775807 + 1 = 9223372036854775807.
  • Postgres raises an SQL error.
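The first two behaviours can be reproduced directly in Java, independently of Enso (a minimal sketch: plain `long` addition wraps around, while `BigInteger` gives the exact result):

```java
import java.math.BigInteger;

public class OverflowDemo {
    public static void main(String[] args) {
        long x = Long.MAX_VALUE; // 9223372036854775807
        long y = 1;

        // Plain 64-bit addition silently wraps into the negative range,
        // matching what the in-memory Table backend currently does.
        long wrapped = x + y;
        System.out.println(wrapped); // -9223372036854775808

        // BigInteger arithmetic yields the mathematically exact result,
        // matching the fallback of a raw Enso operation.
        BigInteger exact = BigInteger.valueOf(x).add(BigInteger.valueOf(y));
        System.out.println(exact); // 9223372036854775808
    }
}
```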

Of course we do not have much control over SQLite and Postgres behaviour.

Still, I think the in-memory behaviour is the worst of all of these: it overflows and 'wraps around' into negative numbers, which will be confusing to users. Instead, I think we should report an overflow warning and replace the result with Nothing.

Repro

from Standard.Base import all
from Standard.Table import all
from Standard.Database import all

polyglot java import java.lang.Long

add_z t =
    t.set "[X] + [Y]"

main =
    x = Long.MAX_VALUE
    y = 1

    IO.println (x+y)
    
    t0 = Table.new [["X", [x]], ["Y", [y]]]
    IO.println "in-memory:"
    t0_z = add_z t0
    t0_z.print
    t0_z.info.print

    IO.println "SQLite:"
    t1 = t0.select_into_database_table (Database.connect (SQLite In_Memory)) "t1" primary_key=Nothing
    t1_z = add_z t1
    t1_z.print
    t1_z.info.print

    IO.println "PostgreSQL:"
    postgres_connection = connect_to_postgres
    t2 = t0.select_into_database_table postgres_connection "t1" primary_key=Nothing temporary=True
    t2_z = add_z t2
    t2_z.print
    t2_z.info.print

connect_to_postgres =
    hostname = Environment.get "ENSO_DATABASE_TEST_HOST"
    db_name = Environment.get "ENSO_DATABASE_TEST_DB_NAME"
    db_host_port = hostname.if_nothing "localhost" . split ':'
    db_host = db_host_port.at 0
    db_port = if db_host_port.length == 1 then 5432 else Integer.parse (db_host_port.at 1)
    db_user = Environment.get "ENSO_DATABASE_TEST_DB_USER"
    db_password = Environment.get "ENSO_DATABASE_TEST_DB_PASSWORD"
    case db_name.is_nothing of
        True ->
            IO.println "PostgreSQL test database is not configured. See README.md for instructions."
        False ->
            connection = Panic.rethrow <|
                IO.println ("Connecting to PostgreSQL test database at "+hostname)
                Database.connect (Postgres db_host db_port db_name credentials=(Credentials.Username_And_Password db_user db_password))
            connection

yields

9223372036854775808
in-memory:
   | X                   | Y | [X] + [Y]
---+---------------------+---+----------------------
 0 | 9223372036854775807 | 1 | -9223372036854775808

   | Column    | Items Count | Value Type
---+-----------+-------------+-------------------
 0 | X         | 1           | (Integer 64 bits)
 1 | Y         | 1           | (Integer 64 bits)
 2 | [X] + [Y] | 1           | (Integer 64 bits)

SQLite:
 X                   | Y | [X] + [Y]
---------------------+---+---------------------
 9223372036854775807 | 1 | 9223372036854775807

   | Column    | Items Count | Value Type
---+-----------+-------------+-------------------
 0 | X         | 1           | (Integer 64 bits)
 1 | Y         | 1           | (Integer 64 bits)
 2 | [X] + [Y] | 1           | (Integer 64 bits)

PostgreSQL:
Connecting to PostgreSQL test database at 172.28.138.202
(Error: There was an SQL error: ERROR: bigint out of range. [Query was: SELECT "t1"."X" AS "X", "t1"."Y" AS "Y", ("t1"."X" + "t1"."Y") AS "[X] + [Y]" FROM "t1" AS "t1" LIMIT 10])

(Error: There was an SQL error: ERROR: bigint out of range. [Query was: SELECT COUNT("t1"."X") AS "X", COUNT("t1"."Y") AS "Y", COUNT("t1"."[X] + [Y]") AS "[X] + [Y]" FROM (SELECT "t1"."X" AS "X", "t1"."Y" AS "Y", ("t1"."X" + "t1"."Y") AS "[X] + [Y]" FROM "t1" AS "t1") AS "t1"])
@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Aug 9, 2023
@radeusgd radeusgd added -libs Libraries: New libraries to be implemented l-derived-columns labels Aug 9, 2023
@jdunkerley

We should move to using checked arithmetic for the column operations and report an error if we overflow.
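The JDK already provides checked arithmetic via `Math.addExact`, which throws `ArithmeticException` on overflow. A minimal sketch of how the in-memory backend could combine this with the Nothing-on-overflow proposal (the `checkedAdd` helper is hypothetical, not the actual Enso implementation; `null` stands in for Enso's Nothing):

```java
public class CheckedAddDemo {
    // Hypothetical helper mirroring the proposed Table behaviour:
    // return null instead of silently wrapping around.
    static Long checkedAdd(long a, long b) {
        try {
            return Math.addExact(a, b); // throws ArithmeticException on overflow
        } catch (ArithmeticException e) {
            return null; // the real backend would also attach an overflow warning
        }
    }

    public static void main(String[] args) {
        System.out.println(checkedAdd(1, 2));              // 3
        System.out.println(checkedAdd(Long.MAX_VALUE, 1)); // null
    }
}
```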

@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board Aug 15, 2023
@jdunkerley jdunkerley moved this from 📤 Backlog to 👁️ Code review in Issues Board Aug 22, 2023
@mergify mergify bot closed this as completed in #7557 Aug 22, 2023
mergify bot pushed a commit that referenced this issue Aug 22, 2023
… in-memory backend and check for ArithmeticOverflow in LongStorage (#7557)

- Closes #5159
- Now data downloaded from the database can keep the type much closer to the original type (like string length limits or smaller integer types).
- Cast also exposes these types.
- The integers are still all stored as 64-bit Java `long`s, we just check their bounds. Changing underlying storage for memory efficiency may come in the future: #6109
- Fixes #7565
- Fixes #7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on the particular database.
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Aug 22, 2023