
Numeric overflow handling in Table #7529

Closed
radeusgd opened this issue Aug 9, 2023 · 1 comment · Fixed by #7557
Labels
-libs Libraries: New libraries to be implemented l-derived-columns

Comments


radeusgd commented Aug 9, 2023

Our backends handle integer overflow (e.g. Long.MAX_VALUE + 1) in a completely inconsistent manner:

  • A raw operation in Enso falls back to BigIntegers and yields the exact result 9223372036854775808.
  • The in-memory Table currently overflows, wrapping around to -9223372036854775808.
  • SQLite 'truncates' and returns Long.MAX_VALUE again, i.e. 9223372036854775807 + 1 = 9223372036854775807.
  • Postgres raises an SQL error.
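The first two behaviours can be reproduced directly in Java, independently of Enso (a minimal sketch: plain `long` addition wraps around, while `BigInteger` gives the exact result):

```java
import java.math.BigInteger;

public class OverflowDemo {
    public static void main(String[] args) {
        long x = Long.MAX_VALUE; // 9223372036854775807
        long y = 1;

        // Plain 64-bit addition silently wraps into the negative range,
        // matching what the in-memory Table backend currently does.
        long wrapped = x + y;
        System.out.println(wrapped); // -9223372036854775808

        // BigInteger arithmetic yields the mathematically exact result,
        // matching the fallback of a raw Enso operation.
        BigInteger exact = BigInteger.valueOf(x).add(BigInteger.valueOf(y));
        System.out.println(exact); // 9223372036854775808
    }
}
```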

Of course we do not have much control over SQLite and Postgres behaviour.

Still, I think the in-memory behaviour is the worst of all of these: it overflows and 'wraps around' into negative numbers, which will be confusing to users. Instead, I think we should report an overflow warning and replace the result with Nothing.

Repro

from Standard.Base import all
from Standard.Table import all
from Standard.Database import all

polyglot java import java.lang.Long

add_z t =
    t.set "[X] + [Y]"

main =
    x = Long.MAX_VALUE
    y = 1

    IO.println (x+y)
    
    t0 = Table.new [["X", [x]], ["Y", [y]]]
    IO.println "in-memory:"
    t0_z = add_z t0
    t0_z.print
    t0_z.info.print

    IO.println "SQLite:"
    t1 = t0.select_into_database_table (Database.connect (SQLite In_Memory)) "t1" primary_key=Nothing
    t1_z = add_z t1
    t1_z.print
    t1_z.info.print

    IO.println "PostgreSQL:"
    postgres_connection = connect_to_postgres
    t2 = t0.select_into_database_table postgres_connection "t1" primary_key=Nothing temporary=True
    t2_z = add_z t2
    t2_z.print
    t2_z.info.print

connect_to_postgres =
    hostname = Environment.get "ENSO_DATABASE_TEST_HOST"
    db_name = Environment.get "ENSO_DATABASE_TEST_DB_NAME"
    db_host_port = hostname.if_nothing "localhost" . split ':'
    db_host = db_host_port.at 0
    db_port = if db_host_port.length == 1 then 5432 else Integer.parse (db_host_port.at 1)
    db_user = Environment.get "ENSO_DATABASE_TEST_DB_USER"
    db_password = Environment.get "ENSO_DATABASE_TEST_DB_PASSWORD"
    case db_name.is_nothing of
        True ->
            IO.println "PostgreSQL test database is not configured. See README.md for instructions."
        False ->
            connection = Panic.rethrow <|
                IO.println ("Connecting to PostgreSQL test database at "+hostname)
                Database.connect (Postgres db_host db_port db_name credentials=(Credentials.Username_And_Password db_user db_password))
            connection

yields

9223372036854775808
in-memory:
   | X                   | Y | [X] + [Y]
---+---------------------+---+----------------------
 0 | 9223372036854775807 | 1 | -9223372036854775808

   | Column    | Items Count | Value Type
---+-----------+-------------+-------------------
 0 | X         | 1           | (Integer 64 bits)
 1 | Y         | 1           | (Integer 64 bits)
 2 | [X] + [Y] | 1           | (Integer 64 bits)

SQLite:
 X                   | Y | [X] + [Y]
---------------------+---+---------------------
 9223372036854775807 | 1 | 9223372036854775807

   | Column    | Items Count | Value Type
---+-----------+-------------+-------------------
 0 | X         | 1           | (Integer 64 bits)
 1 | Y         | 1           | (Integer 64 bits)
 2 | [X] + [Y] | 1           | (Integer 64 bits)

PostgreSQL:
Connecting to PostgreSQL test database at 172.28.138.202
(Error: There was an SQL error: ERROR: bigint out of range. [Query was: SELECT "t1"."X" AS "X", "t1"."Y" AS "Y", ("t1"."X" + "t1"."Y") AS "[X] + [Y]" FROM "t1" AS "t1" LIMIT 10])

(Error: There was an SQL error: ERROR: bigint out of range. [Query was: SELECT COUNT("t1"."X") AS "X", COUNT("t1"."Y") AS "Y", COUNT("t1"."[X] + [Y]") AS "[X] + [Y]" FROM (SELECT "t1"."X" AS "X", "t1"."Y" AS "Y", ("t1"."X" + "t1"."Y") AS "[X] + [Y]" FROM "t1" AS "t1") AS "t1"])
@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Aug 9, 2023
@radeusgd radeusgd added -libs Libraries: New libraries to be implemented l-derived-columns labels Aug 9, 2023
@jdunkerley

We should move to using checked arithmetic for the column operations and report an error if we overflow.
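The JDK already provides checked arithmetic via `Math.addExact`, which throws `ArithmeticException` on overflow. A minimal sketch of how the in-memory backend could combine this with the Nothing-on-overflow proposal (the `checkedAdd` helper is hypothetical, not the actual Enso implementation; `null` stands in for Enso's Nothing):

```java
public class CheckedAddDemo {
    // Hypothetical helper mirroring the proposed Table behaviour:
    // return null instead of silently wrapping around.
    static Long checkedAdd(long a, long b) {
        try {
            return Math.addExact(a, b); // throws ArithmeticException on overflow
        } catch (ArithmeticException e) {
            return null; // the real backend would also attach an overflow warning
        }
    }

    public static void main(String[] args) {
        System.out.println(checkedAdd(1, 2));              // 3
        System.out.println(checkedAdd(Long.MAX_VALUE, 1)); // null
    }
}
```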

@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board Aug 15, 2023
@jdunkerley jdunkerley moved this from 📤 Backlog to 👁️ Code review in Issues Board Aug 22, 2023
@mergify mergify bot closed this as completed in #7557 Aug 22, 2023
mergify bot pushed a commit that referenced this issue Aug 22, 2023
… in-memory backend and check for ArithmeticOverflow in LongStorage (#7557)

- Closes #5159
- Now data downloaded from the database can keep the type much closer to the original type (like string length limits or smaller integer types).
- Cast also exposes these types.
- The integers are still all stored as 64-bit Java `long`s, we just check their bounds. Changing underlying storage for memory efficiency may come in the future: #6109
- Fixes #7565
- Fixes #7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on the particular database.
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Aug 22, 2023