Add size-limited strings and varying bit-width integer Value_Types to in-memory backend and check for ArithmeticOverflow in LongStorage (#7557)

- Closes #5159
- Now data downloaded from the database can keep the type much closer to the original type (like string length limits or smaller integer types).
- Cast also exposes these types.
- The integers are still all stored as 64-bit Java `long`s; we just check their bounds. Changing the underlying storage for memory efficiency may come in the future: #6109
- Fixes #7565
- Fixes #7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on the particular database.
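The overflow handling described above (an overflowing integer result becomes a missing value and a warning is reported) can be sketched in Java, the language of the in-memory storages. This is a minimal illustration, not the actual `LongStorage` code: the class `CheckedAddSketch` and its `Result` record are hypothetical, and the sketch assumes 64-bit operands, so `Math.addExact` is the only overflow check it needs.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the in-memory overflow policy:
// an overflowing result becomes a missing value (null) and is counted,
// so that a caller could attach an Arithmetic_Overflow-style warning.
final class CheckedAddSketch {
    // Result column plus the number of rows whose result overflowed.
    record Result(Long[] values, int overflowCount) {}

    static Result add(Long[] a, Long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Columns must have the same length");
        }
        Long[] out = new Long[a.length];
        int overflows = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == null || b[i] == null) {
                continue; // a missing operand gives a missing result (out[i] stays null)
            }
            try {
                // Math.addExact throws ArithmeticException instead of wrapping around.
                out[i] = Math.addExact(a[i], b[i]);
            } catch (ArithmeticException e) {
                overflows++; // out[i] stays null
            }
        }
        return new Result(out, overflows);
    }

    public static void main(String[] args) {
        Result r = add(new Long[] {1L, Long.MAX_VALUE, null}, new Long[] {2L, 1L, 5L});
        // Prints: [3, null, null], overflows: 1
        System.out.println(Arrays.toString(r.values()) + ", overflows: " + r.overflowCount());
    }
}
```

Narrower columns (16 or 32 bits) would additionally need a range check on the result, since a value can fit in a `long` but not in the column's declared width.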
radeusgd authored Aug 22, 2023
1 parent b3e9ea8 commit 2385f5b
Showing 65 changed files with 1,289 additions and 305 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -553,6 +553,8 @@
 - [Retire `Column_Selector` and allow regex based selection of columns.][7295]
 - [`Text.parse_to_table` can take a `Regex`.][7297]
 - [Expose `Text.normalize`.][7425]
+- [Implemented new value types (various sizes of `Integer` type, fixed-length
+and length-limited `Char` type) for the in-memory `Table` backend.][7557]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -786,6 +788,7 @@
[7295]: https://github.com/enso-org/enso/pull/7295
[7297]: https://github.com/enso-org/enso/pull/7297
[7425]: https://github.com/enso-org/enso/pull/7425
+[7557]: https://github.com/enso-org/enso/pull/7557

#### Enso Compiler

45 changes: 45 additions & 0 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Column.enso
@@ -369,6 +369,15 @@ type Column
Returns a column containing the result of adding `other` to each element
of `self`. If `other` is a column, the operation is performed pairwise
between corresponding elements of `self` and `other`.
+
+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
 + : Column | Any -> Column
 + self other =
op = case Value_Type_Helpers.resolve_addition_kind self other of
@@ -388,6 +397,15 @@ type Column
Returns a column containing the result of subtracting `other` from each
element of `self`. If `other` is a column, the operation is performed
pairwise between corresponding elements of `self` and `other`.
+
+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
 - : Column | Any -> Column
 - self other =
case Value_Type_Helpers.resolve_subtraction_kind self other of
@@ -405,6 +423,15 @@ type Column
Returns a column containing the result of multiplying `other` by each
element of `self`. If `other` is a column, the operation is performed
pairwise between corresponding elements of `self` and `other`.
+
+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
* : Column | Any -> Column
* self other =
Value_Type_Helpers.check_binary_numeric_op self other <|
@@ -426,6 +453,15 @@ type Column
 - If division by zero occurs, an `Arithmetic_Error` warning is attached
to the result.

+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
+
> Example
Divide the elements of one column by the elements of another.

@@ -493,6 +529,15 @@ type Column
Returns a column containing the result of raising each element of `self`
by `other`.

+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
+
> Example
Squares the elements of one column.

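To illustrate why the note leaves the Database behaviour unspecified, the sketch below contrasts three reactions to the same 64-bit overflow: silent wrap-around, a hard error, and clamping to the representable range. It is plain Java chosen for illustration; the class and method names are hypothetical, and reading "truncated" as clamping is an assumption, not a statement about any particular database.

```java
// Illustration only: three ways a system might react to a 64-bit overflow.
final class OverflowPoliciesSketch {
    // Plain `+` on Java longs wraps around (two's complement arithmetic).
    static long wrapping(long x, long y) {
        return x + y;
    }

    // Math.addExact throws ArithmeticException: comparable to a hard error.
    static long failing(long x, long y) {
        return Math.addExact(x, y);
    }

    // Clamp to the representable range instead of wrapping.
    static long saturating(long x, long y) {
        long r = x + y;
        // The sum overflowed iff both operands share a sign that the result does not.
        if (((x ^ r) & (y ^ r)) < 0) {
            return x < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(wrapping(Long.MAX_VALUE, 1));   // -9223372036854775808
        System.out.println(saturating(Long.MAX_VALUE, 1)); // 9223372036854775807
        try {
            failing(Long.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("hard error: " + e.getMessage());
        }
    }
}
```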
@@ -1,6 +1,7 @@
from Standard.Base import all

import Standard.Table.Data.Column.Column as Materialized_Column
+import Standard.Table.Data.Type.Value_Type.Bits
import Standard.Table.Data.Type.Value_Type.Value_Type
import Standard.Table.Internal.Java_Exports

@@ -69,27 +70,27 @@ double_fetcher =
Column_Fetcher.Value fetch_value make_builder

## PRIVATE
-long_fetcher : Column_Fetcher
-long_fetcher =
+long_fetcher : Bits -> Column_Fetcher
+long_fetcher bits =
fetch_value rs i =
l = rs.getLong i
if rs.wasNull then Nothing else l
make_builder initial_size =
-java_builder = Java_Exports.make_long_builder initial_size
+java_builder = Java_Exports.make_long_builder initial_size bits=bits
append v =
if v.is_nothing then java_builder.appendNulls 1 else
java_builder.appendLong v
Builder.Value append (seal_java_builder java_builder)
Column_Fetcher.Value fetch_value make_builder

## PRIVATE
-text_fetcher : Column_Fetcher
-text_fetcher =
+text_fetcher : Value_Type -> Column_Fetcher
+text_fetcher value_type =
fetch_value rs i =
t = rs.getString i
if rs.wasNull then Nothing else t
make_builder initial_size =
-java_builder = Java_Exports.make_string_builder initial_size
+java_builder = Java_Exports.make_string_builder initial_size value_type=value_type
make_builder_from_java_object_builder java_builder
Column_Fetcher.Value fetch_value make_builder

@@ -137,11 +138,9 @@ date_time_fetcher =
default_fetcher_for_value_type : Value_Type -> Column_Fetcher
default_fetcher_for_value_type value_type =
case value_type of
-## TODO [RW] once we support varying bit-width in storages, we should specify it
-Revisit in #5159.
-Value_Type.Integer _ -> long_fetcher
+Value_Type.Integer bits -> long_fetcher bits
Value_Type.Float _ -> double_fetcher
-Value_Type.Char _ _ -> text_fetcher
+Value_Type.Char _ _ -> text_fetcher value_type
Value_Type.Boolean -> boolean_fetcher
Value_Type.Time -> time_fetcher
# We currently don't distinguish timestamps without a timezone on the Enso value side.
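The fetcher now passes the column's bit width to the long builder. A minimal sketch of what a width-aware builder could do, assuming values are kept as Java `long`s and only range-checked (as the commit message describes): `BoundedLongBuilderSketch` is hypothetical, and throwing on an out-of-range value is a simplification; the real builder may report a problem instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical width-aware builder: values are stored as Java longs,
// but each appended value is checked against the signed range of `bits`.
final class BoundedLongBuilderSketch {
    private final long min;
    private final long max;
    private final List<Long> values = new ArrayList<>();

    BoundedLongBuilderSketch(int bits) {
        if (bits != 8 && bits != 16 && bits != 32 && bits != 64) {
            throw new IllegalArgumentException("Unsupported bit width: " + bits);
        }
        // Signed two's-complement range: [-2^(bits-1), 2^(bits-1) - 1].
        this.min = bits == 64 ? Long.MIN_VALUE : -(1L << (bits - 1));
        this.max = bits == 64 ? Long.MAX_VALUE : (1L << (bits - 1)) - 1;
    }

    void appendNulls(int count) {
        for (int i = 0; i < count; i++) {
            values.add(null);
        }
    }

    void appendLong(long value) {
        if (value < min || value > max) {
            // Simplification: the real builder may report a problem instead of throwing.
            throw new ArithmeticException(value + " does not fit in [" + min + ", " + max + "]");
        }
        values.add(value);
    }

    // Returns a read-only snapshot; nulls mark missing values.
    List<Long> seal() {
        return Collections.unmodifiableList(new ArrayList<>(values));
    }
}
```

For example, a builder created for 16 bits accepts 32767 and -32768 but rejects 32768.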
48 changes: 46 additions & 2 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso
@@ -30,7 +30,6 @@ from project.Internal.Column_Format import all
from project.Internal.Java_Exports import make_date_builder_adapter, make_double_builder, make_long_builder, make_string_builder

polyglot java import org.enso.base.Time_Utils
-polyglot java import org.enso.table.data.column.builder.StringBuilder
polyglot java import org.enso.table.data.column.operation.map.MapOperationProblemBuilder
polyglot java import org.enso.table.data.column.storage.Storage as Java_Storage
polyglot java import org.enso.table.data.table.Column as Java_Column
@@ -388,6 +387,15 @@ type Column
Returns a column with results of adding `other` from each element of
`self`.

+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
+
> Example
Add two columns to each other.

@@ -418,6 +426,15 @@ type Column
Returns a column with results of subtracting `other` from each element of
`self`.

+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
+
> Example
Subtract one column from another.

@@ -461,6 +478,15 @@ type Column
Returns a column containing the result of multiplying each element of
`self` by `other`.

+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
+
> Example
Multiply the elements of two columns together.

@@ -495,6 +521,15 @@ type Column
 - If division by zero occurs, an `Arithmetic_Error` warning is attached
to the result.

+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
+
> Example
Divide the elements of one column by the elements of another.

@@ -560,6 +595,15 @@ type Column
Returns a column containing the result of raising each element of `self`
by `other`.

+? Arithmetic Overflow
+
+For integer columns, the operation may yield results that will not fit
+into the range supported by the column. In such case, the in-memory
+backend will replace such results with `Nothing` and report a
+`Arithmetic_Overflow` warning. The behaviour in Database backends is
+not specified and will depend on the particular database - it may
+cause a hard error, the value may be truncated or wrap-around etc.
+
> Example
Squares the elements of one column.

@@ -1229,7 +1273,7 @@ type Column
length = self.length
storage = self.java_column.getStorage

-builder = StringBuilder.new length
+builder = make_string_builder length
0.up_to length . each i->
replaced = do_replace i (storage.getItem i)
builder.append replaced
@@ -1,6 +1,7 @@
from Standard.Base import all
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

+import project.Data.Type.Storage
import project.Internal.Java_Problems
import project.Internal.Parse_Values_Helper
from project.Data.Type.Value_Type import Auto, Bits, Value_Type
@@ -173,9 +174,10 @@ type Data_Formatter
WhitespaceStrippingParser.new base_parser

## PRIVATE
-make_integer_parser self auto_mode=False =
+make_integer_parser self auto_mode=False target_type=Value_Type.Integer =
separator = if self.thousand_separator.is_empty then Nothing else self.thousand_separator
-NumberParser.createIntegerParser auto_mode.not (auto_mode.not || self.allow_leading_zeros) self.trim_values separator
+storage_type = Storage.from_value_type_strict target_type
+NumberParser.createIntegerParser storage_type auto_mode.not (auto_mode.not || self.allow_leading_zeros) self.trim_values separator

## PRIVATE
make_decimal_parser self auto_mode=False =
@@ -221,8 +223,8 @@ type Data_Formatter

## PRIVATE
make_value_type_parser self value_type = case value_type of
-# TODO once we implement #5159 we will need to add checks for bounds here and support 16/32-bit ints
-Value_Type.Integer Bits.Bits_64 -> self.make_integer_parser
+Value_Type.Integer _ ->
+self.make_integer_parser target_type=value_type
# TODO once we implement #6109 we can support 32-bit floats
Value_Type.Float Bits.Bits_64 -> self.make_decimal_parser
Value_Type.Boolean -> self.make_boolean_parser
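With `target_type`, the integer parser can reject values that fall outside the requested width. The following is a rough Java sketch of that idea, assuming the parser yields a missing value for unparseable or out-of-range text. `IntegerParseSketch.parse` is hypothetical and much simpler than the real `NumberParser`: it removes every separator occurrence without validating group positions and ignores the leading-zeros option.

```java
// Hypothetical parser sketch: returns null for text that is not an integer
// or does not fit in the signed range of `bits` (16, 32 or 64 assumed).
final class IntegerParseSketch {
    static Long parse(String text, int bits, String thousandSeparator, boolean trim) {
        String s = trim ? text.strip() : text;
        if (thousandSeparator != null && !thousandSeparator.isEmpty()) {
            // Naive: removes every separator without checking group positions.
            s = s.replace(thousandSeparator, "");
        }
        long value;
        try {
            value = Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null; // not an integer, or outside the 64-bit range
        }
        long min = bits == 64 ? Long.MIN_VALUE : -(1L << (bits - 1));
        long max = bits == 64 ? Long.MAX_VALUE : (1L << (bits - 1)) - 1;
        if (value < min || value > max) {
            return null; // parsed, but too large for the target type
        }
        return value;
    }
}
```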
@@ -9,6 +9,7 @@ from Standard.Table.Errors import Inexact_Type_Coercion

polyglot java import org.enso.table.data.column.builder.Builder
polyglot java import org.enso.table.data.column.storage.type.AnyObjectType
+polyglot java import org.enso.table.data.column.storage.type.Bits as Java_Bits
polyglot java import org.enso.table.data.column.storage.type.BooleanType
polyglot java import org.enso.table.data.column.storage.type.DateTimeType
polyglot java import org.enso.table.data.column.storage.type.DateType
@@ -24,9 +25,9 @@ to_value_type : StorageType -> Value_Type
to_value_type storage_type = case storage_type of
i : IntegerType -> case i.bits.toInteger of
8 -> Value_Type.Byte
-b -> Value_Type.Integer (Bits.from_bits b)
+b -> Value_Type.Integer (Bits.from_integer b)
f : FloatType ->
-bits = Bits.from_bits f.bits.toInteger
+bits = Bits.from_integer f.bits.toInteger
Value_Type.Float bits
_ : BooleanType -> Value_Type.Boolean
s : TextType ->
@@ -40,12 +41,18 @@ to_value_type storage_type = case storage_type of

## PRIVATE
closest_storage_type value_type = case value_type of
-# TODO we will want builders and storages with bounds checking, but for now we approximate
-Value_Type.Byte -> IntegerType.INT_64
-Value_Type.Integer _ -> IntegerType.INT_64
+Value_Type.Byte -> IntegerType.INT_8
+Value_Type.Integer bits ->
+java_bits = Java_Bits.fromInteger bits.to_integer
+IntegerType.create java_bits
Value_Type.Float _ -> FloatType.FLOAT_64
Value_Type.Boolean -> BooleanType.INSTANCE
-Value_Type.Char _ _ -> TextType.VARIABLE_LENGTH
+Value_Type.Char Nothing True -> TextType.VARIABLE_LENGTH
+Value_Type.Char Nothing False ->
+Error.throw (Illegal_Argument.Error "Value_Type.Char with fixed length must have a non-nothing size")
+Value_Type.Char max_length variable_length ->
+fixed_length = variable_length.not
+TextType.new max_length fixed_length
Value_Type.Date -> DateType.INSTANCE
# We currently will not support storing dates without timezones in in-memory mode.
Value_Type.Date_Time _ -> DateTimeType.INSTANCE
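The new `Value_Type.Char` cases can be modelled as follows. `TextTypeSketch` is a hypothetical Java record that mirrors the three branches above: unbounded variable length, a fixed length without a size (rejected), and otherwise a size limit plus a fixed-length flag. Its `fits` method is an assumption added for illustration; note that `String.length()` counts UTF-16 code units, which may differ from how Enso counts characters.

```java
// Hypothetical model of the Value_Type.Char -> text storage mapping above.
// maxLength == null means "no limit".
record TextTypeSketch(Integer maxLength, boolean fixedLength) {
    static final TextTypeSketch VARIABLE_LENGTH = new TextTypeSketch(null, false);

    static TextTypeSketch fromChar(Integer size, boolean variableLength) {
        if (size == null && variableLength) {
            return VARIABLE_LENGTH; // Char Nothing True
        }
        if (size == null) { // Char Nothing False
            throw new IllegalArgumentException(
                "Value_Type.Char with fixed length must have a non-nothing size");
        }
        return new TextTypeSketch(size, !variableLength); // Char max_length variable_length
    }

    // Assumption for illustration: a value fits if it is not longer than the limit.
    // String.length() counts UTF-16 code units, not user-perceived characters.
    boolean fits(String value) {
        return maxLength == null || value.length() <= maxLength;
    }
}
```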
@@ -16,15 +16,15 @@ type Bits
Bits_64

## PRIVATE
-to_bits : Integer
-to_bits self = case self of
+to_integer : Integer
+to_integer self = case self of
Bits.Bits_16 -> 16
Bits.Bits_32 -> 32
Bits.Bits_64 -> 64

## PRIVATE
-from_bits : Integer -> Bits
-from_bits bits = case bits of
+from_integer : Integer -> Bits
+from_integer bits = case bits of
16 -> Bits.Bits_16
32 -> Bits.Bits_32
64 -> Bits.Bits_64
@@ -33,17 +33,17 @@ type Bits
## PRIVATE
Provides the text representation of the bit-size.
to_text : Text
-to_text self = self.to_bits.to_text + " bits"
+to_text self = self.to_integer.to_text + " bits"

## PRIVATE
type Bits_Comparator
## PRIVATE
compare : Bits -> Bits -> Ordering
-compare x y = Comparable.from x.to_bits . compare x.to_bits y.to_bits
+compare x y = Comparable.from x.to_integer . compare x.to_integer y.to_integer

## PRIVATE
hash : Bits -> Integer
-hash x = Comparable.from x.to_bits . hash x.to_bits
+hash x = Comparable.from x.to_integer . hash x.to_integer

Comparable.from (_:Bits) = Bits_Comparator

@@ -96,8 +96,9 @@ type Value_Type

Arguments:
 - size: the maximum number of characters that can be stored in the
-column.
+column. It can be nothing to indicate no limit.
 - variable_length: whether the size is a maximum or a fixed length.
+A fixed length string must have a non-nothing size.
Char size:(Integer|Nothing)=Nothing variable_length:Boolean=True

## Date
@@ -389,9 +390,9 @@ type Value_Type
constructor_name = Meta.meta self . constructor . name
additional_fields = case self of
Value_Type.Integer size ->
-[["bits", size.to_bits]]
+[["bits", size.to_integer]]
Value_Type.Float size ->
-[["bits", size.to_bits]]
+[["bits", size.to_integer]]
Value_Type.Decimal precision scale ->
[["precision", precision], ["scale", scale]]
Value_Type.Char size variable_length ->
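The rename of `to_bits`/`from_bits` to `to_integer`/`from_integer` can be pictured with a Java enum. `BitsSketch` is hypothetical; it only mirrors the Enso conversions for 16, 32 and 64 bits, the "N bits" text representation and the width-based ordering of `Bits_Comparator`. Rejecting unsupported widths with an exception is an assumption, since the fallback branch of `from_integer` is not shown in this diff.

```java
// Hypothetical Java counterpart of the Enso `Bits` type after the rename.
enum BitsSketch {
    BITS_16(16), BITS_32(32), BITS_64(64);

    private final int bits;

    BitsSketch(int bits) {
        this.bits = bits;
    }

    // Mirrors `to_integer`.
    int toInteger() {
        return bits;
    }

    // Mirrors `from_integer`; the exception for other widths is an assumption.
    static BitsSketch fromInteger(int bits) {
        for (BitsSketch b : values()) {
            if (b.bits == bits) {
                return b;
            }
        }
        throw new IllegalArgumentException("Unsupported bit width: " + bits);
    }

    // Orders by width, like `Bits_Comparator.compare`.
    static int compareWidth(BitsSketch x, BitsSketch y) {
        return Integer.compare(x.bits, y.bits);
    }

    // Mirrors `to_text`, e.g. "32 bits".
    @Override
    public String toString() {
        return bits + " bits";
    }
}
```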