Commit 2385f5b

Add size-limited strings and varying bit-width integer Value_Types to in-memory backend and check for ArithmeticOverflow in LongStorage (#7557)
- Closes #5159
- Data downloaded from the database can now keep a type much closer to the original one (such as string length limits or smaller integer types).
- Cast also exposes these types.
- The integers are still all stored as 64-bit Java `long`s; only their bounds are checked. Changing the underlying storage for memory efficiency may come in the future: #6109
- Fixes #7565
- Fixes #7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on the particular database.
1 parent b3e9ea8 commit 2385f5b
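
The commit message says integers stay in 64-bit `long`s and only their bounds are checked. A minimal sketch of such a bounds check follows, assuming signed two's-complement ranges for the 8, 16, 32 and 64-bit widths. The class `IntegerBounds` and its methods are hypothetical names for illustration and are not taken from the commit.

```java
// Hypothetical helper, not the commit's actual code: values live in a
// 64-bit long and only the declared bit-width restricts the valid range.
final class IntegerBounds {
    private IntegerBounds() {}

    /** Smallest value representable in a signed integer of the given width. */
    static long minValue(int bits) {
        checkWidth(bits);
        return bits == 64 ? Long.MIN_VALUE : -(1L << (bits - 1));
    }

    /** Largest value representable in a signed integer of the given width. */
    static long maxValue(int bits) {
        checkWidth(bits);
        return bits == 64 ? Long.MAX_VALUE : (1L << (bits - 1)) - 1;
    }

    /** True if the long can be kept in a column of the given width. */
    static boolean fits(long value, int bits) {
        return value >= minValue(bits) && value <= maxValue(bits);
    }

    // Assumption: only these four widths are meaningful for this sketch.
    private static void checkWidth(int bits) {
        if (bits != 8 && bits != 16 && bits != 32 && bits != 64) {
            throw new IllegalArgumentException("Unsupported bit-width: " + bits);
        }
    }
}
```

With this shape, a narrower column type costs no extra storage logic: the same `long` array can back every width, and the check runs only where values enter the column.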

65 files changed (+1289 -305 lines changed)

CHANGELOG.md (+3)

@@ -553,6 +553,8 @@
 - [Retire `Column_Selector` and allow regex based selection of columns.][7295]
 - [`Text.parse_to_table` can take a `Regex`.][7297]
 - [Expose `Text.normalize`.][7425]
+- [Implemented new value types (various sizes of `Integer` type, fixed-length
+  and length-limited `Char` type) for the in-memory `Table` backend.][7557]

 [debug-shortcuts]:
   https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -786,6 +788,7 @@
 [7295]: https://github.com/enso-org/enso/pull/7295
 [7297]: https://github.com/enso-org/enso/pull/7297
 [7425]: https://github.com/enso-org/enso/pull/7425
+[7557]: https://github.com/enso-org/enso/pull/7557

 #### Enso Compiler

distribution/lib/Standard/Database/0.0.0-dev/src/Data/Column.enso (+45)

@@ -369,6 +369,15 @@ type Column
        Returns a column containing the result of adding `other` to each element
        of `self`. If `other` is a column, the operation is performed pairwise
        between corresponding elements of `self` and `other`.
+
+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
     + : Column | Any -> Column
     + self other =
         op = case Value_Type_Helpers.resolve_addition_kind self other of
@@ -388,6 +397,15 @@ type Column
        Returns a column containing the result of subtracting `other` from each
        element of `self`. If `other` is a column, the operation is performed
        pairwise between corresponding elements of `self` and `other`.
+
+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
     - : Column | Any -> Column
     - self other =
         case Value_Type_Helpers.resolve_subtraction_kind self other of
@@ -405,6 +423,15 @@ type Column
        Returns a column containing the result of multiplying `other` by each
        element of `self`. If `other` is a column, the operation is performed
        pairwise between corresponding elements of `self` and `other`.
+
+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
     * : Column | Any -> Column
     * self other =
         Value_Type_Helpers.check_binary_numeric_op self other <|
@@ -426,6 +453,15 @@ type Column
          - If division by zero occurs, an `Arithmetic_Error` warning is attached
            to the result.

+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
+
        > Example
          Divide the elements of one column by the elements of another.

@@ -493,6 +529,15 @@ type Column
        Returns a column containing the result of raising each element of `self`
        by `other`.

+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
+
        > Example
          Squares the elements of one column.

distribution/lib/Standard/Database/0.0.0-dev/src/Internal/Column_Fetcher.enso (+9 -10)

@@ -1,6 +1,7 @@
 from Standard.Base import all

 import Standard.Table.Data.Column.Column as Materialized_Column
+import Standard.Table.Data.Type.Value_Type.Bits
 import Standard.Table.Data.Type.Value_Type.Value_Type
 import Standard.Table.Internal.Java_Exports

@@ -69,27 +70,27 @@ double_fetcher =
     Column_Fetcher.Value fetch_value make_builder

 ## PRIVATE
-long_fetcher : Column_Fetcher
-long_fetcher =
+long_fetcher : Bits -> Column_Fetcher
+long_fetcher bits =
     fetch_value rs i =
         l = rs.getLong i
         if rs.wasNull then Nothing else l
     make_builder initial_size =
-        java_builder = Java_Exports.make_long_builder initial_size
+        java_builder = Java_Exports.make_long_builder initial_size bits=bits
         append v =
             if v.is_nothing then java_builder.appendNulls 1 else
                 java_builder.appendLong v
         Builder.Value append (seal_java_builder java_builder)
     Column_Fetcher.Value fetch_value make_builder

 ## PRIVATE
-text_fetcher : Column_Fetcher
-text_fetcher =
+text_fetcher : Value_Type -> Column_Fetcher
+text_fetcher value_type =
     fetch_value rs i =
         t = rs.getString i
         if rs.wasNull then Nothing else t
     make_builder initial_size =
-        java_builder = Java_Exports.make_string_builder initial_size
+        java_builder = Java_Exports.make_string_builder initial_size value_type=value_type
         make_builder_from_java_object_builder java_builder
     Column_Fetcher.Value fetch_value make_builder

@@ -137,11 +138,9 @@ date_time_fetcher =
 default_fetcher_for_value_type : Value_Type -> Column_Fetcher
 default_fetcher_for_value_type value_type =
     case value_type of
-        ## TODO [RW] once we support varying bit-width in storages, we should specify it
-           Revisit in #5159.
-        Value_Type.Integer _ -> long_fetcher
+        Value_Type.Integer bits -> long_fetcher bits
         Value_Type.Float _ -> double_fetcher
-        Value_Type.Char _ _ -> text_fetcher
+        Value_Type.Char _ _ -> text_fetcher value_type
         Value_Type.Boolean -> boolean_fetcher
         Value_Type.Time -> time_fetcher
         # We currently don't distinguish timestamps without a timezone on the Enso value side.

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso (+46 -2)

@@ -30,7 +30,6 @@ from project.Internal.Column_Format import all
 from project.Internal.Java_Exports import make_date_builder_adapter, make_double_builder, make_long_builder, make_string_builder

 polyglot java import org.enso.base.Time_Utils
-polyglot java import org.enso.table.data.column.builder.StringBuilder
 polyglot java import org.enso.table.data.column.operation.map.MapOperationProblemBuilder
 polyglot java import org.enso.table.data.column.storage.Storage as Java_Storage
 polyglot java import org.enso.table.data.table.Column as Java_Column
@@ -388,6 +387,15 @@ type Column
        Returns a column with results of adding `other` from each element of
        `self`.

+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
+
        > Example
          Add two columns to each other.

@@ -418,6 +426,15 @@ type Column
        Returns a column with results of subtracting `other` from each element of
        `self`.

+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
+
        > Example
          Subtract one column from another.

@@ -461,6 +478,15 @@ type Column
        Returns a column containing the result of multiplying each element of
        `self` by `other`.

+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
+
        > Example
          Multiply the elements of two columns together.

@@ -495,6 +521,15 @@ type Column
          - If division by zero occurs, an `Arithmetic_Error` warning is attached
            to the result.

+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
+
        > Example
          Divide the elements of one column by the elements of another.

@@ -560,6 +595,15 @@ type Column
        Returns a column containing the result of raising each element of `self`
        by `other`.

+       ? Arithmetic Overflow
+
+         For integer columns, the operation may yield results that will not fit
+         into the range supported by the column. In such case, the in-memory
+         backend will replace such results with `Nothing` and report a
+         `Arithmetic_Overflow` warning. The behaviour in Database backends is
+         not specified and will depend on the particular database - it may
+         cause a hard error, the value may be truncated or wrap-around etc.
+
        > Example
          Squares the elements of one column.

@@ -1229,7 +1273,7 @@ type Column
         length = self.length
         storage = self.java_column.getStorage

-        builder = StringBuilder.new length
+        builder = make_string_builder length
         0.up_to length . each i->
             replaced = do_replace i (storage.getItem i)
             builder.append replaced

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Data_Formatter.enso (+6 -4)

@@ -1,6 +1,7 @@
 from Standard.Base import all
 import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

+import project.Data.Type.Storage
 import project.Internal.Java_Problems
 import project.Internal.Parse_Values_Helper
 from project.Data.Type.Value_Type import Auto, Bits, Value_Type
@@ -173,9 +174,10 @@ type Data_Formatter
         WhitespaceStrippingParser.new base_parser

     ## PRIVATE
-    make_integer_parser self auto_mode=False =
+    make_integer_parser self auto_mode=False target_type=Value_Type.Integer =
         separator = if self.thousand_separator.is_empty then Nothing else self.thousand_separator
-        NumberParser.createIntegerParser auto_mode.not (auto_mode.not || self.allow_leading_zeros) self.trim_values separator
+        storage_type = Storage.from_value_type_strict target_type
+        NumberParser.createIntegerParser storage_type auto_mode.not (auto_mode.not || self.allow_leading_zeros) self.trim_values separator

     ## PRIVATE
     make_decimal_parser self auto_mode=False =
@@ -221,8 +223,8 @@ type Data_Formatter

     ## PRIVATE
     make_value_type_parser self value_type = case value_type of
-        # TODO once we implement #5159 we will need to add checks for bounds here and support 16/32-bit ints
-        Value_Type.Integer Bits.Bits_64 -> self.make_integer_parser
+        Value_Type.Integer _ ->
+            self.make_integer_parser target_type=value_type
         # TODO once we implement #6109 we can support 32-bit floats
         Value_Type.Float Bits.Bits_64 -> self.make_decimal_parser
         Value_Type.Boolean -> self.make_boolean_parser
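
The change above passes the target type into the integer parser, replacing the old TODO about bounds checks for narrower integers. A rough sketch of what a width-aware parser could look like follows. `BoundedIntegerParser` is a hypothetical class, and returning `null` for unparseable or out-of-range text is an assumption. The real `NumberParser.createIntegerParser` also handles thousand separators and leading zeros, and may report problems differently.

```java
// Hypothetical sketch of an integer parser that knows its target width.
final class BoundedIntegerParser {
    private final long min;
    private final long max;

    BoundedIntegerParser(int bits) {
        if (bits != 8 && bits != 16 && bits != 32 && bits != 64) {
            throw new IllegalArgumentException("Unsupported bit-width: " + bits);
        }
        // Signed range for the requested width; 64 bits uses the full long range.
        this.max = bits == 64 ? Long.MAX_VALUE : (1L << (bits - 1)) - 1;
        this.min = -this.max - 1;
    }

    /** Returns the parsed value, or null if the text is not an integer or does not fit. */
    Long parse(String text) {
        long value;
        try {
            value = Long.parseLong(text.trim());
        } catch (NumberFormatException e) {
            return null; // not an integer, or outside the 64-bit range
        }
        if (value < min || value > max) {
            return null; // valid integer, but too large for the target column type
        }
        return value;
    }
}
```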

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Type/Storage.enso (+13 -6)

@@ -9,6 +9,7 @@ from Standard.Table.Errors import Inexact_Type_Coercion

 polyglot java import org.enso.table.data.column.builder.Builder
 polyglot java import org.enso.table.data.column.storage.type.AnyObjectType
+polyglot java import org.enso.table.data.column.storage.type.Bits as Java_Bits
 polyglot java import org.enso.table.data.column.storage.type.BooleanType
 polyglot java import org.enso.table.data.column.storage.type.DateTimeType
 polyglot java import org.enso.table.data.column.storage.type.DateType
@@ -24,9 +25,9 @@ to_value_type : StorageType -> Value_Type
 to_value_type storage_type = case storage_type of
     i : IntegerType -> case i.bits.toInteger of
         8 -> Value_Type.Byte
-        b -> Value_Type.Integer (Bits.from_bits b)
+        b -> Value_Type.Integer (Bits.from_integer b)
     f : FloatType ->
-        bits = Bits.from_bits f.bits.toInteger
+        bits = Bits.from_integer f.bits.toInteger
         Value_Type.Float bits
     _ : BooleanType -> Value_Type.Boolean
     s : TextType ->
@@ -40,12 +41,18 @@ to_value_type storage_type = case storage_type of

 ## PRIVATE
 closest_storage_type value_type = case value_type of
-    # TODO we will want builders and storages with bounds checking, but for now we approximate
-    Value_Type.Byte -> IntegerType.INT_64
-    Value_Type.Integer _ -> IntegerType.INT_64
+    Value_Type.Byte -> IntegerType.INT_8
+    Value_Type.Integer bits ->
+        java_bits = Java_Bits.fromInteger bits.to_integer
+        IntegerType.create java_bits
     Value_Type.Float _ -> FloatType.FLOAT_64
     Value_Type.Boolean -> BooleanType.INSTANCE
-    Value_Type.Char _ _ -> TextType.VARIABLE_LENGTH
+    Value_Type.Char Nothing True -> TextType.VARIABLE_LENGTH
+    Value_Type.Char Nothing False ->
+        Error.throw (Illegal_Argument.Error "Value_Type.Char with fixed length must have a non-nothing size")
+    Value_Type.Char max_length variable_length ->
+        fixed_length = variable_length.not
+        TextType.new max_length fixed_length
     Value_Type.Date -> DateType.INSTANCE
     # We currently will not support storing dates without timezones in in-memory mode.
     Value_Type.Date_Time _ -> DateTimeType.INSTANCE
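
The three `Value_Type.Char` branches in `closest_storage_type` encode one rule: an unlimited variable-length string maps to the shared variable-length type, a fixed-length string without a size is rejected, and everything else carries its size along. The same decision table is sketched below in Java. `TextTypeSketch` is a hypothetical stand-in for the real `TextType` class, whose fields and constructor are not shown in this diff, and using `-1` for "no limit" is an assumption of the sketch.

```java
// Hypothetical stand-in for the storage-side text type.
final class TextTypeSketch {
    /** Assumption: -1 marks "no length limit" in this sketch. */
    final long maxLength;
    final boolean fixedLength;

    static final TextTypeSketch VARIABLE_LENGTH = new TextTypeSketch(-1, false);

    private TextTypeSketch(long maxLength, boolean fixedLength) {
        this.maxLength = maxLength;
        this.fixedLength = fixedLength;
    }

    /** Mirrors the three Char branches: size may be null, meaning `Nothing`. */
    static TextTypeSketch fromChar(Long size, boolean variableLength) {
        if (size == null) {
            if (!variableLength) {
                throw new IllegalArgumentException(
                    "Value_Type.Char with fixed length must have a non-nothing size");
            }
            return VARIABLE_LENGTH;
        }
        // The Enso side stores `variable_length`; the storage type stores its negation.
        return new TextTypeSketch(size, !variableLength);
    }
}
```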

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Type/Value_Type.enso (+11 -10)

@@ -16,15 +16,15 @@ type Bits
     Bits_64

     ## PRIVATE
-    to_bits : Integer
-    to_bits self = case self of
+    to_integer : Integer
+    to_integer self = case self of
         Bits.Bits_16 -> 16
         Bits.Bits_32 -> 32
         Bits.Bits_64 -> 64

     ## PRIVATE
-    from_bits : Integer -> Bits
-    from_bits bits = case bits of
+    from_integer : Integer -> Bits
+    from_integer bits = case bits of
         16 -> Bits.Bits_16
         32 -> Bits.Bits_32
         64 -> Bits.Bits_64
@@ -33,17 +33,17 @@ type Bits
     ## PRIVATE
        Provides the text representation of the bit-size.
     to_text : Text
-    to_text self = self.to_bits.to_text + " bits"
+    to_text self = self.to_integer.to_text + " bits"

 ## PRIVATE
 type Bits_Comparator
     ## PRIVATE
     compare : Bits -> Bits -> Ordering
-    compare x y = Comparable.from x.to_bits . compare x.to_bits y.to_bits
+    compare x y = Comparable.from x.to_integer . compare x.to_integer y.to_integer

     ## PRIVATE
     hash : Bits -> Integer
-    hash x = Comparable.from x.to_bits . hash x.to_bits
+    hash x = Comparable.from x.to_integer . hash x.to_integer

 Comparable.from (_:Bits) = Bits_Comparator

@@ -96,8 +96,9 @@ type Value_Type

        Arguments:
        - size: the maximum number of characters that can be stored in the
-         column.
+         column. It can be nothing to indicate no limit.
        - variable_length: whether the size is a maximum or a fixed length.
+         A fixed length string must have a non-nothing size.
     Char size:(Integer|Nothing)=Nothing variable_length:Boolean=True

     ## Date
@@ -389,9 +390,9 @@ type Value_Type
         constructor_name = Meta.meta self . constructor . name
         additional_fields = case self of
             Value_Type.Integer size ->
-                [["bits", size.to_bits]]
+                [["bits", size.to_integer]]
             Value_Type.Float size ->
-                [["bits", size.to_bits]]
+                [["bits", size.to_integer]]
             Value_Type.Decimal precision scale ->
                 [["precision", precision], ["scale", scale]]
             Value_Type.Char size variable_length ->
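
The renamed `to_integer` / `from_integer` pair and `Bits_Comparator` amount to a small enumeration that converts to and from its width and orders by it. A Java analogue is sketched below for orientation. `BitsSketch` is a hypothetical name and is not the `org.enso.table.data.column.storage.type.Bits` class imported elsewhere in this commit. Throwing on an unknown width is an assumption, since the fall-through branch of `from_integer` lies outside the hunk shown.

```java
import java.util.Comparator;

// Hypothetical Java analogue of the Enso `Bits` type shown in the diff.
enum BitsSketch {
    BITS_16(16),
    BITS_32(32),
    BITS_64(64);

    /** Orders widths numerically, like `Bits_Comparator.compare`. */
    static final Comparator<BitsSketch> BY_WIDTH =
        Comparator.comparingInt(BitsSketch::toInteger);

    private final int size;

    BitsSketch(int size) {
        this.size = size;
    }

    /** Counterpart of `to_integer`. */
    int toInteger() {
        return size;
    }

    /** Counterpart of `from_integer`; rejecting other widths is an assumption. */
    static BitsSketch fromInteger(int bits) {
        for (BitsSketch candidate : values()) {
            if (candidate.size == bits) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("Invalid number of bits: " + bits);
    }

    /** Counterpart of `to_text`, e.g. "32 bits". */
    @Override
    public String toString() {
        return size + " bits";
    }
}
```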
