Column-level lexically-scoped CTE expressions #10826

Merged
merged 72 commits into from
Aug 28, 2024
Merged
Show file tree
Hide file tree
Changes from 57 commits
Commits
Show all changes
72 commits
Select commit Hold shift + click to select a range
33b2ff7: expression-level with (GregoryTravis, Aug 7, 2024)
0a406d9: generate binder (GregoryTravis, Aug 7, 2024)
5f4406c: .standalone wip (GregoryTravis, Aug 7, 2024)
7dff97a: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 8, 2024)
267581b: missing pattern field (GregoryTravis, Aug 8, 2024)
96dd259: removed standalone, try scoping with state hack (GregoryTravis, Aug 8, 2024)
fe5d66c: use Text key (GregoryTravis, Aug 8, 2024)
fd55ff7: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 12, 2024)
266de31: storage hack (GregoryTravis, Aug 12, 2024)
0d60089: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 13, 2024)
ede8c28: round, iif (GregoryTravis, Aug 13, 2024)
75ffca6: move State hack to base_generator, pg tests pass (GregoryTravis, Aug 13, 2024)
a1611b1: remove bindee body parens for sqlite (GregoryTravis, Aug 13, 2024)
fb4def4: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 14, 2024)
b7be792: short table name generator (GregoryTravis, Aug 14, 2024)
899b223: Revert "short table name generator" (GregoryTravis, Aug 14, 2024)
e042b85: with_ctes setting (GregoryTravis, Aug 14, 2024)
35ed369: dialect flag (GregoryTravis, Aug 14, 2024)
1672cd8: with names (GregoryTravis, Aug 14, 2024)
4166db7: is_finite (GregoryTravis, Aug 14, 2024)
1bf42e8: with for short_circuit_special_floating_point (GregoryTravis, Aug 14, 2024)
1ed8b3e: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 15, 2024)
9e035e7: cleanup (GregoryTravis, Aug 15, 2024)
d6bc340: do not use in iif and 10.from-outside-1-2-3, but use on two iifs in r… (GregoryTravis, Aug 15, 2024)
56fcc7b: remove two unneeded withs (GregoryTravis, Aug 15, 2024)
291484c: cleanup (GregoryTravis, Aug 15, 2024)
18c0ff9: docs (GregoryTravis, Aug 15, 2024)
a8fc1d0: cleanup (GregoryTravis, Aug 15, 2024)
ce5c8ad: sql_expression docs (GregoryTravis, Aug 15, 2024)
5b2a413: wip (GregoryTravis, Aug 15, 2024)
153bbef: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 16, 2024)
9e7adb9: traverse (GregoryTravis, Aug 16, 2024)
aa24fe8: wip (GregoryTravis, Aug 16, 2024)
5b75c09: do not need not handled (GregoryTravis, Aug 16, 2024)
03daa23: IR_Spec (GregoryTravis, Aug 16, 2024)
6f4bcbb: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 19, 2024)
2d91beb: IR_Spec passes (GregoryTravis, Aug 19, 2024)
ddb433d: test count and traverse (GregoryTravis, Aug 19, 2024)
f02388b: pg and sl passing (GregoryTravis, Aug 19, 2024)
a7db8b0: shorten binders (GregoryTravis, Aug 19, 2024)
4e5819b: SQL_Generator (GregoryTravis, Aug 19, 2024)
3eb36b9: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 20, 2024)
5c75ffe: avoid table name conflicts (GregoryTravis, Aug 20, 2024)
a02a84f: in-mem is_finite (GregoryTravis, Aug 20, 2024)
2c71060: test failure (GregoryTravis, Aug 20, 2024)
c8f30a7: rename to let (GregoryTravis, Aug 20, 2024)
928c909: tests (GregoryTravis, Aug 20, 2024)
f65f822: docs (GregoryTravis, Aug 20, 2024)
b3df24e: converted other rounds (GregoryTravis, Aug 20, 2024)
e1f2294: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 21, 2024)
bb95c21: fix rounding (GregoryTravis, Aug 21, 2024)
64bc4cd: ir spec test round from the outside (GregoryTravis, Aug 21, 2024)
04dee0e: combos tests (GregoryTravis, Aug 21, 2024)
045b9dc: wip (GregoryTravis, Aug 21, 2024)
3915c48: fix is_finite test (GregoryTravis, Aug 21, 2024)
1bbf85a: wip (GregoryTravis, Aug 21, 2024)
e2c2c99: fmt (GregoryTravis, Aug 21, 2024)
69efbbd: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 23, 2024)
3500f74: private (GregoryTravis, Aug 23, 2024)
8017546: doc traverse (GregoryTravis, Aug 23, 2024)
4c033cf: switch args, require name (GregoryTravis, Aug 23, 2024)
0c3300f: wip (GregoryTravis, Aug 23, 2024)
680e6ba: fix indentation (GregoryTravis, Aug 23, 2024)
b8007ea: wip (GregoryTravis, Aug 23, 2024)
e9e28bf: docs (GregoryTravis, Aug 23, 2024)
8cf7210: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 26, 2024)
45e45cd: snowflake test failures (GregoryTravis, Aug 26, 2024)
d5ff38e: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 27, 2024)
647aedf: review (GregoryTravis, Aug 27, 2024)
8460c82: is_finite True (GregoryTravis, Aug 27, 2024)
b8c9ca4: fix snowflake tests (GregoryTravis, Aug 27, 2024)
7b3a823: Merge branch 'develop' into wip/gmt/10306-round-with (GregoryTravis, Aug 28, 2024)
@@ -128,6 +128,11 @@ type Redshift_Dialect
_ = statement
False

## PRIVATE
Specifies if the Database backend supports WITH clauses in nested queries.
supports_nested_with_clause : Boolean
supports_nested_with_clause self = True

## PRIVATE
supports_separate_nan : Boolean
supports_separate_nan self = True
46 changes: 46 additions & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Runtime/Ref.enso
@@ -60,3 +60,49 @@ type Ref
# => 10
modify : (Any -> Any) -> Any
modify self fun = self.put (fun self.get)

## GROUP Calculations
ICON edit
Temporarily change the value of this mutable reference during the
execution of an action.

Returns the value of the action.

Arguments:
- new_value: the value to set during the execution of the action
- action: the action to execute with the modified value set

> Example
Execute an action with a temporarily incremented value.

r = Ref.new 10
r.with_value 11 <|
r.get == 11 # True
r.get == 10 # True
with_value self (new_value : Any) (~action : Any) =
self.with_modification (_ -> new_value) action

## GROUP Calculations
ICON edit
Temporarily change the value of this mutable reference during the
execution of an action, using a modification function.

Returns the value of the action.

Arguments:
- modifier: the function used to modify the value during the execution of
the action
- action: the action to execute with the modified value set

> Example
Execute an action with a temporarily incremented value.

r = Ref.new 10
r.with_modification (_+1) <|
r.get == 11 # True
r.get == 10 # True
with_modification self (modifier : Any -> Any) (~action : Any) =
old_value = self.modify modifier
r = action
self.put old_value
r
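The set-run-restore pattern behind `with_value` and `with_modification` can be seen in isolation in the minimal Python sketch below. The `Ref` class here is hypothetical and only mirrors the Enso method names. It assumes, as the Enso code appears to, that `put` returns the previous value, so `modify` also returns the value from before the change. One deliberate difference: the sketch restores the old value in a `finally` block, so it comes back even if the action raises, whereas the Enso version only reaches `self.put old_value` once evaluating `action` has finished.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class Ref(Generic[T]):
    """Hypothetical mutable cell mirroring get/put/modify from the Enso `Ref`."""

    def __init__(self, value: T) -> None:
        self._value = value

    def get(self) -> T:
        return self._value

    def put(self, new_value: T) -> T:
        # Store the new value and hand back the previous one.
        old_value = self._value
        self._value = new_value
        return old_value

    def modify(self, fun: Callable[[T], T]) -> T:
        # Like `self.put (fun self.get)`: returns the value before the change.
        return self.put(fun(self.get()))

    def with_modification(self, modifier: Callable[[T], T], action: Callable[[], R]) -> R:
        old_value = self.modify(modifier)
        try:
            return action()  # the action observes the temporarily modified value
        finally:
            self.put(old_value)  # restore, even if the action raises

    def with_value(self, new_value: T, action: Callable[[], R]) -> R:
        return self.with_modification(lambda _: new_value, action)


r = Ref(10)
print(r.with_value(11, lambda: r.get() == 11))  # True
print(r.get() == 10)  # True
print(r.with_modification(lambda x: x + 1, lambda: r.get()))  # 11
```

The action is passed as a zero-argument callable because Python has no direct equivalent of Enso's suspended `~action` argument.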
183 changes: 152 additions & 31 deletions distribution/lib/Standard/Database/0.0.0-dev/src/DB_Column.enso
@@ -209,6 +209,106 @@ type DB_Column
to_sql : SQL_Statement
to_sql self = self.to_table.to_sql

## PRIVATE
Column-level manual CTE factoring.

Calling `let` on a column wraps it as a CTE (common table
expression), using a SQL `WITH ... AS` clause. More specifically, it
takes a callback that receives a "reference" to the CTE; the callback
then returns an arbitrary column that uses the reference. The `let`
call itself returns a column whose expression is the full `WITH` clause,
containing both the CTE and the callback's return value.

Using `let` can reduce the number of duplicates of a column expression
in the final generated SQL, replacing them with references to a single
CTE bound by the `WITH` clause.

`let` acts like a kind of "let binding". It works by giving a lexically
scoped name to the query generated by `self`, and then generating the
query returned by the callback inside this scope. The examples below show
how the generated SQL is structured.

Internally, `let` generates a unique name for the CTE, and creates a
"reference" column which refers to that unique name. This
"reference" column is passed to the callback, which can compute values
based on the reference and any other values. Finally, the return value of
the callback is wrapped in the binding `WITH ... AS` clause, which is
returned from the original call to `let`.

`let` is only available in database backends that support `WITH` clauses
inside expressions. For database backends that only support a single
`WITH` clause at the top level, `let` returns `self` unchanged.
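To make the "reference column plus `WITH ... AS` wrapper" mechanism concrete, here is a toy Python model. Everything in it is hypothetical: `Col`, `Call`, `LetRef`, `Let`, `let_` and `render` are illustrative names loosely modeled on `SQL_Expression.Let` and `Let_Ref`, the toy uses `name` as a readable prefix for the generated binder, and the SQL shape it prints is an assumption rather than what the Enso `Base_Generator` actually emits.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


@dataclass(frozen=True)
class LetRef:
    binder: str  # refers to the CTE bound by an enclosing Let


@dataclass(frozen=True)
class Let:
    binder: str
    bound: "Expr"
    body: "Expr"


Expr = Union[Col, Call, LetRef, Let]

_counter = itertools.count()


def let_(bound: Expr, callback: Callable[[Expr], Expr], name: str) -> Expr:
    binder = f"{name}_{next(_counter)}"  # unique, so nested lets never clash
    return Let(binder, bound, callback(LetRef(binder)))


def render(e: Expr) -> str:
    if isinstance(e, Col):
        return e.name
    if isinstance(e, Call):
        if e.func in ("*", "+"):
            return "(" + f" {e.func} ".join(render(a) for a in e.args) + ")"
        return f"{e.func}(" + ", ".join(render(a) for a in e.args) + ")"
    if isinstance(e, LetRef):
        return f"{e.binder}.v"
    if isinstance(e, Let):
        # The bound expression is rendered once; the body only mentions the binder.
        return (f"(WITH {e.binder} AS (SELECT {render(e.bound)} AS v) "
                f"SELECT {render(e.body)} FROM {e.binder})")
    raise TypeError(e)


rounded = Call("ROUND", (Col("x"),))
without = render(Call("*", (rounded, rounded)))
with_ = render(let_(rounded, lambda r: Call("*", (r, r)), "rounded"))
print(without)  # (ROUND(x) * ROUND(x))
print(with_)    # (WITH rounded_0 AS (SELECT ROUND(x) AS v) SELECT (rounded_0.v * rounded_0.v) FROM rounded_0)
```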
Review comment (Member):
Suggested change
- `WITH` claus at the top level, `let` returns `self` unchanged.
+ `WITH` clause at the top level, `let` returns `self` unchanged.

Reply (Contributor Author):
Done


Semantically, the following expressions will always have the same value:

1. f column
2. column.let f
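The equivalence between `f column` and `column.let f` is ordinary let-binding. The Python sketch below models a "column" as a hypothetical function from a row to a value; `col`, `floor_col`, `mul`, `add` and `let` are illustrative helpers, not Enso APIs. The `calls` counter stands in for the number of copies of the bound subexpression. It says nothing about how many times a real database evaluates a CTE, which is engine-dependent.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, float]
Column = Callable[[Row], float]  # toy "column": a function from a row to a value

calls: List[str] = []


def col(name: str) -> Column:
    return lambda row: row[name]


def floor_col(c: Column) -> Column:
    def evaluate(row: Row) -> float:
        calls.append("floor")  # record every use of this subexpression
        return float(math.floor(c(row)))
    return evaluate


def mul(a: Column, b: Column) -> Column:
    return lambda row: a(row) * b(row)


def add(a: Column, b: Column) -> Column:
    return lambda row: a(row) + b(row)


def let(column: Column, callback: Callable[[Column], Column]) -> Column:
    def evaluate(row: Row) -> float:
        value = column(row)        # the bound expression appears once
        ref = lambda _row: value   # the reference just reads the bound value
        return callback(ref)(row)
    return evaluate


f = lambda c: add(mul(c, c), c)  # uses its argument three times
rows = [{"x": v} for v in (1.3, 4.7, -1.3, -4.7)]
x = col("x")

plain = [f(floor_col(x))(row) for row in rows]
plain_calls = len(calls)
calls.clear()
with_let = [let(floor_col(x), f)(row) for row in rows]
let_calls = len(calls)

print(plain == with_let)       # True: same values
print(plain_calls, let_calls)  # 12 4
```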

? When to use `let`

`let` can make queries shorter and/or simpler by eliminating
duplicates. However, the `WITH` clause itself, including the bound CTE
table name, also takes up space, so if the `self` argument to `let`
isn't very large, `let` can actually make the query longer.

For this reason, it is generally better to apply `let` to values
before passing them into a library method, rather than to apply it to
the value inside the library method itself.
Review comment (Member):
Tbh I do not understand this paragraph.

After reading the code I understand that you mean that e.g. with iif it was not a good idea to use let as part of iif, but instead if you know that a big expression is being passed as the branching argument to iif, you are using let at the call-site.

But I couldn't figure it out from this paragraph too easily. I think I'd rewrite it to maybe just say sth like 'For this reason, it makes sense to use let only on expressions that at the same time: are expected to be relatively large (results of complex transformations already) and are going to be repeated multiple times in the query.'

Reply (Contributor Author):
Done
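The trade-off described under "When to use `let`" can be put as a back-of-the-envelope length model. The symbols are assumptions of this note, not quantities measured in the PR: $L$ is the rendered length of the expression passed as `self`, $n$ is the number of times the callback's result mentions it, $r$ is the length of one reference to the CTE, and $w$ is the fixed cost of the `WITH ... AS` wrapper including the generated table name.

```latex
\underbrace{nL}_{\text{without } \texttt{let}}
\quad \text{vs.} \quad
\underbrace{L + w + nr}_{\text{with } \texttt{let}},
\qquad
\texttt{let} \text{ shortens the query} \iff (n - 1)\,L > w + nr.
```

With a single use ($n = 1$) the left-hand side is zero, so `let` can only add length. For $n \ge 2$ it pays off once $L$ exceeds roughly $(w + nr)/(n - 1)$, which matches the reviewer's summary: use `let` on expressions that are both large and repeated.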


> Example
Remove duplicates of a large column expression from a query.

## Without CTEs

column = table_builder [['x', [1.3, 4.7, -1.3, -4.7]]] . at "x"
rounded = column.round
large = rounded * rounded
large.to_sql
## =>

-- Two copies of the complex rounding query
SELECT ... [complex rounding query] * [complex rounding query]
... FROM ...

## With CTEs

not_so_large = column.round.let rounded->
rounded * rounded
not_so_large.to_sql
## =>

-- One copy of the complex rounding query
SELECT ... (WITH temp_table as ([complex rounding query])
temp_table.x * temp_table.x)
... FROM ...

> Example
Use multiple CTEs in a query.

(column_a * column_b).let product_a_b->
(product_a_b * 10).let times_ten->
times_ten + product_a_b + 100

> Example
Give names to the CTE table names.

(column_a * column_b).let name="product_a_b" product_a_b->
(product_a_b * 10).let name="times_ten" times_ten->
times_ten + product_a_b + 100
let : (DB_Column -> DB_Column) -> Text -> DB_Column
let self (callback : (DB_Column -> DB_Column)) (name : Text = "let") -> DB_Column =
Review comment (Member):
I think I'd reverse the order of arguments and make the name required. It will make the queries more readable if helpful names are always provided.

Review comment (Member):
I think we are missing Arguments: section?

What if we rename name to name_hint and use it as suggested prefix in generate_random_table_name? Then the generated names would contain it, making it easier to discern these names.

Reply (Contributor Author):
I switched the arguments and added Arguments:.

I originally used the name to generate the table name, but since the table name is replaced in Base_Generator, I am not only using the name there.

use_ctes = self.connection.dialect.supports_nested_with_clause

if use_ctes.not then callback self else
binder = self.connection.base_connection.table_naming_helper.generate_random_table_name

ref_expression = SQL_Expression.Let_Ref name binder self.expression
ref_column = DB_Column.Value binder self.connection self.sql_type_reference ref_expression self.context
inner_value = callback ref_column

let_expression = SQL_Expression.Let name binder self.expression inner_value.expression
DB_Column.Value inner_value.internal_name inner_value.connection inner_value.sql_type_reference let_expression inner_value.context

## PRIVATE
Sets up an operation of arbitrary arity.

@@ -849,13 +949,18 @@ type DB_Column
new_name = self.naming_helper.function_name "round" [self]
scale = 10 ^ decimal_places
scaled = self * scale
round_base = scaled.floor . rename "rb"
round_midpoint = (round_base + 0.5) / scale
even_is_up = (self >= 0).iif ((scaled.truncate % 2) != 0) ((scaled.truncate % 2) == 0)
half_goes_up = if use_bankers then even_is_up else self >= 0
do_round_up = half_goes_up.iif (self >= round_midpoint) (self > round_midpoint)
result = do_round_up.iif ((round_base + 1.0) / scale) (round_base / scale)
result.rename new_name

scaled.let name="scaled" scaled->
round_base = scaled.floor . rename "rb"
round_base.let name="round_base" round_base->
round_midpoint = (round_base + 0.5) / scale
round_midpoint.let name="round_midpoint" round_midpoint->
((scaled.truncate % 2) == 0).let name="scaled_truncate_mod_2_equals_0" scaled_truncate_mod_2_equals_0->
even_is_up = (self >= 0).iif scaled_truncate_mod_2_equals_0.not scaled_truncate_mod_2_equals_0
half_goes_up = if use_bankers then even_is_up else self >= 0
do_round_up = half_goes_up.let x-> x.iif (self >= round_midpoint) (self > round_midpoint)
result = do_round_up.let x-> x.iif ((round_base + 1.0) / scale) (round_base / scale)
result.rename new_name
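To check the branch logic independently of SQL, here is a scalar Python transcription of the float rounding above. It is a sketch, not Enso code; `round_float` is a hypothetical name. Each `iif` becomes a conditional expression and each `let` becomes an ordinary local variable, which is exactly the "same value" guarantee from the `let` documentation.

```python
import math


def round_float(x: float, decimal_places: int = 0, use_bankers: bool = False) -> float:
    scale = 10 ** decimal_places
    scaled = x * scale
    round_base = math.floor(scaled)
    round_midpoint = (round_base + 0.5) / scale
    # Parity of the truncated value decides the direction of a tie under banker's rounding.
    truncated_is_even = math.trunc(scaled) % 2 == 0
    even_is_up = (not truncated_is_even) if x >= 0 else truncated_is_even
    half_goes_up = even_is_up if use_bankers else x >= 0
    do_round_up = x >= round_midpoint if half_goes_up else x > round_midpoint
    return (round_base + 1) / scale if do_round_up else round_base / scale


print([round_float(v) for v in (1.3, 4.7, -1.3, -4.7)])              # [1.0, 5.0, -1.0, -5.0]
print([round_float(v) for v in (2.5, -2.5)])                          # [3.0, -3.0]
print([round_float(v, use_bankers=True) for v in (2.5, 3.5, -2.5)])   # [2.0, 4.0, -2.0]
```

Without `use_bankers`, ties round away from zero. With it, ties go to the even neighbour.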

## PRIVATE
Round a float-like column.
@@ -864,15 +969,19 @@ type DB_Column
# Construct a constant Decimal column.
k x = self.const x . cast Value_Type.Decimal
new_name = self.naming_helper.function_name "round" [self]
scale = k 10 ^ decimal_places
scale = 10 ^ decimal_places
scaled = self * scale
round_base = scaled.floor . rename "rb"
round_midpoint = (round_base + k 0.5).decimal_div scale
even_is_up = (self >= k 0).iif ((scaled.truncate.decimal_mod (k 2)) != k 0) ((scaled.truncate.decimal_mod (k 2)) == k 0)
half_goes_up = if use_bankers then even_is_up else self >= k 0
do_round_up = half_goes_up.iif (self >= round_midpoint) (self > round_midpoint)
result = do_round_up.iif ((round_base + k 1).decimal_div scale) (round_base.decimal_div scale)
result.rename new_name
scaled.let name="scaled" scaled->
round_base = scaled.floor . rename "rb"
round_base.let name="round_base" round_base->
round_midpoint = (round_base + k 0.5).decimal_div scale
round_midpoint.let name="round_midpoint" round_midpoint->
((scaled.truncate.decimal_mod (k 2)) == k 0).let name="scaled_truncate_mod_2_equals_0" scaled_truncate_mod_2_equals_0->
even_is_up = (self >= k 0).iif scaled_truncate_mod_2_equals_0.not scaled_truncate_mod_2_equals_0
half_goes_up = if use_bankers then even_is_up else self >= k 0
do_round_up = half_goes_up.let x-> x.iif (self >= round_midpoint) (self > round_midpoint)
result = do_round_up.let x-> x.iif ((round_base + k 1).decimal_div scale) (round_base.decimal_div scale)
result.rename new_name
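Both rounding variants replace the two parity tests with a single bound boolean named `scaled_truncate_mod_2_equals_0`, and then compute `(self >= 0).iif bound.not bound`. That rewrite preserves the original `even_is_up` only if the bound column really tests `== 0`, as its name says. The Python check below uses plain booleans in place of columns (`iif` is a hypothetical helper) and compares the pre-`let` form against both possible bindings.

```python
from itertools import product


def iif(cond: bool, if_true: bool, if_false: bool) -> bool:
    return if_true if cond else if_false


def even_is_up_original(x_non_negative: bool, truncated: int) -> bool:
    # Pre-`let` form: (self >= 0).iif ((t % 2) != 0) ((t % 2) == 0)
    return iif(x_non_negative, truncated % 2 != 0, truncated % 2 == 0)


def even_is_up_bound(x_non_negative: bool, bound: bool) -> bool:
    # `let` form: (self >= 0).iif bound.not bound
    return iif(x_non_negative, not bound, bound)


cases = list(product([True, False], range(-4, 5)))
eq_binding_ok = all(even_is_up_original(s, t) == even_is_up_bound(s, t % 2 == 0) for s, t in cases)
ne_binding_ok = all(even_is_up_original(s, t) == even_is_up_bound(s, t % 2 != 0) for s, t in cases)
print(eq_binding_ok, ne_binding_ok)  # True False
```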

## PRIVATE
Round an integer column.
@@ -882,20 +991,22 @@ type DB_Column
scale = 10 ^ -decimal_places
halfway = scale.div 2
remainder = self % scale
scaled_down = (self / scale).truncate . cast Value_Type.Integer
result_unnudged = scaled_down * scale

if_non_neg =
half_goes_up = if use_bankers then (scaled_down % 2) != 0 else self >= 0
round_up = half_goes_up.iif (remainder >= halfway) (remainder > halfway)
round_up.iif (result_unnudged + scale) result_unnudged
if_neg =
half_goes_up = if use_bankers then (scaled_down % 2) == 0 else self >= 0
round_up = half_goes_up.iif (remainder < -halfway) (remainder <= -halfway)
round_up.iif (result_unnudged - scale) result_unnudged

result = (self >= 0).iif if_non_neg if_neg
result.cast Value_Type.Float . rename new_name
remainder.let name="remainder" remainder->
scaled_down = (self / scale).truncate . cast Value_Type.Integer
scaled_down.let name="scaled_down" scaled_down->
result_unnudged = scaled_down * scale
result_unnudged.let name="result_unnudged" result_unnudged->
if_non_neg =
half_goes_up = if use_bankers then (scaled_down % 2) != 0 else self >= 0
round_up = half_goes_up.let x-> x.iif (remainder >= halfway) (remainder > halfway)
round_up.let x-> x.iif (result_unnudged + scale) result_unnudged
if_neg =
half_goes_up = if use_bankers then (scaled_down % 2) == 0 else self >= 0
round_up = half_goes_up.let x-> x.iif (remainder < -halfway) (remainder <= -halfway)
round_up.let x-> x.iif (result_unnudged - scale) result_unnudged

result = (self >= 0).let x-> x.iif if_non_neg if_neg
result.cast Value_Type.Float . rename new_name
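The integer path works on a negative `decimal_places`, so it rounds to tens, hundreds, and so on. Below is a scalar Python transcription for checking it by hand. `round_int`, `trunc_div` and `trunc_mod` are hypothetical helpers. The sketch assumes the database's `%` and `truncate` follow the sign of the dividend, as SQL `MOD` typically does, which is why it does not use Python's floor-based `%` and `//` directly.

```python
def trunc_div(a: int, b: int) -> int:
    # Integer division rounding toward zero (b > 0).
    q = abs(a) // b
    return q if a >= 0 else -q


def trunc_mod(a: int, b: int) -> int:
    # Remainder with the sign of the dividend (b > 0).
    r = abs(a) % b
    return r if a >= 0 else -r


def round_int(x: int, decimal_places: int, use_bankers: bool = False) -> float:
    scale = 10 ** -decimal_places
    halfway = scale // 2
    remainder = trunc_mod(x, scale)
    scaled_down = trunc_div(x, scale)
    result_unnudged = scaled_down * scale
    if x >= 0:
        half_goes_up = scaled_down % 2 != 0 if use_bankers else True
        round_up = remainder >= halfway if half_goes_up else remainder > halfway
        result = result_unnudged + scale if round_up else result_unnudged
    else:
        half_goes_up = scaled_down % 2 == 0 if use_bankers else False
        round_up = remainder < -halfway if half_goes_up else remainder <= -halfway
        result = result_unnudged - scale if round_up else result_unnudged
    return float(result)


print(round_int(15, -1), round_int(-15, -1))                          # 20.0 -20.0
print(round_int(25, -1, use_bankers=True), round_int(1234, -2))       # 20.0 1200.0
```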

## ALIAS int
GROUP Standard.Base.Rounding
@@ -1082,6 +1193,16 @@ type DB_Column
new_name = self.naming_helper.function_name "is_infinite" [self]
self.make_unary_op "IS_INF" new_name

## GROUP Standard.Base.Math
ICON math
Returns a column of booleans, with `True` items at the positions where
this column contains a non-infinite, non-NaN floating point value. This
is only applicable to double columns.
is_finite : DB_Column
is_finite self = Value_Type.expect_numeric self <|
new_name = self.naming_helper.function_name "is_finite" [self]
self.make_unary_op "IS_FINITE" new_name
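As a scalar reference for what `is_finite` is documented to return, here is a Python sketch. `is_finite` below is a hypothetical helper, and returning `None` for a missing value is an assumption meant to mimic SQL `NULL`, not something the diff specifies. The loop checks that every non-missing float is exactly one of NaN, infinite, or finite, which is what makes `is_finite` a single-test replacement for "NaN or infinite".

```python
import math
from typing import Optional


def is_finite(value: Optional[float]) -> Optional[bool]:
    """Scalar model of the `is_finite` column operation (hypothetical helper)."""
    if value is None:
        # Assumption: a missing value stays missing, like NULL in SQL.
        return None
    return math.isfinite(value)


# Every non-missing float is exactly one of: NaN, infinite, or finite.
for v in [0.0, -2.5, 1e308, math.inf, -math.inf, math.nan]:
    flags = (math.isnan(v), math.isinf(v), is_finite(v))
    assert sum(flags) == 1, v
```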

## PRIVATE
Returns a column of booleans, with `True` items at the positions where
this column contains an empty string or `Nothing`.
@@ -1973,9 +2094,9 @@ type DB_Column
name. If the column is not floating point, just return the expression.
short_circuit_special_floating_point : DB_Column -> DB_Column
short_circuit_special_floating_point self exp =
self_is_nan = if self.connection.dialect.supports_separate_nan then self.is_nan else self.is_nothing
if self.value_type.is_floating_point.not then exp else
((self_is_nan || self.is_infinite).iif self exp).rename exp.name
(self.is_finite.not.iif self exp).rename exp.name
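The `short_circuit_special_floating_point` change swaps `(self_is_nan || self.is_infinite)` for `self.is_finite.not`. For non-missing floats on a dialect with a separate NaN, the two conditions select the same branch, as the Python check below shows. `old_short_circuit`, `new_short_circuit` and `same` are hypothetical helpers, and `exp` is modeled as `value * 2` purely for illustration. Missing values are deliberately left out, because their handling depends on how each dialect's `IS_FINITE` treats `NULL`, which the diff does not show.

```python
import math


def old_short_circuit(value: float, exp: float) -> float:
    # (self_is_nan || self.is_infinite).iif self exp
    return value if (math.isnan(value) or math.isinf(value)) else exp


def new_short_circuit(value: float, exp: float) -> float:
    # self.is_finite.not.iif self exp
    return value if not math.isfinite(value) else exp


def same(a: float, b: float) -> bool:
    # NaN != NaN, so treat two NaNs as matching results.
    return (math.isnan(a) and math.isnan(b)) or a == b


values = [0.0, 3.5, -1e300, math.inf, -math.inf, math.nan]
print(all(same(old_short_circuit(v, v * 2), new_short_circuit(v, v * 2)) for v in values))  # True
```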
Review comment (Contributor Author, GregoryTravis, Aug 15, 2024):
I originally also used with inside iif, since iif duplicates its boolean argument. I took this out, because it was applying the CTE transform to expressions that did not need it, thus making queries larger and more complicated. (As noted in the with documentation, with should be used in downstream logic such as round, rather than inside core library methods like iif.)

I kept it in short_circuit_special_floating_point because it is used in many library methods and generates significant overhead.

Reply (Member):
> with should be used in downstream logic such as round, rather than inside core library methods like iif

I don't think I understand - what is the difference here between round and iif? Both are regular library methods.

Reply (Contributor Author):
The difference is that iif is used in many other methods, while round is usually called directly by the user. If iif always used a CTE, then it would often be used on very small inputs, which would result in a larger query, and there would be no easy way to turn it off. (It could definitely be an optional parameter to iif or any other method.)

Also, round is large and complicated, and even for small inputs, it generates many duplicates, so it will always be useful to use with in several places within it. (But not all places -- I tried many combinations, out of curiosity, and some of them made it worse so I removed those.) So that's what I mean when I say round is "downstream" from core library code -- it's larger, more specific, and is not used in many methods throughout the library, unlike iif.

I guess the rule is always "use CTEs if it makes the query smaller". For a method like iif, there is no way for iif to know whether it will always help, but for round I think it always will.

We could also examine the "size" of the input to determine if it's necessary -- I think that eventually we will probably want to do that. But for now, the best approach is to experiment by adding with calls and checking the result. A more by-hand process, but acceptable for complex methods.

Reply (Member):
Ok, that sounds good.


## PRIVATE
Helper for case-sensitivity-based text operations.
make_text_case_op left op other case_sensitivity new_name =
@@ -134,6 +134,12 @@ type Dialect
_ = statement
Unimplemented.throw "This is an interface only."

## PRIVATE
Specifies if the Database backend supports WITH clauses in nested queries.
supports_nested_with_clause : Boolean
supports_nested_with_clause self =
Unimplemented.throw "This is an interface only."
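The `supports_nested_with_clause` flag works as a capability switch. The interface method has no answer of its own, a concrete dialect such as `Redshift_Dialect` returns `True`, and `let` falls back to plain `callback self` when the flag is false. The Python sketch below models that dispatch. `Dialect`, `NestedWithDialect`, `TopLevelWithOnlyDialect`, `let_sql` and the SQL text it produces are all hypothetical. Note also that a Python ABC refuses instantiation up front, whereas the Enso interface only throws when the method is called.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Dialect(ABC):
    @abstractmethod
    def supports_nested_with_clause(self) -> bool:
        """Whether WITH clauses may appear inside nested queries."""


class NestedWithDialect(Dialect):
    def supports_nested_with_clause(self) -> bool:
        return True


class TopLevelWithOnlyDialect(Dialect):
    def supports_nested_with_clause(self) -> bool:
        return False


def let_sql(dialect: Dialect, expr_sql: str, callback: Callable[[str], str], binder: str = "t0") -> str:
    if not dialect.supports_nested_with_clause():
        # Fallback: plain function application, so the expression text is repeated.
        return callback(expr_sql)
    ref = f"{binder}.v"
    return f"(WITH {binder} AS (SELECT {expr_sql} AS v) SELECT {callback(ref)} FROM {binder})"


square = lambda e: f"({e} * {e})"
print(let_sql(TopLevelWithOnlyDialect(), "ROUND(x)", square))  # (ROUND(x) * ROUND(x))
print(let_sql(NestedWithDialect(), "ROUND(x)", square))
# (WITH t0 AS (SELECT ROUND(x) AS v) SELECT (t0.v * t0.v) FROM t0)

try:
    Dialect()
except TypeError:
    print("Dialect is interface-only")
```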

## PRIVATE
Specifies if the Database distinguishes a separate `NaN` value for
floating point columns. Some databases will not be able to distinguish