Column-level lexically-scoped CTE expressions #10826
Changes from 57 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -209,6 +209,106 @@ type DB_Column | |
to_sql : SQL_Statement | ||
to_sql self = self.to_table.to_sql | ||
|
||
## PRIVATE | ||
Column-level manual CTE factoring. | ||
|
||
Calling `let` on a column wraps it as a CTE (common table | ||
expression), using a SQL `WITH ... AS` clause. More specifically, it | ||
takes a callback that receives a "reference" to the CTE; the callback | ||
then returns an arbitrary column that uses the reference. The `let` | ||
call itself returns the full `WITH` clause, containing both the CTE and | ||
the callback return value. | ||
|
||
Using `let` can reduce the number of duplicates of a column expression | ||
in the final generated SQL, replacing them with references to a single | ||
CTE bound by the `WITH` clause. | ||
|
||
`let` acts like a kind of "let binding". It works by giving a lexically | ||
scoped name to the query generated by `self`, and then generating the | ||
query returned by the callback inside of this scope. The examples | |
below show how the generated SQL is structured. | |
|
||
Internally, `let` generates a unique name for the CTE, and creates a | ||
"reference" column which refers to that unique name. This | ||
"reference" column is passed to the callback, which can compute values | ||
based on the reference and any other values. Finally, the return value of | ||
the callback is wrapped in the binding `WITH ... AS` clause, which is | ||
returned from the original call to `let`. | ||
|
||
`let` is only available in database backends that support `WITH` clauses | ||
inside expressions. For database backends that only support a single | |
`WITH` clause at the top level, `let` does not create a CTE and instead | |
applies the callback directly to `self`. | |
|
||
Semantically, the following expressions will always have the same value: | ||
|
||
1. f column | ||
2. column.let f | ||
|
||
? When to use `let` | ||
|
||
`let` can make queries shorter and/or simpler by eliminating | ||
duplicates. However, the `WITH` clause itself, including the bound CTE | ||
table name, also takes up space, so if the `self` argument to `let` | ||
isn't very large, `let` can actually make the query longer. | ||
|
||
For this reason, it is generally better to apply `let` to values | ||
before passing them into a library method, rather than to apply it to | ||
the value inside the library method itself. | ||
Comment: Tbh I do not understand this paragraph. After reading the code I understand that you mean that e.g. with [...] But I couldn't figure it out from this paragraph too easily. I think I'd rewrite it to maybe just say sth like 'For this reason, it makes sense to use [...]
Reply: Done |
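To make the mechanism in the doc comment easier to follow, here is a minimal Python sketch of the same idea: bind an expression to a fresh name, hand a reference to a callback, and wrap the callback's result in a `WITH` clause. The classes `Raw`, `Mul`, `Ref`, `Let` and the functions `let` and `to_sql` are hypothetical stand-ins, not the Enso `SQL_Expression` types, and the SQL text it prints only imitates the shape shown in the doc comment's examples.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical mini expression tree, for illustration only.
@dataclass(frozen=True)
class Raw:
    sql: str          # an opaque SQL fragment

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

@dataclass(frozen=True)
class Ref:
    binder: str       # reference to a name bound by an enclosing Let

@dataclass(frozen=True)
class Let:
    binder: str
    bound: object     # the expression being named
    body: object      # the expression built by the callback

_ids = count()

def let(expr, callback):
    """Bind `expr` to a fresh name and build the callback's result in that scope."""
    binder = f"cte_{next(_ids)}"        # unique name for the CTE
    body = callback(Ref(binder))        # the callback only sees the reference
    return Let(binder, expr, body)

def to_sql(e):
    """Render the tree as SQL-like text (shape is illustrative, not real output)."""
    if isinstance(e, Raw):
        return e.sql
    if isinstance(e, Ref):
        return f"{e.binder}.x"
    if isinstance(e, Mul):
        return f"{to_sql(e.left)} * {to_sql(e.right)}"
    if isinstance(e, Let):
        return f"(WITH {e.binder} AS ({to_sql(e.bound)}) {to_sql(e.body)})"
    raise TypeError(f"unsupported node: {e!r}")

def square(column):
    return Mul(column, column)

big = Raw("COMPLEX_ROUNDING_QUERY(" + "v" * 80 + ")")
inlined = to_sql(square(big))        # two copies of the large fragment
factored = to_sql(let(big, square))  # one copy plus two short references
```

In this sketch `square(big)` and `let(big, square)` describe the same value, but the second repeats the large fragment only once. For a very small fragment the `WITH` wrapper costs more characters than it saves.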
||
|
||
> Example | ||
Remove duplicates of a large column expression from a query. | ||
|
||
## Without CTEs | ||
|
||
column = table_builder [['x', [1.3, 4.7, -1.3, -4.7]]] . at "x" | ||
rounded = column.round | ||
large = rounded * rounded | ||
large.to_sql | ||
## => | ||
|
||
-- Two copies of the complex rounding query | ||
SELECT ... [complex rounding query] * [complex rounding query] | ||
... FROM ... | ||
|
||
## With CTEs | ||
|
||
not_so_large = column.round.let rounded-> | ||
rounded * rounded | ||
not_so_large.to_sql | ||
## => | ||
|
||
-- One copy of the complex rounding query | ||
SELECT ... (WITH temp_table as ([complex rounding query]) | ||
temp_table.x * temp_table.x) | ||
... FROM ... | ||
|
||
> Example | ||
Use multiple CTEs in a query. | ||
|
||
(column_a * column_b).let product_a_b-> | ||
(product_a_b * 10).let times_ten-> | ||
times_ten + product_a_b + 100 | ||
|
||
> Example | ||
Give explicit names to the CTEs. | |
|
||
(column_a * column_b).let name="product_a_b" product_a_b-> | ||
(product_a_b * 10).let name="times_ten" times_ten-> | ||
times_ten + product_a_b + 100 | ||
let : (DB_Column -> DB_Column) -> Text -> DB_Column | ||
let self (callback : (DB_Column -> DB_Column)) (name : Text = "let") -> DB_Column = | ||
Comment: I think I'd reverse the order of arguments and make the name required. It will make the queries more readable if helpful names are always provided.
Reply: I think we are missing [...] What if we rename [...]
Reply: I switched the arguments and added Arguments:. I originally used the name to generate the table name, but since the table name is replaced in Base_Generator, I am not only using the name there. |
||
use_ctes = self.connection.dialect.supports_nested_with_clause | ||
|
||
if use_ctes.not then callback self else | ||
binder = self.connection.base_connection.table_naming_helper.generate_random_table_name | ||
|
||
ref_expression = SQL_Expression.Let_Ref name binder self.expression | ||
ref_column = DB_Column.Value binder self.connection self.sql_type_reference ref_expression self.context | ||
inner_value = callback ref_column | ||
|
||
let_expression = SQL_Expression.Let name binder self.expression inner_value.expression | ||
DB_Column.Value inner_value.internal_name inner_value.connection inner_value.sql_type_reference let_expression inner_value.context | ||
|
||
## PRIVATE | ||
Sets up an operation of arbitrary arity. | ||
|
||
|
@@ -849,13 +949,18 @@ type DB_Column | |
new_name = self.naming_helper.function_name "round" [self] | ||
scale = 10 ^ decimal_places | ||
scaled = self * scale | ||
round_base = scaled.floor . rename "rb" | ||
round_midpoint = (round_base + 0.5) / scale | ||
even_is_up = (self >= 0).iif ((scaled.truncate % 2) != 0) ((scaled.truncate % 2) == 0) | ||
half_goes_up = if use_bankers then even_is_up else self >= 0 | ||
do_round_up = half_goes_up.iif (self >= round_midpoint) (self > round_midpoint) | ||
result = do_round_up.iif ((round_base + 1.0) / scale) (round_base / scale) | ||
result.rename new_name | ||
|
||
scaled.let name="scaled" scaled-> | ||
round_base = scaled.floor . rename "rb" | ||
round_base.let name="round_base" round_base-> | ||
round_midpoint = (round_base + 0.5) / scale | ||
round_midpoint.let name="round_midpoint" round_midpoint-> | ||
((scaled.truncate % 2) == 0).let name="scaled_truncate_mod_2_equals_0" scaled_truncate_mod_2_equals_0-> | ||
even_is_up = (self >= 0).iif scaled_truncate_mod_2_equals_0.not scaled_truncate_mod_2_equals_0 | ||
half_goes_up = if use_bankers then even_is_up else self >= 0 | ||
do_round_up = half_goes_up.let x-> x.iif (self >= round_midpoint) (self > round_midpoint) | ||
result = do_round_up.let x-> x.iif ((round_base + 1.0) / scale) (round_base / scale) | ||
result.rename new_name | ||
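The float rounding logic in this hunk can be checked on plain numbers. The following Python sketch mirrors the column expressions step by step for a single finite value. The function name `round_float` is hypothetical, and `math.floor` and `math.trunc` are assumed to play the roles of `floor` and `truncate`; NaN and infinite inputs are out of scope here.

```python
import math

def round_float(x: float, decimal_places: int = 0, use_bankers: bool = False) -> float:
    """Scalar mirror of the column-level rounding steps (finite inputs only)."""
    scale = 10 ** decimal_places
    scaled = x * scale
    round_base = math.floor(scaled)
    round_midpoint = (round_base + 0.5) / scale
    # Computed once and reused, like the `scaled_truncate_mod_2_equals_0` binding.
    truncated_is_even = (math.trunc(scaled) % 2) == 0
    even_is_up = (not truncated_is_even) if x >= 0 else truncated_is_even
    half_goes_up = even_is_up if use_bankers else x >= 0
    do_round_up = (x >= round_midpoint) if half_goes_up else (x > round_midpoint)
    return (round_base + 1.0) / scale if do_round_up else round_base / scale
```

With `use_bankers=True` an exact half goes to the even neighbour; otherwise it goes away from zero. Each intermediate value used more than once here corresponds to one of the `let` bindings in the hunk.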
|
||
## PRIVATE | ||
Round a float-like column. | ||
|
@@ -864,15 +969,19 @@ type DB_Column | |
# Construct a constant Decimal column. | ||
k x = self.const x . cast Value_Type.Decimal | ||
new_name = self.naming_helper.function_name "round" [self] | ||
scale = k 10 ^ decimal_places | ||
scale = 10 ^ decimal_places | ||
scaled = self * scale | ||
round_base = scaled.floor . rename "rb" | ||
round_midpoint = (round_base + k 0.5).decimal_div scale | ||
even_is_up = (self >= k 0).iif ((scaled.truncate.decimal_mod (k 2)) != k 0) ((scaled.truncate.decimal_mod (k 2)) == k 0) | ||
half_goes_up = if use_bankers then even_is_up else self >= k 0 | ||
do_round_up = half_goes_up.iif (self >= round_midpoint) (self > round_midpoint) | ||
result = do_round_up.iif ((round_base + k 1).decimal_div scale) (round_base.decimal_div scale) | ||
result.rename new_name | ||
scaled.let name="scaled" scaled-> | ||
round_base = scaled.floor . rename "rb" | ||
round_base.let name="round_base" round_base-> | ||
round_midpoint = (round_base + k 0.5).decimal_div scale | ||
round_midpoint.let name="round_midpoint" round_midpoint-> | ||
((scaled.truncate.decimal_mod (k 2)) == k 0).let name="scaled_truncate_mod_2_equals_0" scaled_truncate_mod_2_equals_0-> | |
even_is_up = (self >= k 0).iif scaled_truncate_mod_2_equals_0.not scaled_truncate_mod_2_equals_0 | ||
half_goes_up = if use_bankers then even_is_up else self >= k 0 | ||
do_round_up = half_goes_up.let x-> x.iif (self >= round_midpoint) (self > round_midpoint) | ||
result = do_round_up.let x-> x.iif ((round_base + k 1).decimal_div scale) (round_base.decimal_div scale) | ||
result.rename new_name | ||
|
||
## PRIVATE | ||
Round an integer column. | ||
|
@@ -882,20 +991,22 @@ type DB_Column | |
scale = 10 ^ -decimal_places | ||
halfway = scale.div 2 | ||
remainder = self % scale | ||
scaled_down = (self / scale).truncate . cast Value_Type.Integer | ||
result_unnudged = scaled_down * scale | ||
|
||
if_non_neg = | ||
half_goes_up = if use_bankers then (scaled_down % 2) != 0 else self >= 0 | ||
round_up = half_goes_up.iif (remainder >= halfway) (remainder > halfway) | ||
round_up.iif (result_unnudged + scale) result_unnudged | ||
if_neg = | ||
half_goes_up = if use_bankers then (scaled_down % 2) == 0 else self >= 0 | ||
round_up = half_goes_up.iif (remainder < -halfway) (remainder <= -halfway) | ||
round_up.iif (result_unnudged - scale) result_unnudged | ||
|
||
result = (self >= 0).iif if_non_neg if_neg | ||
result.cast Value_Type.Float . rename new_name | ||
remainder.let name="remainder" remainder-> | |
scaled_down = (self / scale).truncate . cast Value_Type.Integer | ||
scaled_down.let name="scaled_down" scaled_down-> | ||
result_unnudged = scaled_down * scale | ||
result_unnudged.let name="result_unnudged" result_unnudged-> | ||
if_non_neg = | ||
half_goes_up = if use_bankers then (scaled_down % 2) != 0 else self >= 0 | ||
round_up = half_goes_up.let x-> x.iif (remainder >= halfway) (remainder > halfway) | ||
round_up.let x-> x.iif (result_unnudged + scale) result_unnudged | ||
if_neg = | ||
half_goes_up = if use_bankers then (scaled_down % 2) == 0 else self >= 0 | ||
round_up = half_goes_up.let x-> x.iif (remainder < -halfway) (remainder <= -halfway) | ||
round_up.let x-> x.iif (result_unnudged - scale) result_unnudged | ||
|
||
result = (self >= 0).let x-> x.iif if_non_neg if_neg | ||
result.cast Value_Type.Float . rename new_name | ||
|
||
## ALIAS int | ||
GROUP Standard.Base.Rounding | ||
|
@@ -1082,6 +1193,16 @@ type DB_Column | |
new_name = self.naming_helper.function_name "is_infinite" [self] | ||
self.make_unary_op "IS_INF" new_name | ||
|
||
## GROUP Standard.Base.Math | ||
ICON math | ||
Returns a column of booleans, with `True` items at the positions where | ||
this column contains a non-infinite, non-NaN floating point value. This | ||
is only applicable to double columns. | ||
is_finite : DB_Column | ||
is_finite self = Value_Type.expect_numeric self <| | ||
new_name = self.naming_helper.function_name "is_finite" [self] | ||
self.make_unary_op "IS_FINITE" new_name | ||
|
||
## PRIVATE | ||
Returns a column of booleans, with `True` items at the positions where | ||
this column contains an empty string or `Nothing`. | ||
|
@@ -1973,9 +2094,9 @@ type DB_Column | |
name. If the column is not floating point, just return the expression. | ||
short_circuit_special_floating_point : DB_Column -> DB_Column | ||
short_circuit_special_floating_point self exp = | ||
self_is_nan = if self.connection.dialect.supports_separate_nan then self.is_nan else self.is_nothing | ||
if self.value_type.is_floating_point.not then exp else | ||
((self_is_nan || self.is_infinite).iif self exp).rename exp.name | ||
(self.is_finite.not.iif self exp).rename exp.name | ||
Comment: I originally also used [...] I kept it in [...]
Reply: I don't think I understand - what is the difference here between [...]
Reply: The difference is that [...] Also, I guess the rule is always "use CTEs if it makes the query smaller". For a method like [...] We could also examine the "size" of the input to determine if it's necessary -- I think that eventually we will probably want to do that. But for now, the best approach is to experiment by adding [...]
Reply: Ok, that sounds good. |
||
|
||
## PRIVATE | ||
Helper for text operations that depend on `case_sensitivity`. | |
make_text_case_op left op other case_sensitivity new_name = | ||
|