-
Notifications
You must be signed in to change notification settings - Fork 84
Add string_agg
function, partition_by
and order_by
for analytic functions, and partition_by
and limit
for string_agg
#1568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Probably will not merge without implementing WN-0011, assuming it is accepted. |
Fixes #1560 |
string_agg
functionstring_agg
function, partition_by
and order_by
for analytic functions, and partition_by
and limit
for string_agg
…te ordering with multiple order_bys to generate correct SQL
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes:
string_agg
function with two overloads:string_agg(field)
andstring_agg(field, separator)
. Ifseparator
is not specified, it will be,
. Also adds astring_agg_distinct
which is the same asstring_agg
, but distinct.string_agg
, the ability to specify anorder_by
andlimit
for an aggregate function, similar to the way you specify filters for an aggregate expression:string_agg(field, sep) { order_by: field, limit: 10 }
. These are under experimentsfunction_order_by
andaggregate_limit
respectively. (These names were chosen becauseorder_by
also applies to window functions, whereaslimit
only applies to aggregates.)order_by
andpartition_by
in the same way as above. This is under the experimentpartition_by
.For both aggregates and window functions,
order_by
is a comma-required-separated list of expressions followed by an optionalasc
ordesc
. This is different from a query'sorder_by
, which is a comma-optional-separated list of field names or integers (also withasc
/desc
). This difference comes from the fact thatorder_by
in a query needs to be a field in the output space, whereasorder_by
in an aggregate or window function can be an expression.Specifically,
order_by
in an aggregate (i.e.string_agg
) can be any input-space expression, andorder_by
in an analytic function must either be an aggregate expression or an output field reference. These requirements are correctly detected by the parser.partition_by
is a comma-required-separated list of non-dotted identifiers (referring to fields in the output space). The comma is required because forselect
views, expressions should actually be allowed inpartition_by
, so commas will be required down the line anyway.limit
must be a literal integer.Known limitations:
string_agg_distinct
with anorder_by
, the order by must be exactly the same as the first parameter; this is currently a SQL error not caught by Malloylimit
is only allowed for BQ, and there's some funkiness with the way functions are parsed/compiled to make sure there's a good error message in the other dialects. There is an existing similar issue that if an overload doesn't exist for one dialect, it gets a gross internal error. We need to give this some more thought.Known bugs:
string_agg
'sorder_by
currently doesn't get added to the array agg unnest, meaningorder_by
doesn't work properly when there's a fanoutFuture changes:
string_agg
functions to use a custom fragment to produce a custom SQL compile with the following rules:string_agg
with noorder_by
and no fanout just uses theSTRING_AGG
functionstring_agg
has a fanout or anorder_by
, ordering expression and/or the fanout distinct key are put into aconcat("MAGIC_PREFIX", ordering_expression, "MAGIC SEPARATOR", distinct_key, "MAGIC_SUFFIX", thing_you_wanted_to_string_agg)
, then that expression isSTRING_AGG(DISTINCT)
ed, then theMAGIC_PREFIX.*?MAGIC_SUFFIX
es are removed withREGEXP_REPLACE
. This allows BigQuery to do fanoutstring_agg
calls and removes the bug withstring_agg
'sorder_by
not being added to the array agg unnest.{ order_by: asc }
( and{ order_by: desc }
) forstring_agg_distinct
to solve the issue of "the order by must be exactly the same as the first parameter". Forstring_agg
and analytic functions,order_by: asc
is an error, and forstring_agg_distinct
,order_by: anything_else
is an error.Fixes #1557
Fixes #1560