Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column-level lexically-scoped CTE expressions #10826

Merged
merged 72 commits into from
Aug 28, 2024
Merged

Conversation

GregoryTravis
Copy link
Contributor

@GregoryTravis GregoryTravis commented Aug 15, 2024

Pull Request Description

This implements DB_Column.with, which uses WITH ... AS SQL clauses to remove duplicates in the generated SQL.

After a discussion with @radeusgd, we concluded that we will probably want a more complete CTE implementation, so this one is useful for now to deal with big queries (like round).

Important Notes

Still to do in this PR:

  • Rename with to let (or something similar)
  • tests
  • documentation
  • remove State hack by moving query generation into a class and using a Ref field for scoping

Results on round_float:

--- SQL length in characters (unprettified) SQL length in lines (prettified)
Without CTEs 13193 851
With CTEs 3644 187

Compare the SQL:

without-ctes.sql.txt
with-ctes.sql.txt

Update, with name shortening:

--- SQL length in characters (unprettified) SQL length in lines (prettified)
Without CTEs 13193 853
With CTEs 2427 176

without-cte.txt
with-cte.txt

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.


rename self (name : Text) (binder : Text) =
binders = self.binders_ref.get
if binders.contains_key binder then binders.get binder else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick as that's probably not a bottleneck, but what if we binders.get binder if_missing=Nothing? We could avoid accessing the hashmap twice 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 😅

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really really great. I really appreciate the comprehensive documentation.

We should remember to run Snowflake tests before integrating.

new (existing_tables : Vector Text) -> Let_Binder_Renamer =
Let_Binder_Renamer.Value (Hashset.from_vector existing_tables) (Ref.new Dictionary.empty) (Ref.new 0)

rename self (name : Text) (binder : Text) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the consensus is all methods should have ## comment, probably with PRIVATE here.

Also I'd appreciate a short note on the logic here - I assume binder is the unique identifier and when we first encounter it, we try to find a shorter name based on name for it, and if we encounter the same identifier again we return the already assigned name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I added docs.
Just curious -- if the type is PRIVATE, do the methods have to be PRIVATE too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh I don't know, hence I add PRIVATE just to be sure.

I think they are not private, so if you somehow got a value of this private type, they probably would show up in CB on it. But that seems harmless as long as they do not show up in CB search 'in general' (so e.g. in the default mode where you have no value selected).

I think we just had this convention to mark methods as PRIVATE unless meant for users, but it was from time before we got private types so it's a question of how we want to amend that convention for that case.

Also, I'm not sure how this plays with @jdunkerley's code linter/formatter. I assume it may 'warn' about these non-PRIVATE functions, but again I'm not sure.

@@ -212,6 +212,11 @@ type DB_Column
## PRIVATE
Column-level manual CTE factoring.

Arguments:
- name: A prefix to use for the generated let binder.
- callback: A functiont that receives the let reference and returns a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- callback: A functiont that receives the let reference and returns a
- callback: A function that receives the let reference and returns a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -707,15 +717,23 @@ default_fetch_types_query dialect expression context where_filter_always_false_l

## PRIVATE
Helper class for shortening the binder names generated for WITH clauses.

Each time `rename` is called on an identifier it hasn't seen before, it
generates a new, short identifier. The short identifier consits of the `name`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
generates a new, short identifier. The short identifier consits of the `name`
generates a new, short identifier. The short identifier consists of the `name`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the comprehensive docs

Comment on lines 1101 to 1102
_ : Float -> x.is_finite
_ -> False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ : Float -> x.is_finite
_ -> False
_ : Float -> x.is_finite
_ : Number -> True
_ : Decimal -> True
_ -> False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually changing to:

            _ : Float   -> x.is_finite
            _           -> True

Because we're already using expect_numeric, and only Float can be infinite.

@GregoryTravis GregoryTravis added CI: Ready to merge This PR is eligible for automatic merge and removed CI: Ready to merge This PR is eligible for automatic merge labels Aug 27, 2024
group_builder.teardown <|
connection_data.teardown

if connection_data.connection.dialect.supports_nested_with_clause then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radeusgd I'm using the dialect for this. It could also be a test flag, but then we would have this flag in both the test config and the dialect, which seems redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 that seems very reasonable

@GregoryTravis
Copy link
Contributor Author

Snowflake CI run successful.

@GregoryTravis GregoryTravis added the CI: Ready to merge This PR is eligible for automatic merge label Aug 28, 2024
@mergify mergify bot merged commit 8260a95 into develop Aug 28, 2024
36 checks passed
@mergify mergify bot deleted the wip/gmt/10306-round-with branch August 28, 2024 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants