From fdf144dedd3cb5be488cea5df1bee813f315b921 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Fri, 1 Mar 2024 00:30:40 +0000
Subject: [PATCH] build based on 346449f

---
 dev/.documenter-siteinfo.json                  | 2 +-
 dev/examples/index.html                        | 2 +-
 dev/guide/index.html                           | 2 +-
 dev/index.html                                 | 2 +-
 dev/reference/index.html                       | 2 +-
 dev/test/clauses/index.html                    | 2 +-
 dev/test/index.html                            | 2 +-
 dev/test/nodes/index.html                      | 2 +-
 dev/test/other/index.html                      | 2 +-
 dev/two-kinds-of-sql-query-builders/index.html | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 87b4cc92..63c2b381 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.1","generation_timestamp":"2024-02-16T00:21:13","documenter_version":"1.2.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.1","generation_timestamp":"2024-03-01T00:30:36","documenter_version":"1.2.1"}}
\ No newline at end of file
diff --git a/dev/examples/index.html b/dev/examples/index.html
index 957e3ac6..eb49013c 100644
--- a/dev/examples/index.html
+++ b/dev/examples/index.html
@@ -825,4 +825,4 @@
 10 │ 95538 2009-03-30 2009-09-02
 11 │ 107680 2009-06-07 2009-07-30
 12 │ 110862 2008-09-07 2010-06-07
-=#
+=#
diff --git a/dev/guide/index.html b/dev/guide/index.html
index 0530c5c6..d73acd7b 100644
--- a/dev/guide/index.html
+++ b/dev/guide/index.html
@@ -758,4 +758,4 @@
 5 │ 438438 Acute myocardial infarction of a… Condition SNOMED ⋯
 6 │ 444406 Acute subendocardial infarction Condition SNOMED
 6 columns omitted
-=#
+=#
diff --git a/dev/index.html b/dev/index.html
index 830f7735..c7a29ed6 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -1,2 +1,2 @@
-Home · FunSQL.jl
+Home · FunSQL.jl
diff --git a/dev/reference/index.html b/dev/reference/index.html
index 11e42602..a72d5523 100644
--- a/dev/reference/index.html
+++ b/dev/reference/index.html
@@ -1035,4 +1035,4 @@
 WHERE ("cr"."relationship_id" = 'Subsumes')
 )
 SELECT *
-FROM "essential_hypertension"
+FROM "essential_hypertension"
diff --git a/dev/test/clauses/index.html b/dev/test/clauses/index.html
index fe7c8426..f0433e07 100644
--- a/dev/test/clauses/index.html
+++ b/dev/test/clauses/index.html
@@ -1186,4 +1186,4 @@
 #=>
 SELECT *
 FROM "condition_occurrence"
-=#
+=#
diff --git a/dev/test/index.html b/dev/test/index.html
index b3ce205f..4dd53118 100644
--- a/dev/test/index.html
+++ b/dev/test/index.html
@@ -1,2 +1,2 @@
-Test Suite · FunSQL.jl
+Test Suite · FunSQL.jl
diff --git a/dev/test/nodes/index.html b/dev/test/nodes/index.html
index 9a371b7f..6c4f8631 100644
--- a/dev/test/nodes/index.html
+++ b/dev/test/nodes/index.html
@@ -3037,4 +3037,4 @@
 │ GROUP BY "visit_occurrence_1"."person_id"
 │ ) AS "visit_group_1" ON ("person_2"."person_id" = "visit_group_1"."person_id")""")
 └ @ FunSQL …
-=#
+=#
diff --git a/dev/test/other/index.html b/dev/test/other/index.html
index 5797c08f..c42ce80c 100644
--- a/dev/test/other/index.html
+++ b/dev/test/other/index.html
@@ -183,4 +183,4 @@
 pack(sql, Dict("YEAR" => 1950))
 #-> Any[1950]

pack can also be applied to a regular string, in which case it returns the parameters unchanged.

pack("SELECT * FROM person WHERE year_of_birth >= ?", (1950,))
-#-> (1950,)
+#-> (1950,)
diff --git a/dev/two-kinds-of-sql-query-builders/index.html b/dev/two-kinds-of-sql-query-builders/index.html
index b748cca1..82fa3255 100644
--- a/dev/two-kinds-of-sql-query-builders/index.html
+++ b/dev/two-kinds-of-sql-query-builders/index.html
@@ -41,4 +41,4 @@
 having
 orderby
 limit
-end

Individual slots of this structure are populated by the corresponding pipeline nodes.

"Where" node acting on the syntax tree

This explains why the pipeline is insensitive to the order of the nodes. Indeed, as long as the content of the slots stays the same, it makes no difference in what order the slots are populated.

Pipeline is insensitive to the order of the nodes

This method of incrementally constructing a composite structure is known as the builder pattern. We can call the query builders that employ this pattern syntax-oriented.
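
As a minimal sketch of this pattern, the following plain-Julia code models the syntax tree as a struct with one slot per clause, and each pipeline node as a function that fills its slot. The names SelectTree, From, Select, Where, Order, and render are hypothetical and chosen for illustration only; they are not FunSQL's API.

```julia
# Hypothetical syntax tree with one slot per SQL clause (not FunSQL's API).
struct SelectTree
    columns::Vector{String}      # SELECT slot
    table::String                # FROM slot
    conditions::Vector{String}   # WHERE slot
    ordering::Vector{String}     # ORDER BY slot
end

# `From` creates a tree; every other node returns a function that fills one slot.
From(table) = SelectTree(String[], table, String[], String[])
Select(cols...) = t -> SelectTree(String[t.columns..., cols...], t.table, t.conditions, t.ordering)
Where(cond) = t -> SelectTree(t.columns, t.table, String[t.conditions..., cond], t.ordering)
Order(key) = t -> SelectTree(t.columns, t.table, t.conditions, String[t.ordering..., key])

# Rendering reads the slots in the fixed order required by the SQL grammar.
function render(t::SelectTree)
    sql = "SELECT " * (isempty(t.columns) ? "*" : join(t.columns, ", "))
    sql *= " FROM " * t.table
    if !isempty(t.conditions)
        sql *= " WHERE " * join(t.conditions, " AND ")
    end
    if !isempty(t.ordering)
        sql *= " ORDER BY " * join(t.ordering, ", ")
    end
    return sql
end

# The same nodes applied in two different orders.
q1 = From("person") |> Where("year_of_birth >= 1950") |> Order("person_id") |> Select("person_id")
q2 = From("person") |> Select("person_id") |> Order("person_id") |> Where("year_of_birth >= 1950")

println(render(q1))
println(render(q1) == render(q2))
```

Because render consults the slots rather than the order in which the nodes were applied, both pipelines produce the same SQL text.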

Both data-oriented and syntax-oriented query builders are compositional: the difference is in the nature of the information processed by the units of composition. Data-oriented query builders incrementally refine the query output; syntax-oriented query builders incrementally assemble the SQL syntax tree. Their interfaces look almost identical, but their methods of operation are fundamentally different.
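
For contrast, here is a small sketch of the data-oriented reading, in which each node maps a table to a new table. It runs over an in-memory vector of rows rather than generating SQL, and the node names Where and Limit, along with the sample data, are assumptions made for illustration.

```julia
# Hypothetical data-oriented nodes: each maps a table (a vector of rows) to a new table.
Where(pred) = rows -> filter(pred, rows)
Limit(n) = rows -> rows[1:min(n, length(rows))]

people = [
    (person_id = 1, year_of_birth = 1940),
    (person_id = 2, year_of_birth = 1960),
    (person_id = 3, year_of_birth = 1975),
]

born_since_1950(row) = row.year_of_birth >= 1950

# Filter, then truncate: the first person born in 1950 or later.
a = people |> Where(born_since_1950) |> Limit(1)

# Truncate, then filter: the first person, kept only if born in 1950 or later.
b = people |> Limit(1) |> Where(born_since_1950)

println(a)
println(isempty(b))
```

Here the order of the nodes changes the result, since every node acts on the output of the one before it.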

But which one is better? Syntax-oriented query builders have two definite advantages: they are easy to implement and they can support the full range of SQL features. Indeed, the interface of a syntax-oriented query builder is just a collection of builders for the SQL syntax tree. How completely the syntax tree is represented determines how well various SQL features are supported.

On the other hand, syntax-oriented query builders are harder to use. As they directly represent the SQL grammar, they inherit all of its deficiencies. In particular, the rigid clause order makes it difficult to assemble complex data processing pipelines, especially when the arrangement of pipeline nodes is not predetermined.
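
To make the difficulty concrete: in a slot-based builder, a filter added after a limit still lands in the WHERE slot, which SQL evaluates before LIMIT, so "truncate, then filter" can only be expressed by nesting a subquery by hand. The naive sketch below shows one way a builder could preserve pipeline order automatically, by wrapping the query built so far in a subquery at every step. The node names and the alias t are assumptions for illustration; this is not a description of how FunSQL or any other library generates SQL.

```julia
# Naive, hypothetical nodes: each wraps the query built so far in a subquery.
From(table) = "SELECT * FROM $table"
Where(cond) = q -> "SELECT * FROM ($q) AS t WHERE $cond"
Limit(n) = q -> "SELECT * FROM ($q) AS t LIMIT $n"

truncate_then_filter = From("person") |> Limit(1) |> Where("year_of_birth >= 1950")
filter_then_truncate = From("person") |> Where("year_of_birth >= 1950") |> Limit(1)

println(truncate_then_filter)
println(filter_then_truncate)
```

The two pipelines now render to different queries, at the cost of deeply nested SQL that a practical builder would want to flatten wherever the clause order allows.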

A data-oriented query builder directly represents data processing nodes, which makes assembling data processing pipelines much more straightforward—as long as we can find the necessary nodes among those offered by the builder. But where does the builder get its collection of data processing nodes? And how can we tell if this collection is complete?

One way to implement a data-oriented query builder is to adapt a general-purpose query framework. Indeed, this is the origin of EF/LINQ, which is adapted from LINQ, and dbplyr, which is adapted from dplyr. The query framework determines what processing nodes are available and how they operate. In principle, any query framework could be adapted to SQL databases by introducing just one new node, a node that loads the content of a database table. If we place this node at the beginning of a pipeline and make the rest of it out of regular nodes, we obtain a pipeline that processes data from a SQL database. However, this pipeline will be very inefficient compared to a SQL engine, which can use indexes to avoid loading the entire table into memory and thus can process the same data much faster. This is why EF/LINQ and dbplyr generate a SQL query that replaces the pipeline as a whole. The pipeline itself no longer runs directly, but now serves as a specification, with the assumption that if it were to run, it would produce the same output as the SQL query. This method of transforming a general-purpose query framework into a SQL query builder is called SQL pushdown.
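
The idea can be sketched in a few lines of plain Julia. The toy framework below has one table-loading node and one filtering node; execute runs a pipeline directly over in-memory rows, while pushdown translates the same pipeline into a single SQL string. All names (Node, Table, AtLeast, execute, pushdown) and the sample data are hypothetical, and the translation is deliberately naive; real implementations such as EF/LINQ and dbplyr are far more involved.

```julia
# Hypothetical pipeline nodes of a tiny general-purpose query framework.
abstract type Node end

# The one extra node needed for databases: load the content of a table.
struct Table <: Node
    name::Symbol
end

# A regular node: keep the rows where `column >= bound`.
struct AtLeast <: Node
    over::Node
    column::Symbol
    bound::Int
end

# Direct execution: load the entire table into memory, then filter it.
execute(n::Table, db) = db[n.name]
execute(n::AtLeast, db) = filter(r -> getproperty(r, n.column) >= n.bound, execute(n.over, db))

# SQL pushdown: treat the pipeline as a specification and emit one query for it.
pushdown(n::Table) = "SELECT * FROM $(n.name)"
pushdown(n::AtLeast) = "SELECT * FROM ($(pushdown(n.over))) AS t WHERE $(n.column) >= $(n.bound)"

db = Dict(:person => [(person_id = 1, year_of_birth = 1940),
                      (person_id = 2, year_of_birth = 1960)])
q = AtLeast(Table(:person), :year_of_birth, 1950)

println(execute(q, db))   # runs the pipeline in memory
println(pushdown(q))      # generates SQL for the database to run instead
```

The generated query is expected to return the same rows as direct execution, but the filtering happens inside the database, which never has to ship the whole table to the client.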

However, SQL pushdown has a serious limitation. A general-purpose query framework is not designed with SQL compatibility in mind. For this reason, some of the pipelines assembled within this framework cannot be converted to SQL. Even worse, many useful SQL queries have no equivalent pipelines and thus cannot be generated using SQL pushdown. Indeed, SQL has accumulated a wide range of features and capabilities since it first appeared in 1974. The first version of the SQL standard, SQL-86, already supported Cartesian products, filtering, grouping, aggregation, and correlated subqueries. The next major revision, SQL-92, added many join types and introduced query nesting. SQL:1999 greatly expanded its analytical capabilities by adding two types of queries: recursive queries, for processing hierarchical data, and data cube queries, which generalize histograms, cross-tabulations, roll-ups, drill-downs, and sub-totals. The follow-up revision, SQL:2003, added support for aggregate functions over a running window. Admittedly, SQL is a quintessential enterprise abomination, a hodgepodge of features added to support every imaginable use case, but with inadequate syntax, weird gaps in functionality, and no regard for internal consistency. Nevertheless, the breadth of SQL's capabilities has not been matched by any other query framework, including LINQ or dplyr. So when we generate SQL queries using EF/LINQ or dbplyr, a large subset of these capabilities remains inaccessible.

FunSQL is a data-oriented query builder created specifically to expose the full expressive power of SQL. Unlike EF/LINQ and dbplyr, FunSQL was not adapted from an existing query framework, but was carefully designed from scratch to match SQL's capabilities. These capabilities include, for example, support for correlated subqueries and lateral joins (with the Bind node), aggregate and window functions (using the Group and Partition nodes), as well as recursive queries (with the Iterate node). This comprehensive support for SQL capabilities makes FunSQL the only SQL query builder suitable for assembling complex data processing pipelines. Moreover, even though FunSQL pipelines cannot be run directly, every FunSQL node has well-defined data processing semantics, which means that, in principle, FunSQL could be developed into a full-blown query framework. This potentially opens a path for replacing SQL with an equally powerful but more coherent and expressive query language.

+end
