diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 01598373..32f5d803 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-12-01T00:47:11","documenter_version":"1.8.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2025-01-01T00:41:23","documenter_version":"1.8.0"}} \ No newline at end of file diff --git a/dev/examples/index.html b/dev/examples/index.html index 73762f52..2b82ac01 100644 --- a/dev/examples/index.html +++ b/dev/examples/index.html @@ -822,4 +822,4 @@ 10 │ 95538 2009-03-30 2009-09-02 11 │ 107680 2009-06-07 2009-07-30 12 │ 110862 2008-09-07 2010-06-07 -=# +=# diff --git a/dev/guide/index.html b/dev/guide/index.html index 4b2e9de0..aa47f7d9 100644 --- a/dev/guide/index.html +++ b/dev/guide/index.html @@ -758,4 +758,4 @@ 5 │ 438438 Acute myocardial infarction of a… Condition SNOMED ⋯ 6 │ 444406 Acute subendocardial infarction Condition SNOMED 6 columns omitted -=# +=# diff --git a/dev/index.html b/dev/index.html index 779e8a2a..be249e9b 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -Home · FunSQL.jl

FunSQL.jl

FunSQL is a Julia library for compositional construction of SQL queries.

Table of Contents

+Home · FunSQL.jl

FunSQL.jl

FunSQL is a Julia library for compositional construction of SQL queries.

Table of Contents

diff --git a/dev/reference/index.html b/dev/reference/index.html index f9b6fd86..71120da8 100644 --- a/dev/reference/index.html +++ b/dev/reference/index.html @@ -1056,4 +1056,4 @@ WHERE ("cr"."relationship_id" = 'Subsumes') ) SELECT * -FROM "essential_hypertension"source +FROM "essential_hypertension"source diff --git a/dev/test/clauses/index.html b/dev/test/clauses/index.html index b9a12ccd..186d6f93 100644 --- a/dev/test/clauses/index.html +++ b/dev/test/clauses/index.html @@ -1224,4 +1224,4 @@ #=> SELECT * FROM "condition_occurrence" -=# +=# diff --git a/dev/test/index.html b/dev/test/index.html index 0f0d60a8..89ff540c 100644 --- a/dev/test/index.html +++ b/dev/test/index.html @@ -1,2 +1,2 @@ -Test Suite · FunSQL.jl

Test Suite

+Test Suite · FunSQL.jl

Test Suite

diff --git a/dev/test/nodes/index.html b/dev/test/nodes/index.html index 830b1a1d..20314a1a 100644 --- a/dev/test/nodes/index.html +++ b/dev/test/nodes/index.html @@ -3359,4 +3359,4 @@ │ ) AS "visit_group_1" ON ("person_2"."person_id" = "visit_group_1"."person_id")""", │ columns = [SQLColumn(:person_id), SQLColumn(:max_visit_start_date)]) └ @ FunSQL … -=# +=# diff --git a/dev/test/other/index.html b/dev/test/other/index.html index c4889343..f70388af 100644 --- a/dev/test/other/index.html +++ b/dev/test/other/index.html @@ -273,4 +273,4 @@ pack(sql, Dict("YEAR" => 1950)) #-> Any[1950]

pack can also be applied to a regular string, in which case it returns the parameters unchanged.

pack("SELECT * FROM person WHERE year_of_birth >= ?", (1950,))
-#-> (1950,)
+#-> (1950,) diff --git a/dev/two-kinds-of-sql-query-builders/index.html b/dev/two-kinds-of-sql-query-builders/index.html index 3b301018..3c20ef75 100644 --- a/dev/two-kinds-of-sql-query-builders/index.html +++ b/dev/two-kinds-of-sql-query-builders/index.html @@ -41,4 +41,4 @@ having orderby limit -end

Individual slots of this structure are populated by the corresponding pipeline nodes.

"Where" node acting on the syntax tree

This explains why the pipeline is insensitive to the order of the nodes. Indeed, as long as the content of the slots stays the same, it makes no difference in what order the slots are populated.

Pipeline is insensitive to the order of the nodes

This method of incrementally constructing a composite structure is known as the builder pattern. We can call the query builders that employ this pattern syntax-oriented.

Both data-oriented and syntax-oriented query builders are compositional: the difference is in the nature of the information processed by the units of composition. Data-oriented query builders incrementally refine the query output; syntax-oriented query builders incrementally assemble the SQL syntax tree. Their interfaces look almost identical, but their methods of operation are fundamentally different.

But which one is better? Syntax-oriented query builders have two definite advantages: they are easy to implement and they could support the full range of SQL features. Indeed, the interface of a syntax-oriented query builder is just a collection of builders for the SQL syntax tree. How complete the representation of the syntax tree determines how well various SQL features are supported.

On the other hand, syntax-oriented query builders are harder to use. As they directly represent the SQL grammar, they inherit all of its deficiencies. In particular, the rigid clause order makes it difficult to assemble complex data processing pipelines, especially when the arrangement of pipeline nodes is not predetermined.

A data-oriented query builder directly represents data processing nodes, which makes assembling data processing pipelines much more straightforward—as long as we can find the necessary nodes among those offered by the builder. But where does the builder get its collection of data processing nodes? And how can we tell if this collection is complete?

One way to implement a data-oriented query builder is to adapt a general-purpose query framework. Indeed, this is the origin of EF/LINQ, which is adapted from LINQ, and dbplyr, which is adapted from dplyr. The query framework determines what processing nodes are available and how they operate. In principle, any query framework could be adapted to SQL databases by introducing just one new node, a node that loads the content of a database table. If we place this node at the beginning of a pipeline and make the rest of it out of regular nodes, we obtain a pipeline that processes data from a SQL database. However, this pipeline will be very inefficient compared to a SQL engine, which can use indexes to avoid loading the entire table into memory and thus can process the same data much faster. This is why EF/LINQ and dbplyr generate a SQL query that replaces the pipeline as a whole. The pipeline itself no longer runs directly, but now serves as a specification, with the assumption that if it were to run, it would produce the same output as the SQL query. This method of transforming a general-purpose query framework to a SQL query builder is called SQL pushdown.

However, SQL pushdown has a serious limitation. A general-purpose query framework is not designed with SQL compatibility in mind. For this reason, some of the pipelines assembled within this framework cannot be converted to SQL. Even worse, many useful SQL queries have no equivalent pipelines and thus cannot be generated using SQL pushdown. Indeed, SQL accumulated a wide range of features and capabilities since it first appeared in 1974. The first revision of the SQL standard, SQL-86, already supported Cartesian products, filtering, grouping, aggregation, and correlated subqueries. The next revision, SQL-92, added many join types and introduced query nesting. SQL:1999 greatly expanded its analytical capabilities by adding two types of queries: recursive queries, for processing hierarchical data, and data cube queries, which generalize histograms, cross-tabulations, roll-ups, drill-downs, and sub-totals. The follow-up revision, SQL:2003, added support for aggregate functions over a running window. Admittedly, SQL is a quintessential enterprise abomination, a hodgepodge of features added to support every imaginable use case, but with inadequate syntax, weird gaps in functionality, and no regards to internal consistency. Nevertheless, the breadth of SQL's capabilities has not been matched by any other query framework, including LINQ or dplyr. So when we generate SQL queries using EF/LINQ or dbplyr, a large subset of these capabilities remains inaccessible.

FunSQL is a data-oriented query builder created specifically to expose full expressive power of SQL. Unlike EF/LINQ and dbplyr, FunSQL was not adapted from an existing query framework, but was carefully designed from scratch to match SQL's capabilities. These capabilities include, for example, support for correlated subqueries and lateral joins (with Bind node), aggregate and window functions (using Group and Partition nodes), as well as recursive queries (with Iterate node). This comprehensive support for SQL capabilities makes FunSQL the only SQL query builder suitable for assembling complex data processing pipelines. Moreover, even though FunSQL pipelines cannot be run directly, every FunSQL node has a well-defined data processing semantics, which means that, in principle, FunSQL could be developed into a full-blown query framework. This potentially opens a path for replacing SQL with an equally powerful, but a more coherent and expressive query language.

+end

Individual slots of this structure are populated by the corresponding pipeline nodes.

"Where" node acting on the syntax tree

This explains why the pipeline is insensitive to the order of the nodes. Indeed, as long as the content of the slots stays the same, it makes no difference in what order the slots are populated.

Pipeline is insensitive to the order of the nodes

This method of incrementally constructing a composite structure is known as the builder pattern. We can call the query builders that employ this pattern syntax-oriented.

Both data-oriented and syntax-oriented query builders are compositional: the difference is in the nature of the information processed by the units of composition. Data-oriented query builders incrementally refine the query output; syntax-oriented query builders incrementally assemble the SQL syntax tree. Their interfaces look almost identical, but their methods of operation are fundamentally different.

But which one is better? Syntax-oriented query builders have two definite advantages: they are easy to implement and they could support the full range of SQL features. Indeed, the interface of a syntax-oriented query builder is just a collection of builders for the SQL syntax tree. How complete the representation of the syntax tree determines how well various SQL features are supported.

On the other hand, syntax-oriented query builders are harder to use. As they directly represent the SQL grammar, they inherit all of its deficiencies. In particular, the rigid clause order makes it difficult to assemble complex data processing pipelines, especially when the arrangement of pipeline nodes is not predetermined.

A data-oriented query builder directly represents data processing nodes, which makes assembling data processing pipelines much more straightforward—as long as we can find the necessary nodes among those offered by the builder. But where does the builder get its collection of data processing nodes? And how can we tell if this collection is complete?

One way to implement a data-oriented query builder is to adapt a general-purpose query framework. Indeed, this is the origin of EF/LINQ, which is adapted from LINQ, and dbplyr, which is adapted from dplyr. The query framework determines what processing nodes are available and how they operate. In principle, any query framework could be adapted to SQL databases by introducing just one new node, a node that loads the content of a database table. If we place this node at the beginning of a pipeline and make the rest of it out of regular nodes, we obtain a pipeline that processes data from a SQL database. However, this pipeline will be very inefficient compared to a SQL engine, which can use indexes to avoid loading the entire table into memory and thus can process the same data much faster. This is why EF/LINQ and dbplyr generate a SQL query that replaces the pipeline as a whole. The pipeline itself no longer runs directly, but now serves as a specification, with the assumption that if it were to run, it would produce the same output as the SQL query. This method of transforming a general-purpose query framework to a SQL query builder is called SQL pushdown.

However, SQL pushdown has a serious limitation. A general-purpose query framework is not designed with SQL compatibility in mind. For this reason, some of the pipelines assembled within this framework cannot be converted to SQL. Even worse, many useful SQL queries have no equivalent pipelines and thus cannot be generated using SQL pushdown. Indeed, SQL accumulated a wide range of features and capabilities since it first appeared in 1974. The first revision of the SQL standard, SQL-86, already supported Cartesian products, filtering, grouping, aggregation, and correlated subqueries. The next revision, SQL-92, added many join types and introduced query nesting. SQL:1999 greatly expanded its analytical capabilities by adding two types of queries: recursive queries, for processing hierarchical data, and data cube queries, which generalize histograms, cross-tabulations, roll-ups, drill-downs, and sub-totals. The follow-up revision, SQL:2003, added support for aggregate functions over a running window. Admittedly, SQL is a quintessential enterprise abomination, a hodgepodge of features added to support every imaginable use case, but with inadequate syntax, weird gaps in functionality, and no regards to internal consistency. Nevertheless, the breadth of SQL's capabilities has not been matched by any other query framework, including LINQ or dplyr. So when we generate SQL queries using EF/LINQ or dbplyr, a large subset of these capabilities remains inaccessible.

FunSQL is a data-oriented query builder created specifically to expose full expressive power of SQL. Unlike EF/LINQ and dbplyr, FunSQL was not adapted from an existing query framework, but was carefully designed from scratch to match SQL's capabilities. These capabilities include, for example, support for correlated subqueries and lateral joins (with Bind node), aggregate and window functions (using Group and Partition nodes), as well as recursive queries (with Iterate node). This comprehensive support for SQL capabilities makes FunSQL the only SQL query builder suitable for assembling complex data processing pipelines. Moreover, even though FunSQL pipelines cannot be run directly, every FunSQL node has a well-defined data processing semantics, which means that, in principle, FunSQL could be developed into a full-blown query framework. This potentially opens a path for replacing SQL with an equally powerful, but a more coherent and expressive query language.