Metadata interface for catalog objects #64

Merged
merged 7 commits into from
Jun 15, 2024
2 changes: 2 additions & 0 deletions Project.toml
@@ -5,6 +5,7 @@ version = "0.13.2"

[deps]
DBInterface = "a10d1c49-ce27-4219-8d33-6db1a4562965"
+ DataAPI = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
LRUCache = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
@@ -13,6 +14,7 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
DBInterface = "2.5"
+ DataAPI = "1.13"
LRUCache = "1.3"
OrderedCollections = "1.4"
PrettyPrinting = "0.3.2, 0.4"

13 changes: 5 additions & 8 deletions docs/src/examples/index.md
@@ -346,10 +346,7 @@ the definitions programmatically.
!contains(String(c), "source")

q = From(:person) |>
- Select(Get.(filter(is_not_source_column, person_table.columns))...)
-
- # q = From(:person) |>
- # Select(args = [Get(c) for c in person_table.columns if is_not_source_column(c)])
+ Select(args = [Get(c) for c in keys(person_table.columns) if is_not_source_column(c)])

display(q)
#=>
@@ -447,16 +444,16 @@ however we must ensure that all column names are unique.
const visit_occurrence_table = conn.catalog[:visit_occurrence]

q = q |>
- Select(Get.(person_table.columns)...,
- Get.(visit_occurrence_table.columns, over = Get.visit)...)
+ Select(Get.(keys(person_table.columns))...,
+ Get.(keys(visit_occurrence_table.columns), over = Get.visit)...)
#=>
ERROR: FunSQL.DuplicateLabelError: `person_id` is used more than once in:
=#

q = q |>
- Select(Get.(person_table.columns)...,
- Get.(filter(!in(person_table.columns), visit_occurrence_table.columns),
+ Select(Get.(keys(person_table.columns))...,
+ Get.(filter(!in(keys(person_table.columns)), collect(keys(visit_occurrence_table.columns))),
over = Get.visit)...)

render(conn, q) |> print
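
Note on the two hunks above: after this change `columns` is keyed by column name (the `nodes.md` hunk iterates it as `(n, c)` pairs), so code that needs the names has to go through `keys`. A minimal sketch of that pattern, assuming a FunSQL release that includes this PR; the three-column `person_table` is a hypothetical stand-in for `conn.catalog[:person]`:

```julia
using FunSQL: SQLTable, From, Get, Select

# Hypothetical stand-in for `conn.catalog[:person]`.
person_table = SQLTable(name = :person,
                        columns = [:person_id, :year_of_birth, :person_source_value])

is_not_source_column(c::Symbol) = !contains(String(c), "source")

# `columns` maps names to `SQLColumn` objects, so the names come from `keys`.
kept = [c for c in keys(person_table.columns) if is_not_source_column(c)]

# Build the query from the filtered names, as in the updated example.
q = From(person_table) |> Select(args = [Get(c) for c in kept])
```

Iterating `person_table.columns` directly would yield name/column pairs rather than names, which is presumably why the earlier `Get.(person_table.columns)` form had to change.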

2 changes: 1 addition & 1 deletion docs/src/reference/index.md
@@ -34,7 +34,7 @@ Pages = ["connections.jl"]
```


- ## `SQLCatalog` and `SQLTable`
+ ## `SQLCatalog`, `SQLTable`, and `SQLColumn`

```@autodocs
Modules = [FunSQL]

14 changes: 9 additions & 5 deletions docs/src/test/nodes.md
@@ -2277,7 +2277,7 @@ for a `CREATE TABLE AS` or `SELECT INTO` statement.
with_external_handler((tbl, def)) =
println("CREATE TEMP TABLE ",
render(ID(tbl.qualifiers, tbl.name)),
- " (", join([render(ID(c)) for c in tbl.columns], ", "), ") AS\n",
+ " (", join([render(ID(c.name)) for (n, c) in tbl.columns], ", "), ") AS\n",
render(def), ";\n")

q = From(:male) |>
@@ -3872,7 +3872,8 @@ and determines node types.
│ WithContext(over = Resolved(RowType(:person_id => ScalarType(),
│ :max_visit_start_date => ScalarType()),
- │ over = q9))
+ │ over = q9),
+ │ catalog = SQLCatalog(dialect = SQLDialect(), cache = nothing))
│ end
└ @ FunSQL …
=#
@@ -3896,7 +3897,8 @@ produce.
│ q5 = Get.year_of_birth,
│ q6 = Linked([q2, q3, q4, q5], 3, over = q1),
- │ WithContext(over = q33)
+ │ WithContext(over = q33,
+ │ catalog = SQLCatalog(dialect = SQLDialect(), cache = nothing))
│ end
└ @ FunSQL …
=#
@@ -3941,7 +3943,8 @@ On the next stage, the query object is converted to a SQL syntax tree.
│ ID(:visit_group_1) |> ID(:person_id)),
│ left = true) |>
│ SELECT(ID(:person_2) |> ID(:person_id),
- │ ID(:visit_group_1) |> ID(:max) |> AS(:max_visit_start_date)))
+ │ ID(:visit_group_1) |> ID(:max) |> AS(:max_visit_start_date)),
+ │ columns = [SQLColumn(:person_id), SQLColumn(:max_visit_start_date)])
└ @ FunSQL …
=#

@@ -3976,6 +3979,7 @@ Finally, the SQL tree is serialized into SQL.
│ "visit_occurrence_1"."person_id"
│ FROM "visit_occurrence" AS "visit_occurrence_1"
│ GROUP BY "visit_occurrence_1"."person_id"
- │ ) AS "visit_group_1" ON ("person_2"."person_id" = "visit_group_1"."person_id")""")
+ │ ) AS "visit_group_1" ON ("person_2"."person_id" = "visit_group_1"."person_id")""",
+ │ columns = [SQLColumn(:person_id), SQLColumn(:max_visit_start_date)])
└ @ FunSQL …
=#
186 changes: 151 additions & 35 deletions docs/src/test/other.md
@@ -66,78 +66,131 @@ by name.
DBInterface.close!(conn)


- ## `SQLCatalog` and `SQLTable`
+ ## `SQLCatalog`, `SQLTable`, and `SQLColumn`

In FunSQL, tables and table-like entities are represented using `SQLTable`
- objects. A collection of `SQLTable` objects is represented as a `SQLCatalog`
+ objects. Their columns are represented using `SQLColumn` objects.
+ A collection of `SQLTable` objects is represented as a `SQLCatalog`
object.

- using FunSQL: SQLCatalog, SQLTable
+ using FunSQL: SQLCatalog, SQLColumn, SQLTable

- A `SQLTable` constructor takes the table name, a vector of column names,
- and, optionally, the name of the table schema and other qualifiers. A name
- could be provided either as a `Symbol` or as a `String` value.
+ A `SQLTable` constructor takes the table name, a vector of columns, and,
+ optionally, the name of the table schema and other qualifiers. A name
+ could be provided either as a `Symbol` or as a `String` value. A column
+ can be specified just by its name.

location = SQLTable(qualifiers = [:public],
name = :location,
columns = [:location_id, :address_1, :address_2,
:city, :state, :zip])
- #-> SQLTable(:location, qualifiers = [:public], …)
+ #-> SQLTable(qualifiers = [:public], :location, …)

person = SQLTable(name = "person",
columns = ["person_id", "year_of_birth", "location_id"])
#-> SQLTable(:person, …)

The table and the column names could be provided as positional arguments.

- vocabulary = SQLTable(:vocabulary,
- columns = [:vocabulary_id, :vocabulary_name])
- #-> SQLTable(:vocabulary, …)
-
concept = SQLTable("concept", "concept_id", "concept_name", "vocabulary_id")
#-> SQLTable(:concept, …)

+ A column may have a custom name for use with FunSQL and the original name
+ for generating SQL queries.
+
+ vocabulary = SQLTable(:vocabulary,
+ :id => SQLColumn(:vocabulary_id),
+ :name => SQLColumn(:vocabulary_name))
+ #-> SQLTable(:vocabulary, …)
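
To illustrate the custom-name mapping added here: a query refers to a column by its FunSQL-facing name, while the `SQLColumn` keeps the original name that is expected to appear in the generated SQL. A minimal sketch, assuming a FunSQL release that includes this PR and that `render` accepts a query built with `From(table)` without an explicit catalog; the rendered text is deliberately not shown:

```julia
using FunSQL: SQLColumn, SQLTable, From, Get, Select, render

vocabulary = SQLTable(:vocabulary,
                      :id => SQLColumn(:vocabulary_id),
                      :name => SQLColumn(:vocabulary_name))

# Look the column up by its FunSQL-facing name; `.name` holds the SQL name.
original_name = vocabulary[:id].name

# The query uses the FunSQL-facing name `id`.
q = From(vocabulary) |> Select(Get.id)

# Serialize to SQL text; the default dialect is assumed here.
sql = String(render(q))
```

Printing `sql` should show which identifier ends up in the `SELECT` list and whether an alias is emitted for `id`.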

A `SQLTable` object is displayed as a Julia expression that created
the object.

display(location)
#=>
- SQLTable(:location,
- qualifiers = [:public],
- columns = [:location_id, :address_1, :address_2, :city, :state, :zip])
+ SQLTable(qualifiers = [:public],
+ :location,
+ SQLColumn(:location_id),
+ SQLColumn(:address_1),
+ SQLColumn(:address_2),
+ SQLColumn(:city),
+ SQLColumn(:state),
+ SQLColumn(:zip))
=#

- display(person)
+ display(vocabulary)
#=>
- SQLTable(:person, columns = [:person_id, :year_of_birth, :location_id])
+ SQLTable(:vocabulary,
+ :id => SQLColumn(:vocabulary_id),
+ :name => SQLColumn(:vocabulary_name))
=#

+ A `SQLTable` object behaves like a read-only dictionary.
+
+ person[:person_id]
+ #-> SQLColumn(:person_id)
+
+ person["person_id"]
+ #-> SQLColumn(:person_id)
+
+ person[1]
+ #-> SQLColumn(:person_id)
+
+ person[:visit_occurrence]
+ #-> ERROR: KeyError: key :visit_occurrence not found
+
+ get(person, :person_id, nothing)
+ #-> SQLColumn(:person_id)
+
+ get(person, "person_id", nothing)
+ #-> SQLColumn(:person_id)
+
+ get(person, :visit_occurrence, missing)
+ #-> missing
+
+ get(() -> missing, person, :visit_occurrence)
+ #-> missing
+
+ length(person)
+ #-> 3
+
+ collect(keys(person))
+ #-> [:person_id, :year_of_birth, :location_id]
+
A `SQLCatalog` constructor takes a collection of `SQLTable` objects,
- the target dialect, and the size of the query cache.
+ the target dialect, and the size of the query cache. Just as columns,
+ a table may have a custom name for use with FunSQL and the original name
+ for generating SQL.

- catalog = SQLCatalog(tables = [person, location, vocabulary, concept],
+ catalog = SQLCatalog(tables = [person, location, concept, :concept_vocabulary => vocabulary],
dialect = :sqlite,
cache = 128)
#-> SQLCatalog(…4 tables…, dialect = SQLDialect(:sqlite), cache = 128)

display(catalog)
#=>
- SQLCatalog(
- :concept => SQLTable(:concept,
- columns =
- [:concept_id, :concept_name, :vocabulary_id]),
- :location =>
- SQLTable(
- :location,
- qualifiers = [:public],
- columns =
- [:location_id, :address_1, :address_2, :city, :state, :zip]),
- :person => SQLTable(:person,
- columns = [:person_id, :year_of_birth, :location_id]),
- :vocabulary => SQLTable(:vocabulary,
- columns = [:vocabulary_id, :vocabulary_name]),
- dialect = SQLDialect(:sqlite),
- cache = 128)
+ SQLCatalog(SQLTable(:concept,
+ SQLColumn(:concept_id),
+ SQLColumn(:concept_name),
+ SQLColumn(:vocabulary_id)),
+ :concept_vocabulary => SQLTable(:vocabulary,
+ :id => SQLColumn(:vocabulary_id),
+ :name => SQLColumn(
+ :vocabulary_name)),
+ SQLTable(qualifiers = [:public],
+ :location,
+ SQLColumn(:location_id),
+ SQLColumn(:address_1),
+ SQLColumn(:address_2),
+ SQLColumn(:city),
+ SQLColumn(:state),
+ SQLColumn(:zip)),
+ SQLTable(:person,
+ SQLColumn(:person_id),
+ SQLColumn(:year_of_birth),
+ SQLColumn(:location_id)),
+ dialect = SQLDialect(:sqlite),
+ cache = 128)
=#

Number of tables in the catalog affects its representation.
@@ -191,7 +244,61 @@ The catalog behaves as a read-only `Dict` object.
#-> 4

sort(collect(keys(catalog)))
- #-> [:concept, :location, :person, :vocabulary]
+ #-> [:concept, :concept_vocabulary, :location, :person]
+
+ Catalog objects can be assigned arbitrary metadata.
+
+ metadata_catalog =
+ SQLCatalog(SQLTable(:person,
+ SQLColumn(:person_id, metadata = (; label = "Person ID")),
+ SQLColumn(:year_of_birth, metadata = (;)),
+ metadata = (; caption = "Person", is_view = false)),
+ metadata = (; model = "OMOP"))
+ #-> SQLCatalog(…1 table…, dialect = SQLDialect(), metadata = …)
+
+ display(metadata_catalog)
+ #=>
+ SQLCatalog(SQLTable(:person,
+ SQLColumn(:person_id, metadata = [:label => "Person ID"]),
+ SQLColumn(:year_of_birth),
+ metadata = [:caption => "Person", :is_view => false]),
+ dialect = SQLDialect(),
+ metadata = [:model => "OMOP"])
+ =#
+
+ FunSQL metadata supports DataAPI metadata interface.
+
+ using DataAPI
+
+ DataAPI.metadata(metadata_catalog)
+ #-> Dict("model" => "OMOP")
+
+ DataAPI.metadata(metadata_catalog, style = true)
+ #-> Dict("model" => ("OMOP", :default))
+
+ DataAPI.metadata(metadata_catalog, :name, :default)
+ #-> :default
+
+ DataAPI.metadata(metadata_catalog[:person])["caption"]
+ #-> "Person"
+
+ DataAPI.metadata(metadata_catalog[:person], :is_view, true)
+ #-> false
+
+ DataAPI.colmetadata(metadata_catalog[:person])[:person_id]["label"]
+ #-> "Person ID"
+
+ DataAPI.colmetadata(metadata_catalog[:person], 1, :label)
+ #-> "Person ID"
+
+ DataAPI.colmetadata(metadata_catalog[:person], :year_of_birth, :label, "")
+ #-> ""
+
+ DataAPI.metadata(metadata_catalog[:person][:person_id])
+ #-> Dict("label" => "Person ID")
+
+ DataAPI.metadata(metadata_catalog[:person][:person_id], :label, "")
+ #-> "Person ID"


## `SQLDialect`
@@ -274,6 +381,15 @@ A completely custom dialect can be specified.
String(sql)
#-> "SELECT * FROM person"

+ `SQLString` may carry a vector `columns` describing the output columns of
+ the query.
+
+ sql = SQLString("SELECT person_id FROM person", columns = [SQLColumn(:person_id)])
+ #-> SQLString("SELECT person_id FROM person", columns = […1 column…])
+
+ display(sql)
+ #-> SQLString("SELECT person_id FROM person", columns = [SQLColumn(:person_id)])
+
When the query has parameters, `SQLString` should include a vector of
parameter names in the order they should appear in `DBInterface.execute` call.


3 changes: 2 additions & 1 deletion src/FunSQL.jl
@@ -84,6 +84,7 @@ using OrderedCollections: OrderedDict, OrderedSet
using Tables
using DBInterface
using LRUCache
+ using DataAPI

const SQLLiteralType =
Union{Missing, Bool, Number, AbstractString, Dates.AbstractTime}
@@ -96,10 +97,10 @@ end

include("dissect.jl")
include("quote.jl")
- include("strings.jl")
include("dialects.jl")
include("types.jl")
include("catalogs.jl")
+ include("strings.jl")
include("clauses.jl")
include("nodes.jl")
include("connections.jl")