2,892 changes: 1,382 additions & 1,510 deletions demo.ipynb

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions docs/conf.py
@@ -72,3 +72,11 @@

html_theme = "sphinx_book_theme"
# html_static_path = ['_static'] # Uncomment when you have custom static files
html_theme_options = {
# "logo": {
# "image_light": "_static/logo-light.svg",
# "image_dark": "_static/logo-dark.svg",
# },
"repository_url": "https://github.com/abdenlab/giql",
"use_repository_button": True,
}
49 changes: 2 additions & 47 deletions docs/dialect/aggregation-operators.rst
@@ -5,10 +5,6 @@ Aggregation operators combine and cluster genomic intervals. These operators are
essential for reducing complex interval data into summarized regions, such as
merging overlapping peaks or identifying clusters of related features.

.. contents::
:local:
:depth: 1

.. _cluster-operator:

CLUSTER
@@ -149,26 +145,6 @@ Find regions with multiple overlapping features:
INNER JOIN cluster_sizes s ON c.cluster_id = s.cluster_id
WHERE s.size >= 3

Backend Compatibility
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
:header-rows: 1
:widths: 20 20 60

* - Backend
- Support
- Notes
* - DuckDB
- Full
- Efficient window function implementation
* - SQLite
- Full
-
* - PostgreSQL
- Planned
-

Performance Notes
~~~~~~~~~~~~~~~~~

@@ -342,31 +318,10 @@ Calculate the total base pairs covered after merging:
SELECT SUM(end - start) AS total_coverage
FROM merged

Backend Compatibility
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
:header-rows: 1
:widths: 20 20 60

* - Backend
- Support
- Notes
* - DuckDB
- Full
-
* - SQLite
- Full
-
* - PostgreSQL
- Planned
-

Performance Notes
~~~~~~~~~~~~~~~~~
Notes
~~~~~

- MERGE is an aggregate operation that processes all matching rows
- For very large datasets, consider filtering by chromosome first
- The operation sorts data internally, so pre-sorting is not required
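To make the merge semantics in the notes above concrete, here is a minimal pure-Python sketch of merging intervals per chromosome. It is an illustration only, not GIQL's implementation: the function name ``merge_intervals`` is hypothetical, coordinates are assumed to be half-open, and treating touching (bookended) intervals as mergeable is an assumption. Like the operator described above, the sketch sorts internally, so the input order does not matter.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) tuples per chromosome.

    Hypothetical helper for illustration; not part of GIQL.
    """
    # Group by chromosome first, mirroring the "filter by chromosome" advice.
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        # Sorting here means callers do not need to pre-sort their data.
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlapping or touching: extend the current merged region.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


peaks = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr2", 50, 80),
    ("chr1", 400, 500),
]
merged = merge_intervals(peaks)
# Equivalent in spirit to SELECT SUM(end - start) over the merged regions.
total_coverage = sum(end - start for _, start, end in merged)
print(merged, total_coverage)
```

The final sum mirrors the ``total_coverage`` query shown earlier in this section, computed over the merged regions rather than the raw peaks.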

Related Operators
62 changes: 5 additions & 57 deletions docs/dialect/distance-operators.rst
@@ -1,14 +1,10 @@
Distance and Proximity
Distance and Neighbors
======================

Distance and proximity operators calculate genomic distances and find nearest features.
These operators are essential for proximity analysis, such as finding genes near
regulatory elements or variants near transcription start sites.

.. contents::
:local:
:depth: 1

.. _distance-operator:

DISTANCE
@@ -97,33 +93,12 @@ Distinguish between overlapping and nearby features:
CROSS JOIN genes g
WHERE p.chrom = g.chrom

Backend Compatibility
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
:header-rows: 1
:widths: 20 20 60

* - Backend
- Support
- Notes
* - DuckDB
- Full
-
* - SQLite
- Full
-
* - PostgreSQL
- Planned
-

Performance Notes
~~~~~~~~~~~~~~~~~
Notes
~~~~~

- Always include ``WHERE a.chrom = b.chrom`` to avoid unnecessary
cross-chromosome comparisons
- For large datasets, consider pre-filtering by region before calculating distances
- Create indexes on chromosome and position columns for better performance
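The following sketch illustrates the kind of calculation a distance operator performs and why the same-chromosome check matters. It is not GIQL's implementation: the function name ``interval_distance`` is hypothetical, and the conventions used (half-open coordinates, an unsigned gap, ``0`` for overlapping or touching intervals, ``None`` across chromosomes) are assumptions made for this example.

```python
def interval_distance(a, b):
    """Return the gap in bases between two (chrom, start, end) intervals.

    Hypothetical helper for illustration. Assumes half-open coordinates.
    Returns 0 when the intervals overlap or touch, and None when they are
    on different chromosomes (distance is undefined in that case).
    """
    a_chrom, a_start, a_end = a
    b_chrom, b_start, b_end = b
    if a_chrom != b_chrom:
        # Corresponds to the WHERE a.chrom = b.chrom guard in the notes.
        return None
    # Positive when there is a gap, zero or negative when they overlap.
    gap = max(a_start, b_start) - min(a_end, b_end)
    return max(gap, 0)


peak = ("chr1", 100, 200)
print(interval_distance(peak, ("chr1", 150, 300)))  # overlapping intervals
print(interval_distance(peak, ("chr1", 250, 300)))  # gap downstream
print(interval_distance(peak, ("chr2", 100, 200)))  # different chromosome
```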

Related Operators
~~~~~~~~~~~~~~~~~
@@ -332,39 +307,12 @@ Find nearby same-strand features within distance constraints:
WHERE nearest.distance BETWEEN -10000 AND 10000
ORDER BY peaks.name, ABS(nearest.distance)

Backend Compatibility
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
:header-rows: 1
:widths: 20 20 60

* - Backend
- Support
- Notes
* - DuckDB
- Full
- Efficient lateral join support
* - SQLite
- Partial
- Works but slower for large k values
* - PostgreSQL
- Planned
-

Performance Notes
~~~~~~~~~~~~~~~~~
Notes
~~~~~

- **Chromosome pre-filtering**: NEAREST automatically filters by chromosome for efficiency
- **Use max_distance**: Specifying a maximum distance reduces the search space significantly
- **Limit k**: Only request as many neighbors as you actually need
- **Create indexes**: Add indexes on ``(chrom, start, "end")`` for better performance

.. code-block:: sql

-- Create indexes for better NEAREST performance
CREATE INDEX idx_genes_position
ON genes (chrom, start, "end")

Related Operators
~~~~~~~~~~~~~~~~~
41 changes: 32 additions & 9 deletions docs/dialect/index.rst
@@ -1,5 +1,5 @@
Operators
=========
The GIQL Dialect
================

GIQL extends SQL with operators specifically designed for genomic interval queries.
These operators enable powerful spatial reasoning over genomic coordinates without
@@ -9,6 +9,29 @@ Operators are organized by functionality. All operators work across supported
database backends (DuckDB, SQLite, with PostgreSQL planned). Each operator page
includes a compatibility table showing backend support status.

Logical genomic range columns
-----------------------------

GIQL allows you to reference *logical* genomic range columns in queries. Such logical
columns do not need to exist explicitly or materially in any of your data sources,
and no specialized composite data types are needed. Rather, a logical genomic range
column or "pseudo-column" can be mapped to physical columns in your source table that
contain the required information and use conventional data types (reference sequence name,
start and end coordinates, optional strand, etc.).

In GIQL queries, you reference a logical genomic range column using a designated name
like ``interval``:

.. code-block:: sql

SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'

By providing a :doc:`schema mapping <../transpilation/schema-mapping>` for the genomic
range columns of each of the tables in a GIQL query, the GIQL transpiler can translate
range operations into standard SQL expressions to be consumed by a general-purpose
query engine. Alternatively, a GIQL-aware query engine could use the schema mapping
directly for optimization.
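As a rough illustration of the translation described above, the sketch below expands an ``interval INTERSECTS 'chr1:1000-2000'`` predicate into plain SQL over physical columns and runs it with Python's built-in ``sqlite3`` module. The physical column names (``chrom``, ``start``, ``end``), the helper names ``parse_range`` and ``expand_intersects``, and the half-open coordinate convention are assumptions for this example; the SQL that the GIQL transpiler actually emits may differ.

```python
import sqlite3


def parse_range(text):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, _, span = text.partition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def expand_intersects(range_text, chrom_col="chrom", start_col="start", end_col="end"):
    """Build a parameterized overlap predicate over physical columns.

    Hypothetical helper: assumes half-open [start, end) coordinates, so two
    intervals overlap when each one starts before the other ends.
    """
    chrom, start, end = parse_range(range_text)
    predicate = f'"{chrom_col}" = ? AND "{start_col}" < ? AND "{end_col}" > ?'
    return predicate, (chrom, end, start)


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE variants (name TEXT, chrom TEXT, "start" INTEGER, "end" INTEGER)'
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        ("v1", "chr1", 900, 1100),   # overlaps the query range
        ("v2", "chr1", 2000, 2100),  # starts at the query end: no overlap
        ("v3", "chr2", 1500, 1600),  # different chromosome
    ],
)

predicate, params = expand_intersects("chr1:1000-2000")
rows = conn.execute(
    f"SELECT name FROM variants WHERE {predicate} ORDER BY name", params
).fetchall()
hits = [row[0] for row in rows]
print(hits)
```

The point of the sketch is that the table never stores an ``interval`` column: the logical column exists only in the query, and the mapping to ``chrom``, ``start``, and ``end`` supplies everything the generated predicate needs.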

Spatial Relationship Operators
------------------------------

@@ -97,11 +120,11 @@ Apply operators to multiple ranges simultaneously.
See :doc:`quantifiers` for detailed documentation.


.. toctree::
:maxdepth: 2
:hidden:
.. .. toctree::
.. :maxdepth: 2
.. :hidden:

spatial-operators
distance-operators
aggregation-operators
quantifiers
.. spatial-operators
.. distance-operators
.. aggregation-operators
.. quantifiers
53 changes: 4 additions & 49 deletions docs/dialect/quantifiers.rst
@@ -5,10 +5,6 @@ Set quantifiers extend spatial operators to work with multiple ranges simultaneously.
They allow you to test whether a genomic position matches any or all of a set of
specified ranges in a single query.

.. contents::
:local:
:depth: 1

.. _any-quantifier:

ANY
@@ -112,32 +108,11 @@ Query across different chromosomes efficiently:
'chrX:100000-200000'
)

Backend Compatibility
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
:header-rows: 1
:widths: 20 20 60

* - Backend
- Support
- Notes
* - DuckDB
- Full
-
* - SQLite
- Full
-
* - PostgreSQL
- Planned
-

Performance Notes
~~~~~~~~~~~~~~~~~
Notes
~~~~~

- ``ANY`` expands to multiple OR conditions in the generated SQL
- For very large sets of ranges, consider using a separate table and JOIN instead
- The optimizer may benefit from indexes on chromosome and position columns
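The first note above says that ``ANY`` expands to multiple OR conditions. A minimal sketch of what such an expansion could look like is shown below. The helper name ``expand_any``, the column names, and the half-open overlap test are assumptions for illustration and do not reproduce GIQL's actual generated SQL.

```python
def expand_any(ranges, chrom_col="chrom", start_col="start", end_col="end"):
    """Build one OR-joined overlap predicate for a list of 'chrom:start-end' strings.

    Hypothetical helper for illustration. Returns the SQL fragment and the
    parameters to bind, assuming half-open [start, end) coordinates.
    """
    clauses = []
    params = []
    for text in ranges:
        chrom, _, span = text.partition(":")
        start, _, end = span.partition("-")
        clauses.append(
            f'("{chrom_col}" = ? AND "{start_col}" < ? AND "{end_col}" > ?)'
        )
        params.extend([chrom, int(end), int(start)])
    # One clause per range: the predicate grows linearly with the range list.
    return " OR ".join(clauses), params


any_predicate, any_params = expand_any(
    ["chr1:1000-2000", "chr2:5000-6000", "chrX:100000-200000"]
)
print(any_predicate)
print(any_params)
```

Because the predicate gains one clause per range, a very long range list produces a very long WHERE clause, which is why the notes suggest a separate table and a JOIN for large sets.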

Related
~~~~~~~
@@ -238,28 +213,8 @@ features in the intersection of multiple regions):
-- This finds features that overlap BOTH ranges
-- (i.e., features in the intersection: chr1:1500-2000)

Backend Compatibility
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
:header-rows: 1
:widths: 20 20 60

* - Backend
- Support
- Notes
* - DuckDB
- Full
-
* - SQLite
- Full
-
* - PostgreSQL
- Planned
-

Performance Notes
~~~~~~~~~~~~~~~~~
Notes
~~~~~

- ``ALL`` expands to multiple AND conditions in the generated SQL
- Queries with ``ALL`` may be more restrictive, potentially reducing result sets