Merged
86 changes: 48 additions & 38 deletions README.md
@@ -1,23 +1,22 @@
# GIQL - Genomic Interval Query Language

A SQL dialect for genomic range queries with multi-database support.
A SQL dialect for genomic range queries. Transpiles to standard SQL.


## Overview

GIQL extends SQL with spatial operators for genomic interval queries. It transpiles to standard SQL that works across multiple database backends including DuckDB and SQLite.
GIQL extends SQL with spatial operators for genomic interval queries. It transpiles GIQL queries into standard SQL that can be executed on any database backend.

GIQL provides a familiar SQL syntax for bioinformatics workflows, allowing you to express complex genomic range operations without writing intricate SQL expressions. Whether you're filtering variants by genomic region, finding overlapping features, or calculating distances between intervals, GIQL makes these operations intuitive and portable across databases.
GIQL provides a familiar SQL syntax for bioinformatics workflows, allowing you to express complex genomic range operations without writing intricate SQL expressions. Whether you're filtering variants by genomic region, finding overlapping features, or calculating distances between intervals, GIQL makes these operations intuitive and portable.
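To make "transpiles to standard SQL" concrete, here is a minimal sketch of the kind of plain SQL an `INTERSECTS` filter reduces to, run against Python's built-in `sqlite3`. The column names `chrom`, `start` and `end` and the half-open `[start, end)` coordinate convention are assumptions for illustration only. The SQL that GIQL actually generates may look different.

```python
import sqlite3

# Hypothetical plain-SQL form of:
#   SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'
# Assumes chrom/start/end columns and half-open [start, end) coordinates.
EXPANDED_SQL = """
SELECT * FROM peaks
WHERE chrom = 'chr1'
  AND start < 2000   -- interval begins before the query range ends
  AND "end" > 1000   -- interval ends after the query range begins
ORDER BY start
"""

conn = sqlite3.connect(":memory:")
# "end" is quoted because END is an SQL keyword.
conn.execute('CREATE TABLE peaks (chrom TEXT, start INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO peaks VALUES (?, ?, ?)",
    [
        ("chr1", 500, 1500),   # overlaps the left edge of the range
        ("chr1", 1200, 1800),  # lies fully inside the range
        ("chr1", 2000, 2500),  # starts exactly at the range end: no overlap
        ("chr2", 1000, 2000),  # same coordinates, different chromosome
    ],
)
rows = conn.execute(EXPANDED_SQL).fetchall()
print(rows)  # [('chr1', 500, 1500), ('chr1', 1200, 1800)]
conn.close()
```

Because the result is ordinary SQL, any engine that understands plain comparisons can run it.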

## Features

- **SQL-based**: Familiar SQL syntax with genomic extensions
- **Multi-backend**: Works with DuckDB, SQLite, and more
- **Spatial operators**: INTERSECTS, CONTAINS, WITHIN for range relationships
- **Distance operators**: DISTANCE, NEAREST for proximity queries
- **Aggregation operators**: CLUSTER, MERGE for combining intervals
- **Set quantifiers**: ANY, ALL for multi-range queries
- **Transpilation**: Convert GIQL to standard SQL for debugging or external use
- **Transpilation**: Converts GIQL to standard SQL for execution on any backend

## Installation

@@ -72,39 +71,50 @@ make html
## Quick Start

```python
from giql import GIQLEngine

# Create engine with DuckDB backend
with GIQLEngine(target_dialect="duckdb") as engine:
    # Load genomic data
    engine.load_csv("variants", "variants.csv")
    engine.register_table_schema(
        "variants",
        {
            "id": "INTEGER",
            "chromosome": "VARCHAR",
            "start_pos": "BIGINT",
            "end_pos": "BIGINT",
        },
        genomic_column="interval",
    )

    # Query with genomic operators (returns cursor for streaming)
    cursor = engine.execute("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

    # Process results lazily
    for row in cursor:
        print(row)

    # Or just transpile to SQL without executing
    sql = engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)
    print(sql)  # See the generated SQL
from giql import transpile

# Transpile a GIQL query to standard SQL
sql = transpile(
    "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
    tables=["peaks"],
)
print(sql)
```
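The `'chr1:1000-2000'` literal in these queries names a chromosome and a coordinate range. The sketch below parses such a string and accepts only the simple `chrom:start-end` form. `parse_range` and `GenomicRange` are hypothetical names, and GIQL's own range parser may accept more formats.

```python
import re
from typing import NamedTuple


class GenomicRange(NamedTuple):
    chrom: str
    start: int
    end: int


# Hypothetical pattern for literals such as 'chr1:1000-2000'; the grammar GIQL
# actually accepts (thousands separators, strand suffixes, etc.) may be broader.
_RANGE_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_range(text: str) -> GenomicRange:
    """Parse 'chrom:start-end', rejecting malformed, empty, or inverted ranges."""
    match = _RANGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a genomic range: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start >= end:
        raise ValueError(f"range end must be greater than start: {text!r}")
    return GenomicRange(match["chrom"], start, end)


print(parse_range("chr1:1000-2000"))  # GenomicRange(chrom='chr1', start=1000, end=2000)
```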

With custom column mappings:

```python
from giql import Table, transpile

sql = transpile(
    "SELECT * FROM variants WHERE position INTERSECTS 'chr1:1000-2000'",
    tables=[
        Table(
            "variants",
            genomic_col="position",
            chrom_col="chromosome",
            start_col="start_pos",
            end_col="end_pos",
        )
    ],
)
```
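A column mapping like the one above tells the transpiler which physical columns stand in for the `position` interval. The sketch below shows that substitution for an overlap predicate and assumes half-open coordinates. `ColumnMapping` and `intersects_predicate` are hypothetical helpers, not GIQL's API, and GIQL's real output will differ.

```python
from typing import NamedTuple


class ColumnMapping(NamedTuple):
    """Hypothetical stand-in for the columns a GIQL Table maps an interval onto."""
    chrom_col: str = "chrom"
    start_col: str = "start"
    end_col: str = "end"


def quote_ident(name: str) -> str:
    # Standard SQL identifier quoting: wrap in double quotes, double any embedded quote.
    return '"' + name.replace('"', '""') + '"'


def intersects_predicate(
    cols: ColumnMapping, chrom: str, start: int, end: int
) -> tuple[str, tuple[str, int, int]]:
    """Return a parameterized WHERE clause for a half-open overlap test and its parameters."""
    sql = (
        f"{quote_ident(cols.chrom_col)} = ? "
        f"AND {quote_ident(cols.start_col)} < ? "
        f"AND {quote_ident(cols.end_col)} > ?"
    )
    # Row start must fall before the query end; row end must fall after the query start.
    return sql, (chrom, end, start)


mapping = ColumnMapping("chromosome", "start_pos", "end_pos")
where, params = intersects_predicate(mapping, "chr1", 1000, 2000)
print(where)   # "chromosome" = ? AND "start_pos" < ? AND "end_pos" > ?
print(params)  # ('chr1', 2000, 1000)
```

Values are passed as parameters rather than spliced into the string, so only the identifiers need quoting.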

Execution example with DuckDB:

```python
import duckdb
import oxbow as ox
from giql import transpile

conn = duckdb.connect()
peaks = ox.from_bed("peaks.bed", bed_schema="bed6+4").to_duckdb(conn) # streaming source

sql = transpile(
    "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
    tables=["peaks"],
)
df = conn.execute(sql).fetchdf()
```

## Operators at a Glance
31 changes: 8 additions & 23 deletions pyproject.toml
@@ -15,13 +15,7 @@ authors = [
{ name = "Conrad Bzura", email = "conradbzura@gmail.com" },
]
dependencies = [
"click>=8.3.0",
"duckdb>=1.4.0",
"oxbow>=0.4.0",
"pandas>=2.0.0",
"psycopg2-binary>=2.9.10",
"sqlglot>=20.0.0",
"sqlparse>=0.4.0",
]
description = "Genomic Interval Query Language - SQL dialect for genomic range queries"
dynamic = ["version"]
@@ -33,20 +27,16 @@ name = "giql"
readme = "README.md"
requires-python = ">=3.11"

[project.scripts]
giql = "giql.cli:cli"

[project.optional-dependencies]
all = [
"duckdb>=0.9.0",
"mysql-connector-python>=8.0.0",
"psycopg2-binary>=2.9.0",
dev = [
"duckdb>=1.4.0",
"hypothesis>=6.0.0",
"pandas>=2.0.0",
"pybedtools>=0.9.0",
"pytest-cov>=4.0.0",
"pytest>=7.0.0",
"ruff>=0.1.0",
]
dev = ["pytest-cov>=4.0.0", "pytest>=7.0.0", "ruff>=0.1.0", "hypothesis", "pybedtools"]
duckdb = ["duckdb>=0.9.0"]
mysql = ["mysql-connector-python>=8.0.0"]
postgres = ["psycopg2-binary>=2.9.0"]
sqlite = []

[tool.hatch.metadata.hooks.custom]
path = "build-hooks/metadata.py"
@@ -79,13 +69,8 @@ bedtools = ">=2.31.0"
pybedtools = ">=0.9.0"
pytest = ">=7.0.0"
pytest-cov = ">=4.0.0"
click = ">=8.3.0"
duckdb = ">=1.4.0"
pandas = ">=2.0.0"
pyarrow = ">=19.0.0"
psycopg2-binary = ">=2.9.10"
sqlglot = ">=20.0.0"
pip = "*"
oxbow = ">=0.4.0"
sqlparse = ">=0.4.0"
hypothesis = ">=6.148.2,<7"
15 changes: 9 additions & 6 deletions src/giql/__init__.py
@@ -1,19 +1,22 @@
"""GIQL - Genomic Interval Query Language.

A SQL dialect for genomic range queries with multi-database support.
A SQL dialect for genomic range queries.

This package provides:
- GIQL dialect extending SQL with spatial operators
- Query engine supporting multiple backends (DuckDB, SQLite)
- GIQL dialect extending SQL with spatial operators (INTERSECTS, CONTAINS, WITHIN)
- CLUSTER and MERGE operations for interval grouping
- NEAREST operator for finding closest intervals
- Range parser for genomic coordinate strings
- Schema management for genomic data
- Transpilation to standard SQL-92 compatible output
"""

from giql.engine import GIQLEngine as GIQLEngine
from giql.table import Table
from giql.transpile import transpile

__version__ = "0.1.0"


__all__ = [
"GIQLEngine",
"Table",
"transpile",
]