140 changes: 38 additions & 102 deletions README.md
@@ -1,87 +1,42 @@
# GIQL - Genomic Interval Query Language

A SQL dialect for genomic range queries. Transpiles to standard SQL.
<samp>
<p align="center">
<a href="https://giql.readthedocs.io/">docs</a> |
<a href="https://giql.readthedocs.io/dialect">syntax</a> |
<a href="https://giql.readthedocs.io/transpilation">transpiler</a>
</p>
</samp>

GIQL is an extended SQL dialect that allows you to declaratively express genomic interval operations.

## Overview
The `giql` Python package transpiles GIQL queries into standard SQL syntax for execution on any database or analytics engine.

GIQL extends SQL with spatial operators for genomic interval queries. It transpiles GIQL queries into standard SQL that can be executed on any database backend.

GIQL provides a familiar SQL syntax for bioinformatics workflows, allowing you to express complex genomic range operations without writing intricate SQL expressions. Whether you're filtering variants by genomic region, finding overlapping features, or calculating distances between intervals, GIQL makes these operations intuitive and portable.

## Features

- **SQL-based**: Familiar SQL syntax with genomic extensions
- **Spatial operators**: INTERSECTS, CONTAINS, WITHIN for range relationships
- **Distance operators**: DISTANCE, NEAREST for proximity queries
- **Aggregation operators**: CLUSTER, MERGE for combining intervals
- **Set quantifiers**: ANY, ALL for multi-range queries
- **Transpilation**: Converts GIQL to standard SQL for execution on any backend
> **Note:** This project is in active development — APIs, syntax, and behavior may change.

## Installation

### From PyPI

Install the latest stable release:
To install the transpiler:

```bash
pip install giql
```

Or the latest release candidate:

```bash
pip install --pre giql
```

### From Source

Clone the repository and install locally:

```bash
# Clone the repository
git clone https://github.com/abdenlab/giql.git
cd giql

# Install in development mode
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"
```

### Building Documentation

To build the documentation locally:

```bash
cd docs

# Install documentation dependencies
pip install -r requirements.txt

# Build HTML documentation
make html

# View the documentation
# The built docs will be in docs/_build/html/
# Open docs/_build/html/index.html in your browser
```
## Usage (transpilation)

## Quick Start
The `giql` package transpiles GIQL queries to standard SQL.

```python
from giql import transpile

# Transpile a GIQL query to standard SQL
sql = transpile(
"SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
tables=["peaks"],
)
print(sql)
```

With custom column mappings:
Each table referenced in a GIQL query exposes a genomic "pseudo-column" that maps to separate logical chromosome, start, end, and strand columns. You can customize the column mappings.

```python
from giql import Table, transpile
@@ -98,67 +53,48 @@ sql = transpile(
)
],
)
print(sql)
```
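
To make the pseudo-column idea concrete, here is a minimal sketch of how an `INTERSECTS` test against a literal region can be written as plain SQL over separate chromosome, start, and end columns. It is a hand-written illustration, not the output of `giql.transpile`: the helper names `parse_region` and `overlap_predicate`, the column names `chrom`, `start`, and `end`, and the 0-based half-open coordinate convention are all assumptions. It uses the standard-library `sqlite3` module so that it runs without extra dependencies.

```python
import sqlite3


def parse_region(region: str) -> tuple[str, int, int]:
    """Split a 'chrom:start-end' string into its parts (hypothetical helper)."""
    chrom, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def overlap_predicate(chrom_col: str = "chrom",
                      start_col: str = "start",
                      end_col: str = "end") -> str:
    """Build a parameterized overlap test for the given column mapping.

    Assumes half-open intervals: two ranges overlap when each one
    starts before the other ends.
    """
    return f'"{chrom_col}" = ? AND "{start_col}" < ? AND "{end_col}" > ?'


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE peaks ("chrom" TEXT, "start" INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO peaks VALUES (?, ?, ?)",
    [
        ("chr1", 500, 1500),    # overlaps the left edge of the region
        ("chr1", 1800, 2500),   # overlaps the right edge of the region
        ("chr1", 3000, 4000),   # same chromosome, no overlap
        ("chr2", 1000, 2000),   # same coordinates, different chromosome
    ],
)

chrom, start, end = parse_region("chr1:1000-2000")
query = f'SELECT * FROM peaks WHERE {overlap_predicate()} ORDER BY "start"'
# Parameter order follows the predicate: chrom, region end, region start.
rows = conn.execute(query, (chrom, end, start)).fetchall()
print(rows)
```

Passing different column names to `overlap_predicate` mirrors the effect of a custom column mapping: the query text changes, but the overlap logic stays the same.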

Execution example with DuckDB:
The transpiled SQL can be executed with fast genome-unaware databases or in-memory analytic engines like DuckDB.

You can also use [oxbow](https://oxbow.readthedocs.io) to efficiently stream specialized genomics formats into DuckDB.

```python
import duckdb
import oxbow as ox
from giql import transpile

conn = duckdb.connect()
peaks = ox.from_bed("peaks.bed", bed_schema="bed6+4").to_duckdb(conn) # streaming source

# Load a streaming data source as a DuckDB relation
peaks = ox.from_bed("peaks.bed", bed_schema="bed6+4").to_duckdb(conn)

sql = transpile(
"SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
tables=["peaks"],
)

# Execute and return the output as a dataframe
df = conn.execute(sql).fetchdf()
```

## Operators at a Glance

### Spatial Relationships

| Operator | Description |
|----------|-------------|
| `INTERSECTS` | Returns true when ranges overlap by at least one base pair |
| `CONTAINS` | Returns true when one range fully contains another |
| `WITHIN` | Returns true when one range is fully within another |
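
The three relationships in the table above can be sketched as plain Python predicates. This is an illustration of the semantics only, not GIQL's implementation: the `Interval` type is hypothetical, and the 0-based half-open coordinate convention is an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive (half-open)


def intersects(a: Interval, b: Interval) -> bool:
    """True when a and b share at least one base pair."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def contains(a: Interval, b: Interval) -> bool:
    """True when a fully contains b."""
    return a.chrom == b.chrom and a.start <= b.start and b.end <= a.end


def within(a: Interval, b: Interval) -> bool:
    """True when a lies fully within b (the mirror image of contains)."""
    return contains(b, a)
```

Under the half-open assumption, two intervals that merely touch (one ends where the other starts) do not intersect.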

### Distance and Proximity

| Operator | Description |
|----------|-------------|
| `DISTANCE` | Calculate genomic distance between two intervals |
| `NEAREST` | Find k-nearest genomic features |
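
As a rough sketch of what these two operators compute, the functions below measure the gap between two intervals and pick the k closest features. The definitions are assumptions made for illustration (unsigned distance, zero for overlapping intervals, no result across chromosomes, half-open coordinates); GIQL's own conventions may differ, so check the operator reference before relying on them.

```python
import heapq
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive (half-open)


def distance(a: Interval, b: Interval) -> Optional[int]:
    """Gap in base pairs between a and b; 0 if they overlap or touch.

    Returns None when the intervals are on different chromosomes.
    """
    if a.chrom != b.chrom:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def nearest(query: Interval, features: list[Interval], k: int = 1) -> list[Interval]:
    """Return up to k features on the query's chromosome, closest first."""
    candidates = [
        (distance(query, f), f) for f in features if f.chrom == query.chrom
    ]
    closest = heapq.nsmallest(k, candidates, key=lambda pair: pair[0])
    return [feature for _, feature in closest]
```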

### Aggregation

| Operator | Description |
|----------|-------------|
| `CLUSTER` | Assign cluster IDs to overlapping intervals |
| `MERGE` | Combine overlapping intervals into unified regions |
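
The relationship between the two aggregation operators can be sketched with a single sorted sweep: clustering labels each interval, and merging collapses each cluster to its span. This is an illustrative sketch under assumptions (half-open coordinates, and intervals that only touch are kept in separate clusters), not GIQL's implementation; the `Interval` type is hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive (half-open)


def cluster(intervals: list[Interval]) -> list[tuple[Interval, int]]:
    """Pair each interval with a cluster ID; overlapping intervals share an ID."""
    result = []
    cluster_id = -1
    current_chrom, current_end = None, None
    # Sorting a NamedTuple orders by (chrom, start, end).
    for iv in sorted(intervals):
        if iv.chrom != current_chrom or iv.start >= current_end:
            # New chromosome or a gap: start a new cluster.
            cluster_id += 1
            current_chrom, current_end = iv.chrom, iv.end
        else:
            # Overlap: extend the running cluster's right edge.
            current_end = max(current_end, iv.end)
        result.append((iv, cluster_id))
    return result


def merge(intervals: list[Interval]) -> list[Interval]:
    """Collapse each cluster into one interval covering all its members."""
    merged: dict[int, Interval] = {}
    for iv, cid in cluster(intervals):
        if cid in merged:
            prev = merged[cid]
            merged[cid] = Interval(prev.chrom, prev.start, max(prev.end, iv.end))
        else:
            merged[cid] = iv
    return list(merged.values())
```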

### Set Quantifiers

| Quantifier | Description |
|------------|-------------|
| `ANY` | Match if condition holds for any of the specified ranges |
| `ALL` | Match if condition holds for all of the specified ranges |
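
In plain Python terms, the quantifiers behave like the built-ins `any` and `all` applied to a predicate over a list of ranges. The sketch below shows this for an overlap test; the `Interval` type, the helper names, and the half-open coordinate convention are assumptions for illustration and do not reflect GIQL syntax.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive (half-open)


def intersects(a: Interval, b: Interval) -> bool:
    """True when a and b share at least one base pair."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersects_any(iv: Interval, ranges: list[Interval]) -> bool:
    """Match if iv overlaps at least one of the given ranges."""
    return any(intersects(iv, r) for r in ranges)


def intersects_all(iv: Interval, ranges: list[Interval]) -> bool:
    """Match only if iv overlaps every one of the given ranges."""
    return all(intersects(iv, r) for r in ranges)
```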

## Documentation

For complete documentation, build the docs locally (see above) or visit the hosted documentation.
## Development

The documentation includes:
```bash
git clone https://github.com/abdenlab/giql.git
cd giql
uv sync
```

- **Operator Reference**: Detailed documentation for each operator with examples
- **Recipes**: Common query patterns for intersections, distance calculations, and clustering
- **Bedtools Migration Guide**: How to replicate bedtools operations with GIQL
- **Guides**: Performance optimization, multi-backend configuration, and schema mapping
To build the documentation locally:

## Development
```bash
uv run --group docs sphinx-build docs docs/_build
# The built HTML docs will be in docs/_build/
```

This project is in active development.
To serve the docs locally with automatic rebuild:
```bash
uv run --group docs sphinx-autobuild docs docs/_build
```
12 changes: 0 additions & 12 deletions docs/api/index.rst

This file was deleted.

3 changes: 2 additions & 1 deletion docs/conf.py
@@ -25,6 +25,7 @@
"sphinx.ext.viewcode",
"sphinx.ext.intersphinx",
"sphinx.ext.autosummary",
"sphinx_design",
]

# Napoleon settings
@@ -69,5 +70,5 @@
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "sphinx_rtd_theme"
html_theme = "sphinx_book_theme"
# html_static_path = ['_static'] # Uncomment when you have custom static files