Analyze analytics SQL to extract referenced tables/columns and time bounds.
BigQuery and Redshift are currently supported.
Base install:
pip install analytics-query-analyzerBigQuery extras (only needed for build_schema):
pip install analytics-query-analyzer[bigquery]Redshift extras (only needed for build_schema):
pip install analytics-query-analyzer[redshift]Extract table/column references from a query.
Schema format follows sqlglot conventions, with nested fields represented as nested dicts.
from analytics_query_analyzer import analyze
from sqlglot import dialects
schema = {
"production": {
"shop": {
"orders": {
"id": "int64",
"ordered_at": "datetime",
"user_id": "int64",
"payment": {
"amount": "int64",
"method": "string",
},
},
},
},
}
sql = """
select
id,
user_id,
payment.amount
from
shop.orders
"""
references = analyze(dialects.BigQuery, sql, schema, "production")
print(references)
# [
# {
# "database": "production",
# "schema": "shop",
# "table": "orders",
# "column": "id",
# },
# {
# "database": "production",
# "schema": "shop",
# "table": "orders",
# "column": "payment.amount",
# },
# {
# "database": "production",
# "schema": "shop",
# "table": "orders",
# "column": "user_id",
# },
# ]Extract time bounds from filters by enabling with_timebounds.
from analytics_query_analyzer import analyze
from sqlglot import dialects
schema = {
"production": {
"shop": {
"orders": {
"id": "int64",
"ordered_at": "datetime",
"user_id": "int64",
},
},
},
}
sql = """
select
*
from
shop.orders
where
ordered_at >= "2025-01-01"
and ordered_at < "2026-01-01"
"""
timebounds = analyze(
dialects.BigQuery,
sql,
schema,
"production",
with_timebounds=True,
)
print(timebounds)
# [
# {
# "database": "production",
# "schema": "shop",
# "table": "orders",
# "column": "id",
# "lower": None,
# "upper": None,
# },
# {
# "database": "production",
# "schema": "shop",
# "table": "orders",
# "column": "ordered_at",
# "lower": "2025-01-01",
# "upper": "2026-01-01",
# },
# {
# "database": "production",
# "schema": "shop",
# "table": "orders",
# "column": "user_id",
# "lower": None,
# "upper": None,
# },
# ]To make current_date() deterministic, pass a provider:
timebounds = analyze(
dialects.BigQuery,
"select * from shop.orders where ordered_at >= current_date()",
schema,
"production",
with_timebounds=True,
current_date_provider=lambda: "2026-01-01",
)Fetch a schema dictionary from BigQuery or Redshift.
from analytics_query_analyzer import build_schema
from sqlglot import dialects
schema = build_schema(dialects.BigQuery, "my_project", "my_schema", "my_table")
print(schema)- Authentication uses Application Default Credentials (ADC).
- When
tableis omitted, it scans all tables in the dataset. - When both
datasetandtableare omitted, it scans all datasets in the project. - The returned
schemacan be passed directly toanalyze.
Fetching from Redshift is also supported:
from analytics_query_analyzer import build_schema
from sqlglot import dialects
schema = build_schema(dialects.Redshift, "my_database", "public", "orders")
print(schema)Redshift authentication supports two modes:
- Set
REDSHIFT_HOST,REDSHIFT_USER, andREDSHIFT_PASSWORD(optionalREDSHIFT_PORT). - Set
REDSHIFT_CLUSTER_IDENTIFIER,REDSHIFT_REGION, andREDSHIFT_DB_USERto use IAM (you can also setREDSHIFT_HOST/REDSHIFT_PORT). - Use
AWS_PROFILEif you want to select a non-default AWS profile.