From d085ca7ae6bcc5ac5a0ae9d8fef480b3ca73d194 Mon Sep 17 00:00:00 2001 From: royendo <67675319+royendo@users.noreply.github.com> Date: Mon, 5 Jan 2026 09:41:06 -0500 Subject: [PATCH 1/6] adding alerts --- docs/docs/build/alerts/_category_.yml | 4 + docs/docs/build/alerts/alerts.md | 280 +++++++++++++++++++ docs/docs/build/models/data-quality-tests.md | 6 + 3 files changed, 290 insertions(+) create mode 100644 docs/docs/build/alerts/_category_.yml create mode 100644 docs/docs/build/alerts/alerts.md diff --git a/docs/docs/build/alerts/_category_.yml b/docs/docs/build/alerts/_category_.yml new file mode 100644 index 00000000000..b8832f544f5 --- /dev/null +++ b/docs/docs/build/alerts/_category_.yml @@ -0,0 +1,4 @@ +position: 44 +label: Alerts +collapsible: true +collapsed: true diff --git a/docs/docs/build/alerts/alerts.md b/docs/docs/build/alerts/alerts.md new file mode 100644 index 00000000000..91e7858c261 --- /dev/null +++ b/docs/docs/build/alerts/alerts.md @@ -0,0 +1,280 @@ +--- +title: Alerts +description: Define alerts as code for automated monitoring and notifications +sidebar_label: Alerts +sidebar_position: 0 +--- + +## Overview + +Alerts in Rill allow you to monitor your data and receive notifications when specific conditions are met. While alerts can be created through the UI, defining them as code in YAML files provides version control, reproducibility, and the ability to manage complex alerting logic programmatically. + +When you create an alert via a YAML file, it appears in the UI marked as `Created through code`. + +:::info For additional information +For the complete specification of all available properties, see the [Alert YAML Reference](/reference/project-files/alerts). +::: + +:::tip Using live connectors? + +If you're using [live connectors](/build/connectors/olap) (ClickHouse, Druid, Pinot, StarRocks, etc.), **alerts are your primary tool for data quality monitoring**. 
Since live connectors don't create local models, [data quality tests](/build/models/data-quality-tests) won't run. Use alerts instead to validate your data on a schedule. + +::: + +## Alert Structure + +An alert YAML file has the following core components: + +```yaml +type: alert +display_name: My Alert Name +description: A brief description of what this alert monitors + +# When to check the alert +refresh: + cron: "0 * * * *" # Every hour + +# What data to check +data: + sql: SELECT * FROM my_model WHERE condition_is_bad + +# Where to send notifications +notify: + email: + recipients: + - team@example.com +``` + +## Scheduling Alerts + +The [`refresh`](/reference/project-files/alerts#refresh) property defines when and how often the alert runs. + +### Cron Schedule + +Use standard `cron` expressions to define the schedule: + +```yaml +refresh: + cron: "0 * * * *" # Every hour + time_zone: "America/New_York" # Optional timezone +``` + +## Data Sources + +Alerts support multiple data source types to query your data. + +### SQL Query + +Execute raw SQL against your models: + +```yaml +data: + sql: | + SELECT * + FROM orders + WHERE created_at < NOW() - INTERVAL '24 hours' + AND status = 'pending' +``` + +The alert triggers when the query returns **any rows**. + +### Metrics SQL + +Query a metrics view directly: + +```yaml +data: + metrics_sql: | + SELECT * + FROM sales_metrics + WHERE total_revenue < 1000 +``` + +### Custom API + +Call a custom API defined in your project: + +```yaml +data: + api: my_custom_validation_api + args: + threshold: 100 + date_range: "7d" +``` + +### Resource Status + +Monitor the health of your Rill resources: + +```yaml +data: + resource_status: + where_error: true +``` + +This triggers when any resource in your project has a reconciliation error. 
+ +## Notification Configuration + +### Email Notifications + +```yaml +notify: + email: + recipients: + - alice@example.com + - bob@example.com + - data-team@example.com +``` + +### Slack Notifications + +Before using Slack notifications, you must [configure the Slack integration](/build/connectors/data-source/slack) for your project. + +```yaml +notify: + slack: + channels: + - "#data-alerts" + - "#engineering" + users: + - "U1234567890" # Slack user IDs + webhooks: + - "https://hooks.slack.com/services/..." +``` + +### Combined Notifications + +Send to multiple destinations: + +```yaml +notify: + email: + recipients: + - team@example.com + slack: + channels: + - "#alerts" +``` + +## Alert Behavior + +### Recovery Notifications + +Get notified when an alert condition resolves: + +```yaml +on_recover: true # Notify when alert recovers +on_fail: true # Notify when alert triggers (default) +on_error: false # Notify on evaluation errors +``` + +### Re-notification (Snooze) + +Control how often you're notified for ongoing issues: + +```yaml +renotify: true +renotify_after: "24h" # Re-notify every 24 hours if still failing +``` + +## Working Examples + +### Data Freshness Alert + +Alert when data hasn't been updated in over 24 hours: + +```yaml +# alerts/data_freshness.yaml +type: alert +display_name: Data Freshness Check +description: Alert when event data is stale + +refresh: + cron: "0 * * * *" # Check every hour + +data: + sql: | + SELECT 'Data is stale' AS error_message + FROM ( + SELECT MAX(event_timestamp) AS latest_event + FROM events_model + ) + WHERE latest_event < NOW() - INTERVAL '24 hours' + +notify: + email: + recipients: + - data-ops@example.com + slack: + channels: + - "#data-alerts" + +on_recover: true +renotify: true +renotify_after: "6h" +``` + +### Project Health Monitor + +Alert on any resource errors in your project: + +```yaml +# alerts/project_health.yaml +type: alert +display_name: Project Health Monitor +description: Alert when any resource 
has a reconciliation error + +refresh: + cron: "*/10 * * * *" # Every 10 minutes + +data: + resource_status: + where_error: true + +notify: + slack: + channels: + - "#rill-alerts" + email: + recipients: + - platform-team@example.com + +on_recover: true +``` + +### Interval-Based Monitoring + +Check data across time intervals: + +```yaml +# alerts/hourly_metrics.yaml +type: alert +display_name: Hourly Metrics Check +description: Validate metrics for each hour + +refresh: + cron: "5 * * * *" # 5 minutes past each hour + +intervals: + duration: PT1H # 1 hour intervals + limit: 24 # Check last 24 intervals + check_unclosed: false + +data: + sql: | + SELECT * + FROM hourly_aggregates + WHERE hour_start = DATE_TRUNC('hour', NOW() - INTERVAL '1 hour') + AND event_count = 0 + +notify: + slack: + channels: + - "#monitoring" +``` + +## Reference + +For the complete specification of all available properties, see the [Alert YAML Reference](/reference/project-files/alerts). + diff --git a/docs/docs/build/models/data-quality-tests.md b/docs/docs/build/models/data-quality-tests.md index a4f058dc496..7fec2fee98a 100644 --- a/docs/docs/build/models/data-quality-tests.md +++ b/docs/docs/build/models/data-quality-tests.md @@ -12,6 +12,12 @@ Tests are defined in your model's YAML file using the `tests:` property. Each te ## When to Use Data Quality Tests +:::tip Using live connectors? Use alerts instead + +Data quality tests run when models refresh, which means they only work with models that Rill manages. If you're using [live connectors](/build/connectors/olap) (ClickHouse, Druid, Pinot, StarRocks, etc.) where data lives in external systems, use [alerts](/build/alerts) to monitor data quality on a schedule instead. 
+ +::: + Data quality tests are useful for: - **Data Quality Checks** - Verify that your data meets business rules and constraints From e5736e5ca0dfeaf85d2fdbf0bec9e311ea81c739 Mon Sep 17 00:00:00 2001 From: royendo <67675319+royendo@users.noreply.github.com> Date: Mon, 5 Jan 2026 09:42:01 -0500 Subject: [PATCH 2/6] Update alerts.md --- docs/docs/build/alerts/alerts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/docs/build/alerts/alerts.md b/docs/docs/build/alerts/alerts.md index 91e7858c261..bbae15a7521 100644 --- a/docs/docs/build/alerts/alerts.md +++ b/docs/docs/build/alerts/alerts.md @@ -1,7 +1,7 @@ --- title: Alerts description: Define alerts as code for automated monitoring and notifications -sidebar_label: Alerts +sidebar_label: Code Alerts sidebar_position: 0 --- From ed8c5d69cb05471f9fd91e208003456726889c25 Mon Sep 17 00:00:00 2001 From: royendo <67675319+royendo@users.noreply.github.com> Date: Mon, 5 Jan 2026 14:05:45 -0500 Subject: [PATCH 3/6] Update alerts.md --- docs/docs/build/alerts/alerts.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/docs/docs/build/alerts/alerts.md b/docs/docs/build/alerts/alerts.md index bbae15a7521..8473241ad4e 100644 --- a/docs/docs/build/alerts/alerts.md +++ b/docs/docs/build/alerts/alerts.md @@ -11,10 +11,6 @@ Alerts in Rill allow you to monitor your data and receive notifications when spe When you create an alert via a YAML file, it appears in the UI marked as `Created through code`. -:::info For additional information -For the complete specification of all available properties, see the [Alert YAML Reference](/reference/project-files/alerts). -::: - :::tip Using live connectors? If you're using [live connectors](/build/connectors/olap) (ClickHouse, Druid, Pinot, StarRocks, etc.), **alerts are your primary tool for data quality monitoring**. Since live connectors don't create local models, [data quality tests](/build/models/data-quality-tests) won't run. 
Use alerts instead to validate your data on a schedule. From 7fe424da2ce50252259dd667b50b1e9aa3c39d14 Mon Sep 17 00:00:00 2001 From: royendo <67675319+royendo@users.noreply.github.com> Date: Wed, 7 Jan 2026 15:56:24 -0500 Subject: [PATCH 4/6] feedback loop --- docs/docs/build/alerts/alerts.md | 42 ++++++++++++++++----- docs/docs/reference/project-files/alerts.md | 2 +- runtime/parser/schema/project.schema.yaml | 2 +- 3 files changed, 34 insertions(+), 12 deletions(-) diff --git a/docs/docs/build/alerts/alerts.md b/docs/docs/build/alerts/alerts.md index 8473241ad4e..468f231d4fb 100644 --- a/docs/docs/build/alerts/alerts.md +++ b/docs/docs/build/alerts/alerts.md @@ -1,7 +1,7 @@ --- title: Alerts description: Define alerts as code for automated monitoring and notifications -sidebar_label: Code Alerts +sidebar_label: Alerts sidebar_position: 0 --- @@ -55,6 +55,20 @@ refresh: time_zone: "America/New_York" # Optional timezone ``` +### Interval-Based Monitoring + +Use `intervals` when you need to check data across multiple time windows, such as validating metrics for each hour or day. A plain cron schedule evaluates the alert once per run; with `intervals`, a single run can cover several periods in a rolling window, which suits time-series data quality monitoring. + ```yaml +refresh: + cron: "5 * * * *" # 5 minutes past each hour + +intervals: + duration: PT1H # 1 hour intervals + limit: 24 # Check last 24 intervals + check_unclosed: false + ``` ## Data Sources Alerts support multiple data source types to query your data. @@ -76,7 +90,7 @@ The alert triggers when the query returns **any rows**. ### Metrics SQL -Query a metrics view directly: +Use `metrics_sql` when you want to query a [metrics view](/build/metrics-view) using its defined dimensions and measures, rather than writing raw SQL against the underlying model.
This approach leverages the metrics view's security policies and allows you to reference measures and dimensions by name. For details on the `metrics_sql` syntax, see [Custom APIs](/build/custom-apis/custom-apis#metrics-sql-api). ```yaml data: @@ -88,7 +102,7 @@ data: ### Custom API -Call a custom API defined in your project: +Use a custom API when you want to reuse complex query logic that's already defined as a [Custom API](/build/custom-apis/custom-apis) in your project. This approach is useful for sharing validation logic between alerts and other integrations, or when you need to pass dynamic arguments to your alert queries. ```yaml data: @@ -100,7 +114,7 @@ data: ### Resource Status -Monitor the health of your Rill resources: +Monitor the health of your Rill resources to catch pipeline failures and reconciliation errors before they impact downstream processes. ```yaml data: @@ -112,6 +126,8 @@ This triggers when any resource in your project has a reconciliation error. ## Notification Configuration +Configure where and how you receive notifications when alerts trigger. You can send notifications via email, Slack, or both. Notifications are sent when the alert condition is met (when the data query returns rows), and optionally when the alert recovers or encounters evaluation errors. + ### Email Notifications ```yaml @@ -157,7 +173,7 @@ notify: ### Recovery Notifications -Get notified when an alert condition resolves: +Control when you receive notifications about alert state changes. Use `on_recover` to be notified when a previously triggered alert's condition clears, confirming the issue is resolved. Use `on_error` to catch evaluation failures (e.g., query syntax errors) that prevent the alert from running.
```yaml on_recover: true # Notify when alert recovers @@ -167,7 +183,7 @@ on_error: false # Notify on evaluation errors ### Re-notification (Snooze) -Control how often you're notified for ongoing issues: +Control how often you're notified for ongoing issues. This prevents alert fatigue while ensuring ongoing issues aren't forgotten. Instead of receiving notifications on every evaluation cycle, you'll only be re-notified after the specified duration if the alert is still failing. ```yaml renotify: true @@ -178,7 +194,7 @@ renotify_after: "24h" # Re-notify every 24 hours if still failing ### Data Freshness Alert -Alert when data hasn't been updated in over 24 hours: +This example demonstrates a data freshness check that queries the maximum timestamp from an events model and triggers when data is older than 24 hours. It uses both email and Slack notifications, includes recovery notifications to confirm when data freshness is restored, and implements re-notification every 6 hours to prevent alert fatigue while ensuring ongoing issues are tracked. ```yaml # alerts/data_freshness.yaml @@ -213,7 +229,7 @@ renotify_after: "6h" ### Project Health Monitor -Alert on any resource errors in your project: +This example monitors the overall health of your Rill project by checking for any resource reconciliation errors. It runs every 10 minutes for rapid detection of pipeline failures, uses the `resource_status` data source to automatically detect errors across all resources, and sends notifications to both Slack and email channels. Recovery notifications ensure you're alerted when issues are resolved. ```yaml # alerts/project_health.yaml @@ -239,9 +255,9 @@ notify: on_recover: true ``` -### Interval-Based Monitoring +### Interval-Based Monitoring Example -Check data across time intervals: +This example shows how to use interval-based monitoring to validate metrics across multiple time periods. 
It looks for hours in `hourly_aggregates` with a zero event count. The alert runs 5 minutes past each hour to give the previous hour's data time to land, and the `intervals` block configures one-hour windows, with up to 24 checked per evaluation. This pattern suits time-series data quality monitoring where you need to validate several periods on each evaluation. ```yaml # alerts/hourly_metrics.yaml @@ -274,3 +290,9 @@ notify: For the complete specification of all available properties, see the [Alert YAML Reference](/reference/project-files/alerts). +:::note Advanced Properties + +For advanced properties like `glob`, `for`, `watermark`, and `timeout`, see the [Alert YAML Reference](/reference/project-files/alerts). + +::: + diff --git a/docs/docs/reference/project-files/alerts.md b/docs/docs/reference/project-files/alerts.md index d356427e5e1..30d7aa4ad6f 100644 --- a/docs/docs/reference/project-files/alerts.md +++ b/docs/docs/reference/project-files/alerts.md @@ -142,7 +142,7 @@ _[object]_ - Notification configuration _(required)_ - **`users`** - _[array of string]_ - An array of Slack user IDs to notify. - - **`channels`** - _[array of string]_ - An array of Slack channel IDs to notify. + - **`channels`** - _[array of string]_ - An array of Slack channel names to notify. - **`webhooks`** - _[array of string]_ - An array of Slack webhook URLs to send notifications to. diff --git a/runtime/parser/schema/project.schema.yaml b/runtime/parser/schema/project.schema.yaml index 8051d0770a5..08a33a5edff 100644 --- a/runtime/parser/schema/project.schema.yaml +++ b/runtime/parser/schema/project.schema.yaml @@ -3260,7 +3260,7 @@ definitions: minItems: 1 channels: type: array - description: An array of Slack channel IDs to notify. + description: An array of Slack channel names to notify.
items: type: string minItems: 1 From f0102a278384722d0cd69e4ed75cd07f1975f408 Mon Sep 17 00:00:00 2001 From: royendo <67675319+royendo@users.noreply.github.com> Date: Wed, 7 Jan 2026 15:57:44 -0500 Subject: [PATCH 5/6] links --- docs/docs/build/alerts/alerts.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/docs/build/alerts/alerts.md b/docs/docs/build/alerts/alerts.md index 468f231d4fb..272aad638a8 100644 --- a/docs/docs/build/alerts/alerts.md +++ b/docs/docs/build/alerts/alerts.md @@ -90,7 +90,7 @@ The alert triggers when the query returns **any rows**. ### Metrics SQL -Use `metrics_sql` when you want to query a [metrics view](/build/metrics-view) using its defined dimensions and measures, rather than writing raw SQL against the underlying model. This approach leverages the metrics view's security policies and allows you to reference measures and dimensions by name. For details on the `metrics_sql` syntax, see [Custom APIs](/build/custom-apis/custom-apis#metrics-sql-api). +Use `metrics_sql` when you want to query a [metrics view](/build/metrics-view) using its defined dimensions and measures, rather than writing raw SQL against the underlying model. This approach leverages the metrics view's security policies and allows you to reference measures and dimensions by name. For details on the `metrics_sql` syntax, see [Custom APIs](/build/custom-apis/metrics-sql-api). ```yaml data: @@ -102,7 +102,7 @@ data: ### Custom API -Use a custom API when you want to reuse complex query logic that's already defined as a [Custom API](/build/custom-apis/custom-apis) in your project. This approach is useful for sharing validation logic between alerts and other integrations, or when you need to pass dynamic arguments to your alert queries. +Use a custom API when you want to reuse complex query logic that's already defined as a [Custom API](/build/custom-apis) in your project. 
This approach is useful for sharing validation logic between alerts and other integrations, or when you need to pass dynamic arguments to your alert queries. ```yaml data: From e4559416545dc60489baedb37be2f89deb6b0ab7 Mon Sep 17 00:00:00 2001 From: royendo <67675319+royendo@users.noreply.github.com> Date: Wed, 7 Jan 2026 15:58:21 -0500 Subject: [PATCH 6/6] Update alerts.md --- docs/docs/build/alerts/alerts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/docs/build/alerts/alerts.md b/docs/docs/build/alerts/alerts.md index 272aad638a8..ebac3ede03a 100644 --- a/docs/docs/build/alerts/alerts.md +++ b/docs/docs/build/alerts/alerts.md @@ -90,7 +90,7 @@ The alert triggers when the query returns **any rows**. ### Metrics SQL -Use `metrics_sql` when you want to query a [metrics view](/build/metrics-view) using its defined dimensions and measures, rather than writing raw SQL against the underlying model. This approach leverages the metrics view's security policies and allows you to reference measures and dimensions by name. For details on the `metrics_sql` syntax, see [Custom APIs](/build/custom-apis/metrics-sql-api). +Use `metrics_sql` when you want to query a [metrics view](/build/metrics-view) using its defined dimensions and measures, rather than writing raw SQL against the underlying model. This approach leverages the metrics view's security policies and allows you to reference measures and dimensions by name. For details on the `metrics_sql` syntax, see [Custom APIs](/build/custom-apis#metrics-sql-api). ```yaml data: