Added data quality alerts #995

Open
ghanse wants to merge 3 commits into main from dq_alerts

@ghanse (Collaborator) commented Jan 13, 2026

Changes

This PR introduces data quality alerts for notifying end-users when DQMetrics exceed configured thresholds.

To-Do

  • Create an AlertConfig to store alert configuration
  • Create methods to store alert metadata in Delta or Lakebase tables
  • Create streaming and batch handlers for alerting to Slack, Microsoft Teams, and arbitrary REST API endpoints
  • Update the apply_checks_... methods to use the required alert handler
  • Add test fixtures and integration tests
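The first and third to-do items can be pictured with a minimal sketch. This is not the PR's implementation: `AlertConfig`'s fields, `find_breaches`, `build_payload`, `send_alert`, and the metric names are all hypothetical and chosen only for illustration. The one external fact it relies on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertConfig:
    """Hypothetical alert configuration; field names are illustrative, not the DQX API."""
    metric_name: str
    threshold: float
    webhook_url: str


def find_breaches(metrics: dict, configs: list) -> list:
    """Return (config, observed value) pairs where a metric exceeds its threshold."""
    breaches = []
    for config in configs:
        value = metrics.get(config.metric_name)
        # Metrics that were not reported are skipped rather than treated as breaches.
        if value is not None and value > config.threshold:
            breaches.append((config, value))
    return breaches


def build_payload(config: AlertConfig, value: float) -> bytes:
    """Build a Slack-style webhook body: a JSON object with a single `text` field."""
    text = (
        f"Data quality alert: {config.metric_name}={value} "
        f"exceeds threshold {config.threshold}"
    )
    return json.dumps({"text": text}).encode("utf-8")


def send_alert(config: AlertConfig, value: float, timeout: float = 10.0) -> int:
    """POST the payload to the configured webhook and return the HTTP status code."""
    request = urllib.request.Request(
        config.webhook_url,
        data=build_payload(config, value),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping threshold evaluation (`find_breaches`) separate from delivery (`send_alert`) would let batch and streaming handlers share the same check and differ only in when they call it.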

Note

This PR also fixes a minor compatibility issue for has_valid_schema. Schemas are now parsed using the backwards-compatible _parse_datatype_string method instead of fromDDL.

Linked issues

Resolves #204

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

@github-actions bot commented Jan 13, 2026

❌ 549/550 passed, 10 flaky, 1 failed, 34 skipped, 9h20m7s total

❌ TestRouterIntegration.test_config_endpoints_with_custom_path_parameter: ValueError: TEST_SCHEMA auth: metadata-service: Metadata Service returned empty token. Config: host=https://DATABRICKS_HOST, CLOUD_ENV_client_id=3fe685a1-96cc-4fec-8cdb-6944f5c9787e, CLOUD_ENV_tenant_id=9f37a392-f0ae-4280-9796-f1864a10effc, auth_type=metadata-service, cluster_id=DATABRICKS_DQX_CLUSTER_ID, warehouse_id=TEST_DEFAULT_WAREHOUSE_ID, metadata_service_url=***. Env: DATABRICKS_HOST, ARM_CLIENT_ID, ARM_TENANT_ID, DATABRICKS_AUTH_TYPE, DATABRICKS_CLUSTER_ID, DATABRICKS_WAREHOUSE_ID, DATABRICKS_METADATA_SERVICE_URL (10.005s)
23:10 INFO [tests.conftest] Overriding DATABRICKS_CLUSTER_ID with DATABRICKS_DQX_CLUSTER_ID: DATABRICKS_DQX_CLUSTER_ID
23:10 INFO [tests.conftest] Overriding DATABRICKS_CLUSTER_ID with DATABRICKS_DQX_CLUSTER_ID: DATABRICKS_DQX_CLUSTER_ID
[gw1] linux -- Python 3.12.12 /home/runner/work/dqx/dqx/.venv/bin/python

Flaky tests:

  • 🤪 test_apply_checks_with_is_unique (224ms)
  • 🤪 test_apply_checks_with_is_unique_nulls_not_distinct (223ms)
  • 🤪 test_apply_checks_all_row_checks_as_yaml_with_streaming (264ms)
  • 🤪 test_apply_checks_and_save_in_tables_with_ref_df (10.004s)
  • 🤪 test_apply_checks_all_row_geo_checks_as_yaml_with_streaming (10.004s)
  • 🤪 test_compare_datasets_check_missing_ref_df (10.004s)
  • 🤪 test_apply_checks_with_sql_query_without_merge_columns_negate (10.006s)
  • 🤪 test_compare_datasets_check_null_ref_df (10.004s)
  • 🤪 test_quality_checker (1m45.984s)
  • 🤪 test_e2e_workflow (1m8.386s)

Running from acceptance #4029

@codecov bot commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.82%. Comparing base (435459f) to head (3721077).
⚠️ Report is 2 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (435459f) and HEAD (3721077).

HEAD has 5 fewer uploads than BASE.

Flag    BASE (435459f)    HEAD (3721077)
        6                 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #995       +/-   ##
===========================================
- Coverage   91.24%   48.82%   -42.43%     
===========================================
  Files          64       98       +34     
  Lines        6703     8945     +2242     
===========================================
- Hits         6116     4367     -1749     
- Misses        587     4578     +3991     
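As a quick sanity check, the coverage percentages in the diff can be reproduced from the hit and line counts. This is a minimal sketch assuming coverage is simply hits divided by lines, which matches these figures because hits plus misses equals lines on both sides. The function name is illustrative.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines, rounded to two decimals."""
    return round(100 * hits / lines, 2)


# Counts taken from the coverage diff above.
base = coverage_percent(hits=6116, lines=6703)  # main
head = coverage_percent(hits=4367, lines=8945)  # this PR

print(f"base={base}% head={head}%")
```

Since HEAD has 1 upload against 6 for BASE, the drop may largely reflect missing reports rather than untested code.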

@mwojtyczka changed the title from "Add data quality alerts" to "Added data quality alerts" on Jan 14, 2026

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add data quality alerts