
feat: add Microsoft Fabric DWH integration (ELE-5282)#962

Open
devin-ai-integration[bot] wants to merge 25 commits into master from devin/ELE-5282-1772640713

Conversation


@devin-ai-integration devin-ai-integration bot commented Mar 4, 2026

feat: add Microsoft Fabric DWH integration (ELE-5282)

Summary

Adds Microsoft Fabric Data Warehouse as a supported adapter in the elementary dbt package. Fabric uses T-SQL syntax via the dbt-fabric adapter maintained by Microsoft. Since dbt-sqlserver declares dependencies=["fabric"], all fabric__ macros are auto-dispatched for both adapters (sqlserver__ → fabric__ → default__).
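
The dispatch fallback can be modeled with a small Python sketch (a hypothetical model of dbt's name-resolution order as described above, not dbt internals):

```python
def dispatch_candidates(macro_name, adapter, dependencies):
    """Return macro names in the order dbt would try them.

    dbt tries <adapter>__<macro>, then each declared dependency's
    prefix, then default__ -- so with dependencies=["fabric"] on
    dbt-sqlserver, fabric__ macros serve both adapters.
    """
    names = [f"{adapter}__{macro_name}"]
    names += [f"{dep}__{macro_name}" for dep in dependencies]
    names.append(f"default__{macro_name}")
    return names

chain = dispatch_candidates("edr_to_char", "sqlserver", ["fabric"])
# chain resolves sqlserver__ first, then fabric__, then default__
```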

Macro overrides added (~25 fabric__ macros):

  • Date/time functions: datediff, date_trunc, time_trunc, to_char, timeadd, dateadd, current_timestamp
  • Data types: type_bool → bit, type_string → varchar(4096), type_timestamp → datetime2, data_type_list, get_normalized_data_type, try_cast_column_to_timestamp
  • Table operations: complete_buckets_cte (recursive CTE with option(maxrecursion 10000)), escape_special_chars, create_temp_table (view-based CTE workaround), edr_make_temp_relation (handles dict model nodes)
  • Cross-DB utils: target_database, incremental_strategy → merge, generate_elementary_profile_args, day_of_week, hour_of_day, hour_of_week
  • Nested CTE workarounds: query_test_result_rows, get_failed_row_count_calc_query (use temp tables since T-SQL doesn't support nested CTEs)
  • Schema changes: get_columns_snapshot_query, get_columns_changes_query (unnest CTEs, || → dbt.concat(), TEXT normalization)
  • Test materialization: Force Fabric/sqlserver targets to use temp table path to avoid CTE-in-subquery errors
  • INSERT VALUES boolean rendering: render_value is now dispatched; fabric__render_value converts Python booleans to cast(1/0 as bit) instead of bare True/False (which T-SQL treats as column names)
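
The boolean-rendering point can be sketched roughly as follows (illustrative Jinja only; the macro names come from the PR description, but the bodies are simplified assumptions, not the actual implementation):

```sql
{# Hypothetical sketch of the dispatched render_value macro. #}
{% macro render_value(value) %}
  {{ return(adapter.dispatch("render_value", "elementary")(value)) }}
{% endmacro %}

{% macro default__render_value(value) %}
  {# Most warehouses accept bare boolean literals. #}
  {{ return(value) }}
{% endmacro %}

{% macro fabric__render_value(value) %}
  {# T-SQL has no boolean literals: a bare True/False parses as a
     column name, so Python booleans become bit casts instead. #}
  {% if value is sameas true %}
    {{ return("cast(1 as bit)") }}
  {% elif value is sameas false %}
    {{ return("cast(0 as bit)") }}
  {% else %}
    {{ return(value) }}
  {% endif %}
{% endmacro %}
```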

Only 3 sqlserver__ macros exist (everything else auto-dispatches from fabric__):

  • sqlserver__target_database — needed by the CLI
  • sqlserver__create_temp_table — shares T-SQL CTE workaround with Fabric
  • sqlserver__generate_elementary_profile_args — different connection params (SQL auth vs Azure AD)

Infrastructure:

  • Docker Compose for SQL Server 2022 (local testing, target named sqlserver)
  • CI workflows updated (test-warehouse.yml, test-all-warehouses.yml)
  • ODBC Driver 18 installation in CI
  • MSSQL_SA_PASSWORD env var (fallback: Elementary123! for CI)
  • timeout-minutes: 60 on the test job to prevent hung CI runs (e.g. dead dbt-fusion workers)
  • Integration test SQL templates: LIMIT 1 → TOP 1 for T-SQL compatibility
  • Boolean literals: true/false → edr_boolean_literal() macro across alert/anomaly models
  • Positional GROUP BY → explicit column lists (T-SQL doesn't support GROUP BY 1,2,3)
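
The positional GROUP BY rewrite is mechanical; a schematic example (placeholder columns, not an actual model from the package):

```sql
-- Accepted by Postgres/Snowflake/BigQuery, rejected by T-SQL:
select full_table_name, metric_name, count(*) as metric_count
from metrics
group by 1, 2

-- T-SQL-compatible form with explicit column names:
select full_table_name, metric_name, count(*) as metric_count
from metrics
group by full_table_name, metric_name
```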

Changes addressing PR review:

  • Refactored T-SQL checks to dispatched macros for better maintainability:
    • get_columns_snapshot_query: Replaced inline target.type checks with edr_boolean_literal()
    • get_columns_changes_query: Extracted to fabric__/default__get_column_changes_from_baseline_cur dispatch
    • column_monitoring_query: Extracted GROUP BY to fabric__/default__column_monitoring_group_by dispatch
    • insert_as_select: Converted to fabric__/default__insert_as_select dispatch
    • get_anomaly_query and get_anomaly_query_for_dimension_anomalies: Both now dispatched
    • schema_changes_query_group_by: New dispatched macro for GROUP BY in get_columns_changes_query_generic; the fabric__ variant uses an explicit 6-column GROUP BY (source columns from all_column_changes) instead of a positional 9-column reference
    • String concatenation: Replaced raw concat() with {{ dbt.concat([...]) }} for cross-DB compatibility (Redshift's concat() only accepts 2 args; dbt.concat() dispatches per adapter)
  • Renamed CI target fabric → sqlserver: More accurate since we test against SQL Server Docker, not real Fabric. Kept all macros as fabric__ (shared T-SQL implementation).
  • Simplified DROP VIEW IF EXISTS in create_temp_table.sql (SQL Server 2016+ support).
  • Global GROUP BY refactoring: Replaced positional group by 1,2,3 with explicit column names in several models (metrics_anomaly_score.sql, monitors_runs.sql) and removed order by from dbt_artifacts_hashes.sql CTE for consistency.
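
The concat() replacement follows this pattern (schematic; the column names are placeholders, not taken from the actual model):

```sql
{# Raw concat() with three or more arguments fails on Redshift,
   which only accepts two arguments: #}
concat(full_table_name, '.', column_name)

{# dbt.concat() dispatches to each adapter's idiom (|| or CONCAT),
   so the same template works everywhere: #}
{{ dbt.concat(["full_table_name", "'.'", "column_name"]) }}
```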

Updates since last revision

  • Removed redundant sqlserver_dispatches.sql: A separate Devin session had added ~40 sqlserver__ wrapper macros that just delegated to fabric__ counterparts. Since dbt-sqlserver declares dependencies=["fabric"], the dispatch chain sqlserver__ → fabric__ → default__ handles everything automatically. Only the 3 macros with genuinely different implementations remain (listed above).
  • Added CI timeout protection: timeout-minutes: 60 added to test job to catch hung runs early (e.g. dead dbt-fusion workers) instead of waiting 6 hours for GitHub's default timeout.

Review & Testing Checklist for Human

Critical (MUST verify before merge):

  • Global GROUP BY changes affect ALL warehouses - metrics_anomaly_score.sql and monitors_runs.sql changed from positional to explicit GROUP BY across ALL warehouses (postgres, snowflake, bigquery, etc.), not just T-SQL. Also removed order by bucket_end desc from metrics_anomaly_score CTE and order by metadata_hash from dbt_artifacts_hashes. CI passes all targets, but verify these don't change result ordering or grouping behavior.
  • dbt.concat() replacement affects ALL warehouses - Replaced raw concat() with {{ dbt.concat([...]) }} in get_columns_changes_query_generic for cross-DB compatibility. CI passes all targets including Redshift (which was failing before), but verify string concatenation behavior is identical on postgres/snowflake/bigquery.
  • Widespread boolean literal changes - Replaced true/false with {{ elementary.edr_boolean_literal(...) }} in core anomaly detection models (anomaly_threshold_sensitivity.sql, metrics_anomaly_score.sql, model_run_results.sql) and alert models. CI passes but these affect critical detection logic.
  • fabric__schema_changes_query_group_by logic - Groups by 6 source columns (full_table_name, change, column_name, data_type, pre_data_type, detected_at) instead of 9 output columns. All 9 output columns are deterministic functions of these 6, but verify results are identical.
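
The 6-vs-9-column equivalence rests on functional dependency: grouping by a key subset produces the same groups as grouping by all columns when every extra column is computed from the key. A toy Python check of the property (the key column names mirror the PR description; the derived column is invented for illustration):

```python
rows = [
    {"full_table_name": "db.t1", "change": "type_changed", "column_name": "c1",
     "data_type": "int", "pre_data_type": "varchar", "detected_at": "2026-03-04"},
    {"full_table_name": "db.t1", "change": "type_changed", "column_name": "c1",
     "data_type": "int", "pre_data_type": "varchar", "detected_at": "2026-03-04"},
    {"full_table_name": "db.t2", "change": "column_added", "column_name": "c2",
     "data_type": "bit", "pre_data_type": None, "detected_at": "2026-03-04"},
]

KEY = ("full_table_name", "change", "column_name",
       "data_type", "pre_data_type", "detected_at")

def derived(row):
    # Invented example of an output column that is a pure function of the key.
    return f"{row['change']}: {row['column_name']} in {row['full_table_name']}"

# Grouping by the 6 key columns vs. key + derived column yields identical groups.
groups_by_key = {tuple(r[k] for k in KEY) for r in rows}
groups_by_all = {tuple(r[k] for k in KEY) + (derived(r),) for r in rows}
assert len(groups_by_key) == len(groups_by_all)
```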

Recommended:

  • Test against real Fabric workspace - All testing is against SQL Server 2022 Docker. Fabric may have subtle T-SQL dialect differences (e.g., Azure Synapse Analytics features/limitations).
  • Verify render_value dispatch doesn't break other adapters - Made render_value dispatched. The default__ implementation is identical to original, but test postgres/snowflake/bigquery INSERT operations to confirm no regressions.

Test Plan

  1. CI validation: All 22 CI checks pass (postgres, snowflake, bigquery, redshift, athena, databricks, duckdb, clickhouse, trino, spark, dremio, sqlserver, fabric) ✅
  2. Local Docker test (optional):
    cd integration_tests
    export MSSQL_SA_PASSWORD="Elementary123!"
    docker compose -f docker-compose-sqlserver.yml up -d
    pytest tests -vvv --target sqlserver -n4

Notes

Summary by CodeRabbit

  • New Features

    • Microsoft Fabric/SQL Server added as a supported warehouse option (testing via sqlserver target).
  • Compatibility

    • Expanded Fabric/SQL Server support for data types, date/time functions, string/char formatting, time math, and incremental behavior.
    • Improved test/result query paths and boolean rendering for cross-dialect correctness.
    • CTE-in-subquery workaround for T-SQL via view-based temp table creation.
    • Dispatched macros for cleaner T-SQL handling (get_column_changes_from_baseline_cur, column_monitoring_group_by, insert_as_select, get_anomaly_query, schema_changes_query_group_by).
    • Dispatched render_value macro with Fabric-specific boolean literal handling.
    • Global refactoring: positional GROUP BY replaced with explicit column names in anomaly/run models.
    • Cross-DB string concatenation via dbt.concat() instead of raw concat() (fixes Redshift compatibility).
    • CI timeout protection: 60-minute job timeout to catch hung runs early.
  • Tests

    • CI and integration tests updated to exercise SQL Server scenarios and local service startup.
    • 187 integration tests passing in CI against SQL Server 2022.
    • All 22 CI checks pass across all supported warehouses.
  • Chores

    • Profiles generation gains an env-var helper for template rendering.
    • Target renamed from fabric to sqlserver for accuracy.
    • Removed redundant sqlserver__ dispatch wrappers (auto-dispatched via dependencies=["fabric"]).

Add support for Microsoft Fabric Data Warehouse as a new adapter in the
elementary dbt package. Fabric uses T-SQL syntax and is supported via the
dbt-fabric adapter maintained by Microsoft.

Changes:
- Add fabric__ macro overrides for date/time functions (datediff, date_trunc,
  time_trunc, to_char, timeadd, dateadd, current_timestamp)
- Add fabric__ data type overrides (data_type_list, type_timestamp, type_string,
  type_bool, get_normalized_data_type, try_cast_column_to_timestamp)
- Add fabric__ table operations (complete_buckets_cte with recursive CTE,
  escape_special_chars)
- Add fabric__ cross-DB utilities (target_database, incremental_strategy,
  generate_elementary_profile_args, day_of_week, hour_of_day, hour_of_week)
- Add fabric__ nested CTE workarounds (query_test_result_rows,
  get_failed_row_count_calc_query) using temp tables to avoid nested CTEs
  and TOP instead of LIMIT
- Add docker-compose-sqlserver.yml for local testing with env var credentials
- Add fabric target to profiles.yml.j2
- Update CI workflows (test-warehouse.yml, test-all-warehouses.yml) with
  SQL Server Docker setup, ODBC driver installation, and fabric target

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.



linear bot commented Mar 4, 2026


github-actions bot commented Mar 4, 2026

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.


coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds Fabric (SQL Server) support: CI/workflow inputs and secrets, Dockerized SQL Server test service and healthchecks, profile/template env helper, and extensive Fabric-specific macros and T-SQL-compatible materialization paths across EDR, utils, and data-type macros.

Changes

Cohort / File(s) Summary
CI / Workflows
.github/workflows/test-all-warehouses.yml, .github/workflows/test-warehouse.yml
Adds fabric option to matrices/inputs and workflow_dispatch, introduces MSSQL_SA_PASSWORD env, starts SQL Server in fabric runs, installs ODBC driver, and maps dbt/Elementary install extras to include fabric (sqlserver).
Test Infra
integration_tests/docker-compose-sqlserver.yml
New Docker Compose service for Microsoft SQL Server 2022 with port binding, env vars, and a sqlcmd-based healthcheck.
Profiles & Rendering
integration_tests/profiles/profiles.yml.j2, integration_tests/profiles/generate_profiles.py
Adds a fabric local target in the profiles template and a Jinja2 global env_var(key, default) for injecting env vars (e.g., MSSQL_SA_PASSWORD) when generating profiles.
Materializations / Test helpers
macros/edr/materializations/test/failed_row_count.sql, macros/edr/materializations/test/test.sql
Delegates failed-row-count and test-row queries via adapter.dispatch and adds fabric__ implementations that use temp relations/tables to work around Fabric/T-SQL limitations.
EDR System Utilities
macros/edr/system/system_utils/...
buckets_cte.sql, empty_table.sql, get_config_var.sql
Adds fabric__complete_buckets_cte, Fabric-specific dummy_values, boolean-rendering helpers (render_bool_config_var, edr_boolean_literal, fabric__edr_boolean_literal).
Cross-DB date/time & formatting
macros/utils/cross_db_utils/...
current_timestamp.sql, date_trunc.sql, dateadd.sql, datediff.sql, day_of_week.sql, hour_of_day.sql, hour_of_week.sql, time_trunc.sql, timeadd.sql, to_char.sql
Introduces many fabric__ variants implementing T‑SQL/Fabric expressions (getdate/sysutcdatetime, datetrunc/dateadd/datediff, datename/datepart, time trunc/add, to_char formatting, hour/day/week helpers).
Profile & strategy helpers
macros/utils/cross_db_utils/generate_elementary_profile_args.sql, macros/utils/cross_db_utils/incremental_strategy.sql, macros/utils/cross_db_utils/target_database.sql
Adds fabric__generate_elementary_profile_args (supports cli/service-principal/user flows), fabric__get_default_incremental_strategy (returns "merge"), and fabric__target_database.
Data types & normalization
macros/utils/data_types/...
data_type.sql, data_type_list.sql, get_normalized_data_type.sql, try_cast_column_to_timestamp.sql
Adds fabric__ type mappings (bit, varchar(4096), datetime2), data type lists and normalization mappings, and a Fabric try-cast-to-timestamp helper. Note: duplicate macro insertions observed in some files.
Table operations
macros/utils/table_operations/insert_rows.sql
Adds fabric__escape_special_chars (doubles single quotes). Duplicate insertion locations observed.
Data monitoring & numeric ops
macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
Adds fabric__standard_deviation and fabric__variance using T‑SQL STDEV/VAR patterns.
Model / SQL template changes
models/edr/...
alerts/*, anomaly_detection/*, run_results/*, dbt_artifacts_hashes.sql, system/monitors_runs.sql
Replaces raw boolean literals with edr_boolean_literal/render_bool_config_var in many CASE/WHERE expressions; converts some GROUP BY/ORDER BY usages (explicit grouping, removed ordering clause).
Potential Attention Areas
macros/..., integration_tests/..., .github/...
Duplicate macro insertions (e.g., datediff, data_type_list, generate_elementary_profile_args, escape_special_chars) and Fabric/T‑SQL patterns (temp tables, maxrecursion) need SQL validation; CI secret/env interplay (MSSQL_SA_PASSWORD) and healthcheck timing should be reviewed.

Sequence Diagram(s)

sequenceDiagram
  participant User as User
  participant CI as CI Workflow
  participant Runner as Actions Runner
  participant Docker as Docker (sqlserver)
  participant Profiles as Profiles Renderer
  participant DBT as dbt/Elementary

  User->>CI: dispatch workflow (warehouse-type=fabric)
  CI->>Runner: start job
  Runner->>Docker: docker-compose up (sqlserver)
  Docker-->>Runner: healthcheck OK
  Runner->>Runner: install msodbcsql18, dbt + extras (fabric/sqlserver)
  Runner->>Profiles: render profiles (env_var, pass MSSQL_SA_PASSWORD)
  Profiles-->>Runner: profiles.yml
  Runner->>DBT: run dbt with fabric profile
  DBT->>Docker: connect via ODBC/T-SQL, execute `fabric__*` macros & temp-table flows
  DBT-->>Runner: tests/results
  Runner-->>CI: report status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped in, nibbling schema seeds,

Temp tables sprouted to handle the deeds,
getdate and sysutc sang the hour,
Buckets and casts bent to Fabric's power,
Now tests can hop through SQL Server meads.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main change: adding Microsoft Fabric DWH integration to the elementary dbt package.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required 80.00% threshold.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


devin-ai-integration bot and others added 2 commits March 4, 2026 16:23
The generate_profiles.py uses Python Jinja2 (not dbt Jinja2), so the
env_var() function used in profiles.yml.j2 for the fabric password was
undefined. Add env_var as a Jinja2 global that reads from os.environ.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…os to sqlserver__

- Changed profile type from 'fabric' to 'sqlserver' for SQL auth support
- Renamed all fabric__ macro overrides to sqlserver__ (dbt-fabric dispatches
  to sqlserver as fallback, so these work with both adapters)
- Updated CI to install dbt-sqlserver instead of dbt-fabric
- Added encrypt: false to profile for local SQL Server connection

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
macros/edr/materializations/test/test.sql (1)

205-212: Style consistency: Consider using elementary.run_query throughout.

Lines 206 and 212 use run_query, while line 211 uses elementary.run_query. Since the codebase consistently uses elementary.run_query in other table operations, standardize to that form for maintainability. Note: Both ultimately call the same dbt.run_query and maintain the same session context, so temp table visibility is not a concern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/materializations/test/test.sql` around lines 205 - 212, Replace
the two raw run_query calls with the consistent wrapper elementary.run_query:
change the calls that reference creating and dropping the temp table (the
run_query invocations surrounding tmp_relation) to use elementary.run_query
instead, keeping the same SQL strings; ensure the rest of the block that builds
query (the {% set query %} ... select ... from {{ tmp_relation }} and the usage
of elementary.agate_to_dicts(elementary.run_query(query))) remains unchanged so
behavior and session context are preserved.
macros/utils/cross_db_utils/generate_elementary_profile_args.sql (1)

270-270: Consider multi-line formatting for consistency.

The cli method handling is condensed to a single line while other branches use multi-line formatting. This is a minor readability nit.

♻️ Optional: Multi-line formatting
-    {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
+    {% if method == "cli" %}
+        {% do parameters.append(_parameter("method", "cli")) %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql` at line
270, The single-line Jinja branch for method == "cli" should be rewritten to
match the multi-line style used elsewhere: expand the conditional into separate
lines so the {% if method == "cli" %} is on its own line, the {% do
parameters.append(_parameter("method", "cli")) %} is on the next line, and close
with {% endif %}; this keeps the handling of method, parameters.append and
_parameter("cli") consistent with other branches for readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/profiles/profiles.yml.j2`:
- Line 73: The Jinja expression requiring MSSQL_SA_PASSWORD causes rendering to
fail when the env var is missing; update the password template (the line
containing password: "{{ env_var('MSSQL_SA_PASSWORD') }}" ) to provide a safe
fallback by using env_var with a default value or Jinja's default filter (e.g.,
env_var('MSSQL_SA_PASSWORD', 'fallback') or env_var('MSSQL_SA_PASSWORD') |
default('fallback')) so local/profile rendering succeeds when the environment
variable is not set.

In `@macros/edr/materializations/test/failed_row_count.sql`:
- Around line 62-68: The macro fabric__get_failed_row_count_calc_query currently
calls a non-existent helper elementary.make_temp_relation(model) — replace that
call with the correct temp-relation helper (choose one of
edr_make_temp_relation, make_temp_table_relation, or make_temp_view_relation
depending on whether you need a temp table vs view) and update the tmp_relation
assignment accordingly; also ensure the temp table is cleaned up by adding `{%
do run_query("drop table if exists " ~ tmp_relation) %}` after the final SELECT
to prevent orphaned tables (refer to the tmp_relation variable and the existing
run_query("select * into " ~ tmp_relation ~ ...) usage to place the cleanup).

In `@macros/edr/system/system_utils/buckets_cte.sql`:
- Around line 241-247: The recursive CTE "timestamps" (using
min_bucket_start_expr, bucket_end_expr and max_bucket_end_expr) and the
SELECT-list alias next_bucket in the WHERE clause are incompatible with
Microsoft Fabric Warehouse; replace the recursive generation with a
Fabric-compatible approach (e.g., build bucket rows by CROSS JOINing against a
numbers/sequence source or a VALUES list and compute edr_bucket_start from
min_bucket_start_expr + n * bucket_interval) and remove use of the SELECT alias
in predicates by repeating the expression or computing it in a derived table, or
alternatively mark this macro as SQL Server–only and remove the fabric__ prefix
so it is not run on Fabric. Ensure the code references the same symbols
(min_bucket_start_expr, bucket_end_expr, max_bucket_end_expr,
edr_bucket_start/next_bucket) so callers still work.
- Line 253: The recursive CTE currently ends with "option (maxrecursion 10000)"
which can overflow for long ranges; update the recursion option to "MAXRECURSION
0" to allow unlimited recursion and rely on the existing termination condition
"where next_bucket < {{ max_bucket_end_expr }}" to stop recursion; edit the
buckets CTE in buckets_cte.sql (the clause containing option (maxrecursion
10000) and the recursive column next_bucket) to replace 10000 with 0 so large
1-minute bucket ranges do not error.

In `@macros/utils/cross_db_utils/incremental_strategy.sql`:
- Around line 23-25: The macro fabric__get_default_incremental_strategy
currently returns the hardcoded string "merge"; change it to follow the shared
strategy contract by returning "none" for non-Athena adapters and delegating to
the Athena-specific helper for Athena: update
fabric__get_default_incremental_strategy to detect the adapter type (or call the
existing get_append_only_incremental_strategy() for Athena) and return "none"
otherwise so adapter defaults are preserved.

In `@macros/utils/cross_db_utils/to_char.sql`:
- Around line 27-32: The fabric__edr_to_char macro currently injects the format
unquoted into CONVERT, which breaks when callers pass pattern-style strings;
update the macro (fabric__edr_to_char) to emit the format argument quoted (e.g.,
wrap {{ format }} in single quotes) when present while keeping the existing
numeric fallback 120 unchanged, so CONVERT(varchar, {{ column }}, '{{ format
}}') is produced for custom formats and CONVERT(varchar, {{ column }}, 120)
remains the fallback.

---

Nitpick comments:
In `@macros/edr/materializations/test/test.sql`:
- Around line 205-212: Replace the two raw run_query calls with the consistent
wrapper elementary.run_query: change the calls that reference creating and
dropping the temp table (the run_query invocations surrounding tmp_relation) to
use elementary.run_query instead, keeping the same SQL strings; ensure the rest
of the block that builds query (the {% set query %} ... select ... from {{
tmp_relation }} and the usage of
elementary.agate_to_dicts(elementary.run_query(query))) remains unchanged so
behavior and session context are preserved.

In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql`:
- Line 270: The single-line Jinja branch for method == "cli" should be rewritten
to match the multi-line style used elsewhere: expand the conditional into
separate lines so the {% if method == "cli" %} is on its own line, the {% do
parameters.append(_parameter("method", "cli")) %} is on the next line, and close
with {% endif %}; this keeps the handling of method, parameters.append and
_parameter("cli") consistent with other branches for readability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98b8c87c-9b9b-45a1-8964-0c755b95550d

📥 Commits

Reviewing files that changed from the base of the PR and between d88dc70 and 2c08c87.

📒 Files selected for processing (25)
  • .github/workflows/test-all-warehouses.yml
  • .github/workflows/test-warehouse.yml
  • integration_tests/docker-compose-sqlserver.yml
  • integration_tests/profiles/profiles.yml.j2
  • macros/edr/materializations/test/failed_row_count.sql
  • macros/edr/materializations/test/test.sql
  • macros/edr/system/system_utils/buckets_cte.sql
  • macros/utils/cross_db_utils/current_timestamp.sql
  • macros/utils/cross_db_utils/date_trunc.sql
  • macros/utils/cross_db_utils/dateadd.sql
  • macros/utils/cross_db_utils/datediff.sql
  • macros/utils/cross_db_utils/day_of_week.sql
  • macros/utils/cross_db_utils/generate_elementary_profile_args.sql
  • macros/utils/cross_db_utils/hour_of_day.sql
  • macros/utils/cross_db_utils/hour_of_week.sql
  • macros/utils/cross_db_utils/incremental_strategy.sql
  • macros/utils/cross_db_utils/target_database.sql
  • macros/utils/cross_db_utils/time_trunc.sql
  • macros/utils/cross_db_utils/timeadd.sql
  • macros/utils/cross_db_utils/to_char.sql
  • macros/utils/data_types/data_type.sql
  • macros/utils/data_types/data_type_list.sql
  • macros/utils/data_types/get_normalized_data_type.sql
  • macros/utils/data_types/try_cast_column_to_timestamp.sql
  • macros/utils/table_operations/insert_rows.sql

database: master
schema: {{ schema_name }}
user: sa
password: "{{ env_var('MSSQL_SA_PASSWORD') }}"

⚠️ Potential issue | 🟡 Minor

Add a fallback for MSSQL_SA_PASSWORD to avoid local profile rendering failures.

Line [73] currently requires the env var to be set, otherwise Jinja rendering fails immediately.

Suggested change
-      password: "{{ env_var('MSSQL_SA_PASSWORD') }}"
+      password: "{{ env_var('MSSQL_SA_PASSWORD', 'Elementary123!') }}"

Comment on lines +23 to +25
{%- macro fabric__get_default_incremental_strategy() %}
{% do return("merge") %}
{% endmacro %}

⚠️ Potential issue | 🟠 Major

Avoid forcing merge as Fabric’s default incremental strategy.

Returning "merge" here hardcodes adapter behavior where the shared strategy contract is to return none for non-Athena adapters and let adapter defaults apply. This can create unexpected behavior drift for Fabric.

Suggested change
 {%- macro fabric__get_default_incremental_strategy() %}
-    {% do return("merge") %}
+    {% do return(none) %}
 {% endmacro %}

Based on learnings: “implement that incremental_strategy should return none for all adapters except Athena, where get_append_only_incremental_strategy() is used.”


Comment on lines +27 to +32
{% macro fabric__edr_to_char(column, format) %}
convert(varchar, {{ column }}
{%- if format %}, {{ format }})
{%- else %}, 120)
{%- endif %}
{% endmacro %}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the Fabric implementation
echo "=== Fabric edr_to_char implementation ==="
rg -n -C2 'fabric__edr_to_char|convert\s*\(\s*varchar' macros/utils/cross_db_utils/to_char.sql 2>/dev/null || echo "File or pattern not found"

# 2) List all edr_to_char call sites and supplied format argument
echo -e "\n=== All edr_to_char call sites ==="
rg -nP --type=sql '\bedr_to_char\s*\(\s*[^,]+,\s*([^)]+)\)' -C1 2>/dev/null || echo "No matches found"

# 3) Highlight likely non-numeric formats (risk for CONVERT style arg)
echo -e "\n=== Potential pattern-style format arguments (risk) ==="
rg -nP --type=sql '\bedr_to_char\s*\(\s*[^,]+,\s*[^)]*[A-Za-z%][^)]*\)' -C1 2>/dev/null || echo "No matches found"

Repository: elementary-data/dbt-data-reliability

Length of output: 798


🏁 Script executed:

# Additional checks to understand the macro structure
echo "=== File structure and other adapters ==="
cat -n macros/utils/cross_db_utils/to_char.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 1241


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for edr_to_char calls across all file types
echo "=== Searching for edr_to_char calls across codebase ==="
rg -n 'edr_to_char\s*\(' --type=sql --type=jinja -A1 -B1 2>/dev/null | head -50

echo -e "\n=== Check if there are any SQL/jinja files outside macros ==="
fd -e sql -e jinja | grep -v node_modules | head -20

Repository: elementary-data/dbt-data-reliability

Length of output: 2567


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for actual edr_to_char calls (excluding the macro definitions in to_char.sql)
echo "=== Searching for edr_to_char invocations (excluding definitions) ==="
rg '\bedr_to_char\b' -g '!to_char.sql' -A2 -B2 2>/dev/null | head -100 || echo "No calls found outside to_char.sql"

# Also try broader search across whole repo
echo -e "\n=== Broader search for edr_to_char ==="
rg '\bedr_to_char\b' -g '!to_char.sql' 2>/dev/null | head -20 || echo "No matches"

Repository: elementary-data/dbt-data-reliability

Length of output: 235


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if fabric__edr_to_char is newly added or modified (check git history)
echo "=== Git history for to_char.sql (last 5 commits) ==="
git log --oneline -5 -- macros/utils/cross_db_utils/to_char.sql 2>/dev/null || echo "Git history unavailable"

# Show current diff if this file has uncommitted changes
echo -e "\n=== Check for uncommitted changes ==="
git diff macros/utils/cross_db_utils/to_char.sql 2>/dev/null || echo "No uncommitted changes"

# Check fabric adapter detection in the codebase
echo -e "\n=== Check if Fabric/T-SQL adapter is configured ==="
rg -i 'fabric|t.?sql|convert.*varchar' -g '*.yml' -g '*.yaml' 2>/dev/null | head -20 || echo "No fabric config found"

Repository: elementary-data/dbt-data-reliability

Length of output: 408


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check when fabric__edr_to_char was added
echo "=== Full git log for to_char.sql ==="
git log --oneline -- macros/utils/cross_db_utils/to_char.sql 2>/dev/null

# Look at the most recent change to understand context
echo -e "\n=== Most recent commit details ==="
git show 4699b73a --stat 2>/dev/null | head -30

# Check if fabric is mentioned in PR/branch context
echo -e "\n=== Check dbt manifest or profiles for Fabric ==="
rg -i 'fabric' 2>/dev/null | grep -v node_modules | head -30

Repository: elementary-data/dbt-data-reliability

Length of output: 1872


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get full git log with diffs for to_char.sql to see when fabric__edr_to_char was added
echo "=== Git log with diffs for to_char.sql (showing only macro changes) ==="
git log -p -- macros/utils/cross_db_utils/to_char.sql 2>/dev/null | grep -A30 -B5 'fabric__edr_to_char' | head -60

# Alternative: check which commits modified fabric related macros
echo -e "\n=== Commits touching fabric macros ==="
git log --all --oneline --grep='fabric' 2>/dev/null | head -10

Repository: elementary-data/dbt-data-reliability

Length of output: 267


Fix Fabric edr_to_char format argument handling to match cross-db contract.

The fabric implementation at line 29 passes format unquoted to T-SQL's CONVERT(..., style), which expects a numeric style code. However, all other adapters (default, BigQuery, Spark) wrap format in quotes, treating it as a pattern string (e.g., 'YYYY-MM-DD HH:MI:SS').

If callers pass pattern-style format strings—which is the standard across other adapters—the Fabric SQL will fail at runtime. The default fallback to 120 masks this for calls without a custom format, but any formatted call will break.

Quote the format argument on line 29 to align with the cross-db contract, or document that Fabric requires numeric T-SQL style codes instead of pattern strings:

Current implementation (lines 27-32)
{% macro fabric__edr_to_char(column, format) %}
    convert(varchar, {{ column }}
    {%- if format %}, {{ format }})
    {%- else %}, 120)
    {%- endif %}
{% endmacro %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/to_char.sql` around lines 27 - 32, The
fabric__edr_to_char macro currently injects the format unquoted into CONVERT,
which breaks when callers pass pattern-style strings; update the macro
(fabric__edr_to_char) to emit the format argument quoted (e.g., wrap {{ format
}} in single quotes) when present while keeping the existing numeric fallback
120 unchanged, so CONVERT(varchar, {{ column }}, '{{ format }}') is produced for
custom formats and CONVERT(varchar, {{ column }}, 120) remains the fallback.
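As a concrete sketch of the two-path rendering the prompt describes (a hypothetical Python helper, not code from this PR): numeric T-SQL style codes stay bare arguments to CONVERT, while pattern-style formats would have to route to FORMAT(), since CONVERT's third argument must be an integer style code and a quoted pattern would be rejected. Whether FORMAT() is available on Fabric DW is an assumption worth verifying.

```python
def render_fabric_to_char(column, fmt=None):
    """Hypothetical renderer for fabric__edr_to_char.

    Numeric style codes go to CONVERT(..., style); pattern strings
    (e.g. 'yyyy-MM-dd') go to FORMAT(), because CONVERT's style
    argument must be an integer and cannot carry a pattern.
    Assumes FORMAT() exists on the target (true on SQL Server 2012+).
    """
    if fmt is None:
        return f"convert(varchar, {column}, 120)"  # existing fallback style
    if isinstance(fmt, int) or (isinstance(fmt, str) and fmt.isdigit()):
        return f"convert(varchar, {column}, {fmt})"
    return f"format({column}, '{fmt}')"
```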

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

♻️ Duplicate comments (3)
macros/utils/cross_db_utils/incremental_strategy.sql (1)

23-25: ⚠️ Potential issue | 🟠 Major

Avoid hardcoding merge for SQL Server default strategy.

At Line 24, returning "merge" reintroduces the same behavior drift previously flagged; non-Athena adapters should return none so adapter defaults apply.

Suggested fix
 {%- macro sqlserver__get_default_incremental_strategy() %}
-    {% do return("merge") %}
+    {% do return(none) %}
 {% endmacro %}

Based on learnings: “incremental_strategy should return none for all adapters except Athena, where get_append_only_incremental_strategy() is used.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/incremental_strategy.sql` around lines 23 - 25,
The macro sqlserver__get_default_incremental_strategy currently returns the
hardcoded string "merge"; change it to return "none" so SQL Server defers to the
adapter's default incremental strategy (consistent with the rule that only
Athena uses get_append_only_incremental_strategy); update the return in
sqlserver__get_default_incremental_strategy to "none" and ensure no other
hardcoded "merge" defaults are introduced for non-Athena adapters.
macros/edr/system/system_utils/buckets_cte.sql (1)

244-246: ⚠️ Potential issue | 🔴 Critical

Fix invalid alias usage in the recursive predicate.

Line 246 references next_bucket in WHERE, but in T-SQL a SELECT-list alias is not visible in the same query block’s WHERE. This can fail at runtime/compile time.

Suggested fix
-          select {{ bucket_end_expr }} as next_bucket
+          select {{ bucket_end_expr }} as edr_bucket_start
           from timestamps
-          where next_bucket < {{ max_bucket_end_expr }}
+          where {{ bucket_end_expr }} < {{ max_bucket_end_expr }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/system/system_utils/buckets_cte.sql` around lines 244 - 246, The
WHERE is incorrectly referencing the SELECT-list alias next_bucket (produced by
{{ bucket_end_expr }}) which is not visible in the same query block; update the
predicate to use the actual expression instead of the alias (i.e., replace
next_bucket in the WHERE with the same {{ bucket_end_expr }} expression or wrap
the SELECT as a subquery/CTE so the alias becomes available), targeting the
query that selects from timestamps and uses {{ max_bucket_end_expr }} to enforce
the upper bound.
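For intuition, the recursive `timestamps` CTE is iterated bucket generation; a minimal Python analogue (illustrative only, names are made up) shows the semantics the fixed predicate must preserve:

```python
from datetime import datetime, timedelta

def bucket_starts(start, max_end, interval):
    """Python analogue of the recursive `timestamps` CTE: emit bucket
    start times, stopping once the next bucket start would reach the
    upper bound (the max_bucket_end_expr check in the WHERE clause)."""
    buckets = [start]
    while buckets[-1] + interval < max_end:
        buckets.append(buckets[-1] + interval)
    return buckets
```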
macros/edr/materializations/test/failed_row_count.sql (1)

62-73: ⚠️ Potential issue | 🔴 Critical

Missing temp table cleanup will leak orphaned tables.

Unlike test.sql (line 214), this macro creates a temp table but never drops it. Add cleanup after the SELECT statement.

Additionally, the elementary.make_temp_relation(model) call was flagged in a previous review as potentially undefined.

🐛 Proposed fix to add cleanup
 {% macro sqlserver__get_failed_row_count_calc_query(failed_row_count_calc) %}
     {# Fabric / T-SQL does not support nested CTEs.
        We create a temp table from the test SQL, then select from it. #}
     {% set tmp_relation = elementary.make_temp_relation(model) %}
     {% do run_query(
         "select * into " ~ tmp_relation ~ " from (" ~ sql ~ ") as __edr_inner"
     ) %}
+    {% set query %}
     select
         {{ failed_row_count_calc }}
         as {{ elementary.escape_reserved_keywords("count") }}
     from {{ tmp_relation }}
+    {% endset %}
+    {% set result = elementary.run_query(query) %}
+    {% do run_query("drop table if exists " ~ tmp_relation) %}
+    {{ query }}
 {% endmacro %}

Note: The fix above is illustrative. The macro returns raw SQL (not a result set), so the actual fix depends on how the caller executes it. If the SQL is executed elsewhere, consider documenting that cleanup is the caller's responsibility, or restructure to handle cleanup internally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/materializations/test/failed_row_count.sql` around lines 62 - 73,
The macro sqlserver__get_failed_row_count_calc_query creates a temp table
(tmp_relation via elementary.make_temp_relation(model)) but never drops it,
which will leak tables; after the SELECT from {{ tmp_relation }} add a DROP
TABLE (or DROP TABLE IF EXISTS / appropriate T-SQL-safe drop) for {{
tmp_relation }} to ensure cleanup, and either ensure model is defined in scope
or change the macro signature to accept model (e.g., add model as a parameter)
so elementary.make_temp_relation(model) is valid; if you prefer caller-managed
cleanup, document that behavior instead.
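One way to make cleanup the callee's responsibility is a guaranteed create/use/drop wrapper, sketched here in Python with sqlite3 purely to illustrate the contract (the dbt macros emit SQL text, so in practice this would translate to run_query calls around the returned query):

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def temp_result_table(conn, name, select_sql):
    """Materialize a query into a scratch table and guarantee the drop,
    even if the consumer raises -- the select-into / drop pairing the
    review asks for around the temp relation."""
    conn.execute(f"create table {name} as {select_sql}")
    try:
        yield name
    finally:
        conn.execute(f"drop table if exists {name}")

# Usage: count rows via the scratch table, which is dropped afterwards.
conn = sqlite3.connect(":memory:")
conn.execute("create table t(x int)")
conn.executemany("insert into t values (?)", [(1,), (2,)])
with temp_result_table(conn, "tmp_failed", "select * from t") as tbl:
    failed = conn.execute(f"select count(*) from {tbl}").fetchone()[0]
```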
🧹 Nitpick comments (2)
macros/utils/cross_db_utils/dateadd.sql (1)

19-29: Comment block is misplaced relative to the new macro.

The docstring at lines 19-26 documents the Dremio override's bugs, but it now appears directly above sqlserver__edr_dateadd, which could mislead readers into thinking the comment applies to the SQL Server macro.

Consider adding a blank line and brief comment for the sqlserver macro, or relocating the Dremio docstring to sit immediately above dremio__edr_dateadd.

📝 Suggested reordering
-{#
-  Override dbt-dremio's dateadd macro which has two bugs:
-  1. Calls interval.replace() on the interval parameter, failing when interval is an integer
-  2. Wraps result in "select TIMESTAMPADD(...)" which creates a scalar subquery when
-     embedded in larger SQL expressions, causing $SCALAR_QUERY errors in Dremio
-
-  This override outputs just TIMESTAMPADD(...) as an expression (no "select" prefix).
-#}
+{# SQL Server / Fabric: use native DATEADD function #}
 {% macro sqlserver__edr_dateadd(datepart, interval, from_date_or_timestamp) %}
     dateadd({{ datepart }}, {{ interval }}, {{ from_date_or_timestamp }})
 {% endmacro %}

+{#
+  Override dbt-dremio's dateadd macro which has two bugs:
+  1. Calls interval.replace() on the interval parameter, failing when interval is an integer
+  2. Wraps result in "select TIMESTAMPADD(...)" which creates a scalar subquery when
+     embedded in larger SQL expressions, causing $SCALAR_QUERY errors in Dremio
+
+  This override outputs just TIMESTAMPADD(...) as an expression (no "select" prefix).
+#}
 {% macro dremio__edr_dateadd(datepart, interval, from_date_or_timestamp) %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/dateadd.sql` around lines 19 - 29, The
documentation block describing the Dremio override is placed above the
sqlserver__edr_dateadd macro and may mislead readers; move the existing
docstring so it immediately precedes the dremio__edr_dateadd macro, and add a
short one-line comment or a blank line above sqlserver__edr_dateadd to indicate
it is a separate SQL Server-specific implementation. Ensure the docstring text
remains unchanged and that references to the two macros (dremio__edr_dateadd and
sqlserver__edr_dateadd) are clearly separated in the file.
.github/workflows/test-warehouse.yml (1)

116-122: Consider pinning the ODBC driver version for CI reproducibility.

The installation pulls the latest msodbcsql18 package, which could introduce unexpected behavior if Microsoft releases a breaking change. For more deterministic CI builds, consider pinning to a specific version.

♻️ Optional: Pin ODBC driver version
-          sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev
+          sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18=18.4.1.1-1 unixodbc-dev

Note: Check the latest stable version and update accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-warehouse.yml around lines 116 - 122, The workflow
step installs msodbcsql18 without pinning, risking CI breakage; update the
Install ODBC Driver run block to install a specific msodbcsql18 version by
replacing the unpinned package name (msodbcsql18) with a pinned package spec
(e.g., msodbcsql18=<VERSION>) or by introducing a variable/parameter for the
ODBC driver version and using that in the apt install command; ensure
unixodbc-dev remains as-is and document/update the chosen <VERSION> to a known
stable release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test-warehouse.yml:
- Line 55: Replace the hardcoded fallback value 'Elementary123!' used for
MSSQL_SA_PASSWORD in the workflow with a more generic/weak local-only default
(e.g., 'Password1!') or remove the inline fallback and fail the job when the
secret is missing; update the MSSQL_SA_PASSWORD assignment so it either uses a
documented local-only default or requires the secret to be set, and add a short
comment clarifying that any default is for ephemeral/local CI only.

In `@integration_tests/profiles/generate_profiles.py`:
- Line 126: The current env.globals["env_var"] lambda silently returns "" when
no default is provided; replace it with a real function (e.g., env_var) that
distinguishes between an omitted default and an explicit default: if a default
is supplied, return os.environ.get(key, default); if no default was supplied and
the environment variable is missing, raise an error (e.g., KeyError or a clear
RuntimeError) so templates like env_var('MSSQL_SA_PASSWORD') fail fast in strict
mode; update the reference to env.globals["env_var"] to point to this new
function.

In `@macros/edr/materializations/test/test.sql`:
- Around line 191-216: The call to elementary.make_temp_relation in the
sqlserver__query_test_result_rows macro is incorrect and should use the renamed
helper elementary.edr_make_temp_relation; update the tmp_relation assignment to
call elementary.edr_make_temp_relation(model) instead of
elementary.make_temp_relation(model) so tmp_relation, the temp-table creation
(run_query "select * into " ~ tmp_relation ~ ...), and the subsequent DROP use
the correct helper name.

In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql`:
- Around line 270-279: The template currently treats any non-matching method
(including typos) as username/password; update the logic in
generate_elementary_profile_args.sql so that you handle three cases explicitly:
when method is none (append no auth params or a clear placeholder like
_parameter("method", "<NONE>")), when method == "cli" or "service-principal"
(keep the existing branches that append parameters using _parameter and
parameters.append), and a new else branch for unsupported methods that appends a
clear placeholder/instruction parameter (e.g., _parameter("method",
"<UNSUPPORTED: specify 'cli' or 'service-principal' or 'none'>")) rather than
silently inserting user/password; reference the symbols method,
parameters.append, and _parameter to locate and change the conditional branches.

In `@macros/utils/cross_db_utils/hour_of_week.sql`:
- Around line 83-84: Replace the locale-dependent datename(weekday, {{ date_expr
}}) with a locale-stable day calculation: compute weekday as ((DATEDIFF(day,
'1900-01-01', CAST({{ date_expr }} AS date)) % 7) + 1) and cast that to {{
elementary.edr_type_string() }} (keep the existing cast for the hour using
datepart(hour, {{ date_expr }})); ensure you CAST({{ date_expr }} AS date)
before DATEDIFF to avoid time-boundary shifts and update the expression that
currently uses datename(weekday, {{ date_expr }}) to this deterministic formula.

In `@macros/utils/data_types/get_normalized_data_type.sql`:
- Around line 161-167: The mapping in the get_normalized_data_type macro
currently normalizes distinct SQL Server types (e.g., "DATETIME",
"SMALLDATETIME", "MONEY", "SMALLMONEY") to other canonical types, which hides
real schema differences; remove the mapping entries for "DATETIME",
"SMALLDATETIME", "MONEY", and "SMALLMONEY" from the normalization dictionary so
they remain their native types during comparisons, but keep true SQL Server
synonym mappings such as "NVARCHAR", "NCHAR", "NTEXT", "ROWVERSION", "DOUBLE
PRECISION", and "REAL" in the same mapping structure used by the
get_normalized_data_type macro.

In `@macros/utils/data_types/try_cast_column_to_timestamp.sql`:
- Around line 32-39: The WHERE clause in try_cast_column_to_timestamp.sql can
return a TOP 1 row that fails try_cast() because try_cast returns NULL on
failure; update the query that sets "query" (which uses {{ timestamp_column }},
{{ table_relation }}, and {{ elementary.edr_type_timestamp() }}) to add "and
try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not
null" so only successfully castable rows are considered before calling
elementary.result_value(); keep the rest of the logic (result handling with
result_value() and return(true)/return(false)) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d8ad01e-d735-4aa7-8a79-b1c603c9bb7c

📥 Commits

Reviewing files that changed from the base of the PR and between 2c08c87 and a0a7285.

📒 Files selected for processing (24)
  • .github/workflows/test-warehouse.yml
  • integration_tests/profiles/generate_profiles.py
  • integration_tests/profiles/profiles.yml.j2
  • macros/edr/materializations/test/failed_row_count.sql
  • macros/edr/materializations/test/test.sql
  • macros/edr/system/system_utils/buckets_cte.sql
  • macros/utils/cross_db_utils/current_timestamp.sql
  • macros/utils/cross_db_utils/date_trunc.sql
  • macros/utils/cross_db_utils/dateadd.sql
  • macros/utils/cross_db_utils/datediff.sql
  • macros/utils/cross_db_utils/day_of_week.sql
  • macros/utils/cross_db_utils/generate_elementary_profile_args.sql
  • macros/utils/cross_db_utils/hour_of_day.sql
  • macros/utils/cross_db_utils/hour_of_week.sql
  • macros/utils/cross_db_utils/incremental_strategy.sql
  • macros/utils/cross_db_utils/target_database.sql
  • macros/utils/cross_db_utils/time_trunc.sql
  • macros/utils/cross_db_utils/timeadd.sql
  • macros/utils/cross_db_utils/to_char.sql
  • macros/utils/data_types/data_type.sql
  • macros/utils/data_types/data_type_list.sql
  • macros/utils/data_types/get_normalized_data_type.sql
  • macros/utils/data_types/try_cast_column_to_timestamp.sql
  • macros/utils/table_operations/insert_rows.sql
🚧 Files skipped from review as they are similar to previous changes (7)
  • macros/utils/cross_db_utils/day_of_week.sql
  • integration_tests/profiles/profiles.yml.j2
  • macros/utils/cross_db_utils/target_database.sql
  • macros/utils/cross_db_utils/date_trunc.sql
  • macros/utils/data_types/data_type.sql
  • macros/utils/cross_db_utils/to_char.sql
  • macros/utils/data_types/data_type_list.sql

env:
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
TESTS_DIR: ${{ github.workspace }}/dbt-data-reliability/integration_tests
MSSQL_SA_PASSWORD: ${{ secrets.MSSQL_SA_PASSWORD || 'Elementary123!' }}

⚠️ Potential issue | 🟡 Minor

Hardcoded fallback password in workflow file.

The fallback password 'Elementary123!' is exposed in plaintext. While this is acceptable for local CI testing with ephemeral Docker containers, consider using a more generic/weak password for the fallback (e.g., 'Password1!') or documenting that this is intentionally for local testing only.

If this repo is public, anyone can see this default password—though the risk is low since it's only used for ephemeral test containers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-warehouse.yml at line 55, Replace the hardcoded
fallback value 'Elementary123!' used for MSSQL_SA_PASSWORD in the workflow with
a more generic/weak local-only default (e.g., 'Password1!') or remove the inline
fallback and fail the job when the secret is missing; update the
MSSQL_SA_PASSWORD assignment so it either uses a documented local-only default
or requires the secret to be set, and add a short comment clarifying that any
default is for ephemeral/local CI only.

keep_trailing_newline=True,
)
env.filters["toyaml"] = _yaml_inline
env.globals["env_var"] = lambda key, default="": os.environ.get(key, default)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and read the generate_profiles.py file around line 126
wc -l integration_tests/profiles/generate_profiles.py

Repository: elementary-data/dbt-data-reliability

Length of output: 131


🏁 Script executed:

# Read the file with context around line 126
sed -n '110,140p' integration_tests/profiles/generate_profiles.py | cat -n

Repository: elementary-data/dbt-data-reliability

Length of output: 1193


🏁 Script executed:

# Look for usage of env_var in the Jinja templates
find integration_tests -name "*.j2" -o -name "*.yml.j2" | head -20

Repository: elementary-data/dbt-data-reliability

Length of output: 122


🏁 Script executed:

# Check for env_var usage patterns in templates
rg "env_var\(" integration_tests/profiles/ -C 2

Repository: elementary-data/dbt-data-reliability

Length of output: 438


🏁 Script executed:

# Get more context around line 117 in the file
sed -n '100,134p' integration_tests/profiles/generate_profiles.py | cat -n

Repository: elementary-data/dbt-data-reliability

Length of output: 1611


🌐 Web query:

dbt env_var Jinja function behavior when variable not set no default

💡 Result:

In dbt, env_var() requires the environment variable to be set unless you pass a default:

  • {{ env_var('MY_VAR') }} (no default) → if MY_VAR is not set, dbt errors with: Env var required but not provided: 'MY_VAR' (this is a hard failure during parsing/compilation). [1][2]
  • {{ env_var('MY_VAR', 'some_default') }} → if MY_VAR is not set, dbt returns the provided default instead of erroring. [3]

So “when variable not set, no default” behavior = fail fast with an exception/error. [1][2]

Sources:
[1] https://airflow.apache.org/docs/apache-airflow-providers-openlineage/2.6.1/_modules/tests/system/openlineage/operator.html
[2] https://stackoverflow.com/questions/76653253/env-var-function-in-dbt-project-is-not-behaving-as-expected
[3] https://docs.open-metadata.org/latest/connectors/ingestion/workflows/dbt/auto-ingest-dbt-core


🏁 Script executed:

# Check if there are more usages of env_var in templates or documentation
rg "env_var" integration_tests/ -t yml -t yaml -C 3

Repository: elementary-data/dbt-data-reliability

Length of output: 107


🏁 Script executed:

# Check for all env_var usages in .j2 files more carefully
rg "env_var" integration_tests/profiles/

Repository: elementary-data/dbt-data-reliability

Length of output: 305


🏁 Script executed:

# Check if MSSQL_SA_PASSWORD or similar env vars are expected to be set
grep -r "MSSQL_SA_PASSWORD\|env_var" integration_tests/ --include="*.py" --include="*.md"

Repository: elementary-data/dbt-data-reliability

Length of output: 209


🏁 Script executed:

# See the full context of the function to understand when secrets are vs aren't loaded
sed -n '80,134p' integration_tests/profiles/generate_profiles.py | cat -n

Repository: elementary-data/dbt-data-reliability

Length of output: 2568


🏁 Script executed:

# Check all env_var calls in the entire codebase to see if any have explicit defaults
rg "env_var\(" --type py --type yaml -C 2

Repository: elementary-data/dbt-data-reliability

Length of output: 379


🏁 Script executed:

# Also check .j2 files specifically
fd "\.j2$" integration_tests/ | xargs cat | grep -n "env_var"

Repository: elementary-data/dbt-data-reliability

Length of output: 135


Preserve required env_var semantics instead of silently returning empty string.

At line 126, missing env vars silently resolve to "" even when no default is provided. The template at line 73 calls env_var('MSSQL_SA_PASSWORD') without an explicit default, which should fail in strict mode (when secrets are loaded) per dbt's documented behavior. The current implementation bypasses StrictUndefined's fail-fast guarantee.

🔧 Suggested fix
+_MISSING = object()
+
+def _env_var(key: str, default: Any = _MISSING) -> str:
+    value = os.environ.get(key)
+    if value is None:
+        if default is _MISSING:
+            raise click.ClickException(f"Missing required environment variable: {key}")
+        return default
+    return value
+
@@
-    env.globals["env_var"] = lambda key, default="": os.environ.get(key, default)
+    env.globals["env_var"] = _env_var
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/profiles/generate_profiles.py` at line 126, The current
env.globals["env_var"] lambda silently returns "" when no default is provided;
replace it with a real function (e.g., env_var) that distinguishes between an
omitted default and an explicit default: if a default is supplied, return
os.environ.get(key, default); if no default was supplied and the environment
variable is missing, raise an error (e.g., KeyError or a clear RuntimeError) so
templates like env_var('MSSQL_SA_PASSWORD') fail fast in strict mode; update the
reference to env.globals["env_var"] to point to this new function.
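A standalone sketch of that stricter helper (a plain exception is used here instead of click.ClickException so the snippet has no third-party dependency):

```python
import os

_MISSING = object()

def strict_env_var(key, default=_MISSING):
    """Drop-in for the permissive lambda: honors an explicit default,
    but fails fast like dbt's env_var when the variable is absent and
    no default was supplied."""
    value = os.environ.get(key)
    if value is not None:
        return value
    if default is _MISSING:
        raise RuntimeError(f"Env var required but not provided: {key!r}")
    return default

# env.globals["env_var"] = strict_env_var
```

Registered on the Jinja environment, `env_var('MSSQL_SA_PASSWORD')` then raises instead of rendering an empty password into profiles.yml, while `env_var('FOO', 'bar')` keeps its fallback behavior.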

Comment on lines +270 to +279
{% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
{% elif method == "service-principal" %}
{% do parameters.append(_parameter("method", "service-principal")) %}
{% do parameters.append(_parameter("tenant_id", "<TENANT_ID>")) %}
{% do parameters.append(_parameter("client_id", "<CLIENT_ID>")) %}
{% do parameters.append(_parameter("client_secret", "<CLIENT_SECRET>")) %}
{% else %}
{% do parameters.append(_parameter("user", target.user)) %}
{% do parameters.append(_parameter("password", "<PASSWORD>")) %}
{% endif %}

⚠️ Potential issue | 🟡 Minor

Avoid silently treating unknown auth methods as username/password.

Right now, any typo or unsupported method value falls into the else branch (Line 276) and generates user/password, which can hide config mistakes and produce confusing failures. Please handle explicit default (method is none) separately and surface unsupported methods as placeholders/instructions.

Suggested fix
-    {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
+    {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
     {% elif method == "service-principal" %}
         {% do parameters.append(_parameter("method", "service-principal")) %}
         {% do parameters.append(_parameter("tenant_id", "<TENANT_ID>")) %}
         {% do parameters.append(_parameter("client_id", "<CLIENT_ID>")) %}
         {% do parameters.append(_parameter("client_secret", "<CLIENT_SECRET>")) %}
-    {% else %}
+    {% elif method is none %}
         {% do parameters.append(_parameter("user", target.user)) %}
         {% do parameters.append(_parameter("password", "<PASSWORD>")) %}
+    {% else %}
+        {% do parameters.append(
+            _parameter(
+                "method",
+                "<AUTH_METHOD>",
+                "Supported methods: cli, service-principal, or omit for username/password authentication.",
+            )
+        ) %}
     {% endif %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql` around
lines 270 - 279, The template currently treats any non-matching method
(including typos) as username/password; update the logic in
generate_elementary_profile_args.sql so that you handle three cases explicitly:
when method is none (append no auth params or a clear placeholder like
_parameter("method", "<NONE>")), when method == "cli" or "service-principal"
(keep the existing branches that append parameters using _parameter and
parameters.append), and a new else branch for unsupported methods that appends a
clear placeholder/instruction parameter (e.g., _parameter("method",
"<UNSUPPORTED: specify 'cli' or 'service-principal' or 'none'>")) rather than
silently inserting user/password; reference the symbols method,
parameters.append, and _parameter to locate and change the conditional branches.

Comment on lines +83 to +84
cast(datename(weekday, {{ date_expr }}) as {{ elementary.edr_type_string() }}),
cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }})

⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

SQL Server DATENAME weekday session language dependent


DATENAME(weekday, <date>) is session-language dependent: it returns the weekday name translated to the language of the current session (e.g., Tuesday vs mardi). This is why the same query can return different strings for different users/sessions. [1], [2]

What controls the language?

  • The session language can be seen with SELECT @@LANGUAGE; [3]
  • It comes from the login default language (server setting) unless overridden for the session. [4]
  • Installed languages (and their day/month names) are in sys.syslanguages / sp_helplanguage. [5], [6]

How to make results consistent

Option A (force a session language):

SET LANGUAGE us_english;
SELECT DATENAME(weekday, CAST('2026-03-04' AS date));

This makes DATENAME stable within that session, but you’re changing session settings (which can have side effects). [2]

Option B (don’t rely on names; use numbers):
Use DATEPART(weekday, ...) (an int 1–7), but note it depends on DATEFIRST (also a session setting visible in sys.dm_exec_sessions.date_first). [7]

Option C (need a specific output language without changing session settings):
Use .NET formatting with a culture:

SELECT FORMAT(CAST('2026-03-04' AS date), 'dddd', 'en-US') AS WeekdayName;

This is culture-specific and avoids SET LANGUAGE, but can be slower on large result sets. [2]

Sources: DATENAME behavior/examples [2]; session language/@@language [3]; weekday-name language dependence discussion [1]; language metadata [5][6]; login default language behavior [4]; session DATEFIRST metadata [7].


SQL Server locale-independent weekday extraction best practice


Key point

In SQL Server, DATEPART(weekday, …) / DATEPART(dw, …) is not locale-independent: its 1–7 mapping depends on the session’s SET DATEFIRST value. [1][2]

Best-practice options (pick one)

Option A (recommended): Normalize using @@DATEFIRST (works under any session setting)

If you want a stable ISO-like weekday number (Monday=1 … Sunday=7) regardless of user/session language or DATEFIRST:

-- ISO weekday: Monday=1 ... Sunday=7 (independent of DATEFIRST)
SELECT ((DATEPART(weekday, @d) + @@DATEFIRST - 2) % 7) + 1 AS iso_weekday;
  • @@DATEFIRST returns the current session’s DATEFIRST. [3]
  • This avoids relying on a particular SET DATEFIRST being in effect (important in views/UDF callers, etc.). In particular, you can’t put SET DATEFIRST inside a function. [5]

Option B: Use an anchor-date modulo (fully independent of DATEFIRST)

Compute weekday from a known reference date (example uses 1900-01-01 = Monday):

-- Monday=1 ... Sunday=7 (independent of DATEFIRST)
SELECT (DATEDIFF(day, '19000101', @d) % 7) + 1 AS weekday_mon1;

This avoids DATEPART(weekday, …) entirely. [7]

What to avoid (or only use for presentation)

  • DATENAME(weekday, @d) returns localized day names and is affected by session language and DATEFIRST, so don’t use it for logic/comparisons. [4]
  • Relying on default DATEFIRST is risky because it can vary by language/login settings (e.g., SET LANGUAGE can change the effective DATEFIRST). [6]

Sources: DATEPART docs (weekday depends on SET DATEFIRST) [1]; SET DATEFIRST behavior/scope [2]; @@DATEFIRST [3]; DATENAME dependencies [4]; UDFs can’t use SET statements like SET DATEFIRST [5]; language affecting weekday mapping via DATEFIRST [6]; anchor-date modulo approach [7].


SQL Server DATEDIFF day % 7 weekday calculation accuracy


Using DATEDIFF(day, <anchor>, <date>) % 7 is accurate/reliable for “day-of-week math” as long as you normalize to a DATE (midnight) and you pick an anchor whose weekday you know.

Why it works (and when it doesn’t)

  • DATEDIFF(day, start, end) counts day-boundary crossings, not “elapsed 24-hour blocks”, so DST shifts don’t break it if you’re counting by calendar date. This is why it’s commonly used for weekday/weekend logic rather than DATEDIFF(hour)/24. (Microsoft examples/discussion emphasize boundary-count behavior.) [1]
  • If your input is datetime/datetime2 with a time portion, you should do CONVERT(date, dt) first; otherwise “near-midnight” times can change the boundary count relative to what you intend (weekday by date).

A safe pattern (Monday=1 … Sunday=7), independent of @@DATEFIRST

If you want weekday numbers, don’t rely on DATEPART(weekday, …) alone because it depends on the session setting DATEFIRST (and you can’t SET DATEFIRST inside a function). Use an adjustment with @@DATEFIRST instead. [2]

Example (ISO-style Monday=1..Sunday=7):

SELECT ((DATEPART(weekday, @d) + @@DATEFIRST - 2) % 7) + 1;

This “normalizes away” DATEFIRST. [2]

If you specifically want the DATEDIFF % 7 approach

Pick a known Monday anchor and normalize:

-- 1900-01-01 was a Monday (commonly used anchor)
SELECT (DATEDIFF(day, '19000101', CONVERT(date, @d)) % 7) + 1;  -- Monday=1

Key gotchas

  • Don’t mix this with DATEDIFF(week, …) expectations: DATEDIFF(week, …) effectively treats Sunday as the week boundary regardless of DATEFIRST, so “week-based” math can surprise you. [3]
  • Be explicit about the mapping (what number corresponds to what weekday) by documenting the anchor and the +1 (or other) offset.

Sources:
[1] Microsoft Q&A discussion referencing DATEDIFF returning differences “expressed in the boundary set by datepart” (boundary-count semantics) https://learn.microsoft.com/en-us/answers/questions/1470414/how-to-get-decimal-hour-value-in-datediff-function
[2] Microsoft Q&A example showing DATEPART(dw, date) varies with @@DATEFIRST, and a normalization formula using @@DATEFIRST https://learn.microsoft.com/en-us/answers/questions/1556050/datefirst-in-sql-server-function
[3] Microsoft Q&A note that DATEDIFF week calculations need compensation because DATEDIFF considers Sunday as first day of week https://learn.microsoft.com/en-us/answers/questions/696693/add-a-hour-column-and-offset-columns-to-date-table


Make weekday extraction deterministic across SQL Server sessions.

DATENAME(weekday, ...) returns localized day names based on session language, so the same timestamp can produce different bucket keys across environments. This fragments hour-of-week aggregations and historical comparisons.

Suggested fix (locale-stable weekday mapping using day-count math)
 {% macro sqlserver__edr_hour_of_week_expression(date_expr) %}
     concat(
-        cast(datename(weekday, {{ date_expr }}) as {{ elementary.edr_type_string() }}),
+        cast(
+            case (datediff(day, cast('19000101' as date), cast({{ date_expr }} as date)) % 7)
+                when 0 then 'Monday'
+                when 1 then 'Tuesday'
+                when 2 then 'Wednesday'
+                when 3 then 'Thursday'
+                when 4 then 'Friday'
+                when 5 then 'Saturday'
+                when 6 then 'Sunday'
+            end as {{ elementary.edr_type_string() }}
+        ),
         cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }})
     )
 {% endmacro %}

This approach uses a known reference date (1900-01-01 = Monday) and day-boundary counting to derive weekday independently of session language or SET DATEFIRST settings. Ensure {{ date_expr }} is cast to DATE (not datetime) to avoid boundary-count shifts from time portions near midnight.
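The anchor-date claim is easy to check in Python: 1900-01-01 is a Monday, and day-count modulo 7 reproduces the weekday for any later date. (One caveat: T-SQL's % keeps the dividend's sign, so dates before the anchor would need an extra normalization step; Python's % is already non-negative.)

```python
from datetime import date

ANCHOR = date(1900, 1, 1)  # a Monday, the same anchor as the T-SQL suggestion
NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

def weekday_name(d):
    """Day-count-modulo weekday name, mirroring
    datediff(day, '19000101', d) % 7 in the suggested CASE expression."""
    return NAMES[(d - ANCHOR).days % 7]
```

Because the anchor has weekday index 0, the offset modulo 7 matches Python's own `date.weekday()` (Monday=0), which is exactly the mapping the suggested CASE encodes.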

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/hour_of_week.sql` around lines 83 - 84, Replace
the locale-dependent datename(weekday, {{ date_expr }}) with a locale-stable day
calculation: compute weekday as ((DATEDIFF(day, '1900-01-01', CAST({{ date_expr
}} AS date)) % 7) + 1) and cast that to {{ elementary.edr_type_string() }} (keep
the existing cast for the hour using datepart(hour, {{ date_expr }})); ensure
you CAST({{ date_expr }} AS date) before DATEDIFF to avoid time-boundary shifts
and update the expression that currently uses datename(weekday, {{ date_expr }})
to this deterministic formula.

Comment on lines +161 to +167
"DATETIME": "DATETIME2",
"SMALLDATETIME": "DATETIME2",
"MONEY": "DECIMAL",
"SMALLMONEY": "DECIMAL",
"DOUBLE PRECISION": "FLOAT",
"REAL": "FLOAT",
"INTEGER": "INT",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

SQL Server T-SQL data type synonyms DATETIME SMALLDATETIME MONEY SMALLMONEY REAL FLOAT

💡 Result:

SQL Server T‑SQL data type synonyms for the types you listed:

  • DATETIME → no synonym (distinct type) [2]
  • SMALLDATETIME → no synonym (distinct type) [2]
  • MONEY → no synonym (distinct type) [2]
  • SMALLMONEY → no synonym (distinct type) [2]
  • REAL → synonym for FLOAT(24) (ISO synonym) [1]
  • FLOAT → defaults to FLOAT(53) when no precision is specified; DOUBLE PRECISION is a synonym for FLOAT(53) [1]

Sources: Microsoft Learn: FLOAT/REAL (Transact‑SQL) [1], Data types (Transact‑SQL) [2].


🌐 Web query:

SQL Server INFORMATION_SCHEMA.COLUMNS.DATA_TYPE return values DATETIME SMALLDATETIME MONEY DECIMAL REAL

💡 Result:

In SQL Server, INFORMATION_SCHEMA.COLUMNS.DATA_TYPE returns the base type name (no (p,s) / (n) arguments). For the types you listed, the returned values are: datetime, smalldatetime, money, decimal, real. [1]

If you need the parameters, read them from the other INFORMATION_SCHEMA.COLUMNS fields (e.g., NUMERIC_PRECISION/NUMERIC_SCALE for decimal, DATETIME_PRECISION for date/time types). [1]

Sources: [1] Stack Overflow discussion of INFORMATION_SCHEMA.COLUMNS.DATA_TYPE behavior; [2] Database.Guide examples listing SQL Server type names (e.g., datetime, smalldatetime, money, decimal, real).


🏁 Script executed:

# Examine the macro file to understand context and usage
find . -type f -name "get_normalized_data_type.sql" | head -5

Repository: elementary-data/dbt-data-reliability

Length of output: 134


🏁 Script executed:

cat -n ./macros/utils/data_types/get_normalized_data_type.sql | head -180

Repository: elementary-data/dbt-data-reliability

Length of output: 7445


Remove non-synonym type mappings; preserve true SQL Server type synonyms.

Lines 161–164 map distinct SQL Server types (DATETIME, SMALLDATETIME, MONEY, SMALLMONEY) to different canonical types, which masks real schema mismatches during comparisons. Only NVARCHAR, NCHAR, NTEXT, ROWVERSION, DOUBLE PRECISION, and REAL are true SQL Server synonyms and should remain. REAL is an ISO standard synonym for FLOAT(24) and must not be removed.

Proposed change
     {% set exact_data_type_to_data_type_returned_by_the_info_schema = {
         "NVARCHAR": "VARCHAR",
         "NCHAR": "CHAR",
         "NTEXT": "TEXT",
         "ROWVERSION": "TIMESTAMP",
-        "DATETIME": "DATETIME2",
-        "SMALLDATETIME": "DATETIME2",
-        "MONEY": "DECIMAL",
-        "SMALLMONEY": "DECIMAL",
         "DOUBLE PRECISION": "FLOAT",
         "REAL": "FLOAT",
         "INTEGER": "INT",
         "BOOLEAN": "BIT",
     } %}
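A toy Python model of the two mapping variants shows the masking effect described above (illustrative dictionaries only, not the macro's full table):

```python
# True SQL Server synonyms: normalizing these loses no information.
SYNONYM_MAP = {"DOUBLE PRECISION": "FLOAT", "REAL": "FLOAT"}

# Adding distinct types to the map makes a real DATETIME -> DATETIME2
# schema change invisible to a comparison.
LOSSY_MAP = {**SYNONYM_MAP,
             "DATETIME": "DATETIME2", "SMALLDATETIME": "DATETIME2"}

def types_differ(a, b, mapping):
    """Compare two column types after normalization, the way a schema
    change monitor would."""
    return mapping.get(a, a) != mapping.get(b, b)
```

With the lossy map, a column migrated from DATETIME to DATETIME2 compares as unchanged, which is precisely the false negative the review flags.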
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/data_types/get_normalized_data_type.sql` around lines 161 - 167,
The mapping in the get_normalized_data_type macro currently normalizes distinct
SQL Server types (e.g., "DATETIME", "SMALLDATETIME", "MONEY", "SMALLMONEY") to
other canonical types, which hides real schema differences; remove the mapping
entries for "DATETIME", "SMALLDATETIME", "MONEY", and "SMALLMONEY" from the
normalization dictionary so they remain their native types during comparisons,
but keep true SQL Server synonym mappings such as "NVARCHAR", "NCHAR", "NTEXT",
"ROWVERSION", "DOUBLE PRECISION", and "REAL" in the same mapping structure used
by the get_normalized_data_type macro.

Comment on lines +32 to +39
select top 1 try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) as timestamp_column
from {{ table_relation }}
where {{ timestamp_column }} is not null
{%- endset %}

{%- set result = elementary.result_value(query) %}
{%- if result is not none %} {{ return(true) }} {%- endif %}
{{ return(false) }}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "try_cast_column_to_timestamp.sql" -type f

Repository: elementary-data/dbt-data-reliability

Length of output: 138


🏁 Script executed:

git ls-files | grep -E "(try_cast|timestamp)" | head -20

Repository: elementary-data/dbt-data-reliability

Length of output: 444


🏁 Script executed:

cat -n ./macros/utils/data_types/try_cast_column_to_timestamp.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 1884


🏁 Script executed:

cat -n ./macros/utils/data_types/is_column_timestamp.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 646


🏁 Script executed:

rg -A 5 "def result_value|macro result_value" --type jinja --type py

Repository: elementary-data/dbt-data-reliability

Length of output: 62


🏁 Script executed:

fd "test*" | grep -i "try_cast\|timestamp" | head -10

Repository: elementary-data/dbt-data-reliability

Length of output: 62


🏁 Script executed:

rg "result_value" --type jinja -B 2 -A 5 | head -50

Repository: elementary-data/dbt-data-reliability

Length of output: 62


🏁 Script executed:

rg "result_value" --type py -B 2 -A 5 | head -50

Repository: elementary-data/dbt-data-reliability

Length of output: 62


🏁 Script executed:

find . -name "*.py" -o -name "*.sql" | xargs grep -l "def result_value" 2>/dev/null

Repository: elementary-data/dbt-data-reliability

Length of output: 64


🏁 Script executed:

rg "result_value" -A 3 | head -80

Repository: elementary-data/dbt-data-reliability

Length of output: 3115


🏁 Script executed:

find . -path ./venv -prune -o -name "*.jinja2" -o -name "*.sql" | xargs grep -l "result_value" 2>/dev/null | head -10

Repository: elementary-data/dbt-data-reliability

Length of output: 488


🏁 Script executed:

cat -n ./macros/utils/run_queries/result_value.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 332


🏁 Script executed:

cat -n ./macros/utils/data_types/try_cast_column_to_timestamp.sql | grep -A 10 "default__"

Repository: elementary-data/dbt-data-reliability

Length of output: 753


🏁 Script executed:

rg "try_cast|safe_cast" ./macros/utils/data_types/try_cast_column_to_timestamp.sql -B 2 -A 2

Repository: elementary-data/dbt-data-reliability

Length of output: 1227


Filter WHERE clause to ensure only successfully castable rows are evaluated.

The SQL Server implementation uses TOP 1 try_cast(...) without filtering on cast success. Since try_cast() returns NULL on failure, a non-deterministic row selection can pick an un-castable row, causing result_value() to return NULL and incorrectly report false even when castable rows exist in the table.

Add and try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not null to the WHERE clause to guarantee only rows with successful casts are returned.

Proposed fix
 {% macro sqlserver__try_cast_column_to_timestamp(table_relation, timestamp_column) %}
     {%- set query %}
-        select top 1 try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) as timestamp_column
+        select top 1 1 as has_valid_timestamp
         from {{ table_relation }}
         where {{ timestamp_column }} is not null
+          and try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not null
     {%- endset %}

     {%- set result = elementary.result_value(query) %}
     {%- if result is not none %} {{ return(true) }} {%- endif %}
     {{ return(false) }}
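The failure mode is easy to reproduce in Python, using `fromisoformat` as a stand-in for T-SQL's `try_cast` (variable names here are illustrative, not the macro's):

```python
from datetime import datetime

def try_cast_ts(value):
    """Stand-in for try_cast(... as datetime2): None on failure."""
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None

rows = ["not-a-date", "2026-03-04T00:00:00"]  # both values are non-null

# Buggy pattern: TOP 1 over unfiltered rows may pick the un-castable one.
top1_unfiltered = try_cast_ts(rows[0])

# Fixed pattern: filter on cast success first, then take the first row.
castable = [r for r in rows if try_cast_ts(r) is not None]
top1_filtered = try_cast_ts(castable[0]) if castable else None
```

The unfiltered variant returns NULL/None even though a castable row exists, which is exactly how `result_value()` would mis-report the column as non-timestamp.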
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/data_types/try_cast_column_to_timestamp.sql` around lines 32 -
39, The WHERE clause in try_cast_column_to_timestamp.sql can return a TOP 1 row
that fails try_cast() because try_cast returns NULL on failure; update the query
that sets "query" (which uses {{ timestamp_column }}, {{ table_relation }}, and
{{ elementary.edr_type_timestamp() }}) to add "and try_cast({{ timestamp_column
}} as {{ elementary.edr_type_timestamp() }}) is not null" so only successfully
castable rows are considered before calling elementary.result_value(); keep the
rest of the logic (result handling with result_value() and
return(true)/return(false)) unchanged.

…dev, ORDER BY, GROUP BY, buckets_cte)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
macros/edr/data_monitoring/monitors/column_numeric_monitors.sql (1)

69-79: ⚠️ Potential issue | 🟠 Major

Add sqlserver__variance macro — SQL Server uses VAR, not variance.

The variance macro lacks a SQL Server-specific implementation. T-SQL does not have a variance() function; it uses VAR() for sample variance or VARP() for population variance. This will cause a runtime error when the variance monitor runs against a SQL Server or Microsoft Fabric target.

Proposed fix
 {% macro clickhouse__variance(column_name) -%}
     varSamp(cast({{ column_name }} as Nullable({{ elementary.edr_type_float() }})))
 {%- endmacro %}
+
+{# T-SQL uses VAR instead of variance #}
+{% macro sqlserver__variance(column_name) -%}
+    var(cast({{ column_name }} as {{ elementary.edr_type_float() }}))
+{%- endmacro %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/data_monitoring/monitors/column_numeric_monitors.sql` around lines
69 - 79, The variance macro needs a SQL Server-specific implementation: add a
macro named sqlserver__variance that returns VAR(cast({{ column_name }} as {{
elementary.edr_type_float() }})) (use VAR for sample variance in T-SQL),
mirroring how default__variance and clickhouse__variance cast the column to the
float edr type; ensure the new macro follows the same macro signature {% macro
sqlserver__variance(column_name) -%} so adapter.dispatch("variance",
"sqlserver") will resolve correctly.
🧹 Nitpick comments (1)
macros/edr/tests/test_execution_sla.sql (1)

292-300: The when 1 = 1 pattern works but could be clearer.

The conditional logic is correct—when the deadline hasn't passed, the when 1 = 1 clause acts as a catch-all to return false before reaching else. However, this pattern may confuse future readers.

Consider restructuring for clarity:

💡 Alternative approach using Jinja conditional
                 case
                     when sla_status = 'MET_SLA'
                     then {{ elementary.edr_boolean_literal(false) }}
-                    {# If deadline hasn't passed, don't fail yet #}
-                    {% if not deadline_passed %}
-                        when 1 = 1 then {{ elementary.edr_boolean_literal(false) }}
-                    {% endif %}
-                    else {{ elementary.edr_boolean_literal(true) }}
+                    else {{ elementary.edr_boolean_literal(deadline_passed) }}
                 end as is_failure,

This directly expresses the intent: fail only if the deadline has passed and the SLA was not met.
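A quick truth-table check confirms the two shapes agree on all inputs. In the macro, `deadline_passed` is a Jinja value rendered at compile time; this sketch treats it as a runtime parameter, which is logically equivalent:

```python
def is_failure_original(sla_status, deadline_passed):
    """Original CASE: a 'when 1 = 1 then false' catch-all is injected
    only when the deadline has not passed."""
    if sla_status == "MET_SLA":
        return False
    if not deadline_passed:
        return False
    return True

def is_failure_simplified(sla_status, deadline_passed):
    """Suggested form: 'else deadline_passed'."""
    if sla_status == "MET_SLA":
        return False
    return deadline_passed
```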

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_execution_sla.sql` around lines 292 - 300, The current
"when 1 = 1" catch-all is confusing; remove that pattern and instead use a Jinja
conditional to inject a clear when-clause only when deadline_passed is false so
intent is explicit. Update the CASE that produces is_failure (referencing
sla_status, is_failure, and elementary.edr_boolean_literal) so that you keep
"when sla_status = 'MET_SLA' then false", and then conditionally add a
descriptive when-clause like "when deadline has not passed then false" via {% if
not deadline_passed %} ... {% endif %} rather than using "when 1 = 1".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/system/system_utils/empty_table.sql`:
- Around line 242-255: Add a fabric-specific override macro similar to
sqlserver__dummy_values to avoid falling back to default__dummy_values (which
uses the invalid T-SQL literal "True"); create a macro named
fabric__dummy_values() that returns the same dummy_values mapping used in
sqlserver__dummy_values (notably setting "boolean": 1 and the same
string/int/float/timestamp values) so dbt's dummy_values() dispatch will pick
the Fabric-safe implementation instead of default__dummy_values.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3452a2df-29ac-4ce9-87e1-dafa143b3dfa

📥 Commits

Reviewing files that changed from the base of the PR and between a0a7285 and 319dee6.

📒 Files selected for processing (15)
  • macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
  • macros/edr/system/system_utils/buckets_cte.sql
  • macros/edr/system/system_utils/empty_table.sql
  • macros/edr/system/system_utils/get_config_var.sql
  • macros/edr/tests/test_execution_sla.sql
  • models/edr/alerts/alerts_anomaly_detection.sql
  • models/edr/alerts/alerts_dbt_models.sql
  • models/edr/alerts/alerts_dbt_source_freshness.sql
  • models/edr/alerts/alerts_dbt_tests.sql
  • models/edr/alerts/alerts_schema_changes.sql
  • models/edr/data_monitoring/anomaly_detection/anomaly_threshold_sensitivity.sql
  • models/edr/data_monitoring/anomaly_detection/metrics_anomaly_score.sql
  • models/edr/dbt_artifacts/dbt_artifacts_hashes.sql
  • models/edr/run_results/model_run_results.sql
  • models/edr/system/monitors_runs.sql
💤 Files with no reviewable changes (1)
  • models/edr/dbt_artifacts/dbt_artifacts_hashes.sql

…cursion

- Replace dbt_utils.group_by() with explicit column names in metrics_anomaly_score (T-SQL treats positional refs as literal constants)
- Fix make_temp_relation -> edr_make_temp_relation in test.sql and failed_row_count.sql
- Add temp table cleanup in failed_row_count.sql
- Add sqlserver__variance macro (T-SQL uses VAR not variance)
- Fix maxrecursion 10000 -> 0 for unlimited recursion in buckets_cte

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
macros/edr/system/system_utils/buckets_cte.sql (1)

234-256: ⚠️ Potential issue | 🔴 Critical

Recursive CTE remains incompatible with Fabric Warehouse runtime.

Line 241 introduces recursion; Fabric Warehouse currently documents recursive queries as unsupported. If this macro is part of Fabric support, this path is a runtime blocker.

What is the current Microsoft Fabric Warehouse support status for recursive CTEs/recursive queries in T-SQL? Please cite the latest T-SQL surface area docs.

Source referenced:

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/system/system_utils/buckets_cte.sql` around lines 234 - 256, The
macro sqlserver__complete_buckets_cte uses a recursive CTE (timestamps) and
option (maxrecursion 0), which is incompatible with Fabric Warehouse; replace
the recursion with a non-recursive generator (e.g., use a numbers/tally table, a
system view that provides sequential integers, or a calendar table) to produce
edr_bucket_start/edr_bucket_end rows between min_bucket_start_expr and
max_bucket_end_expr stepping by time_bucket, and remove the option (maxrecursion
0); update the macro sqlserver__complete_buckets_cte to build the same sequence
using that iterative approach so Fabric-compatible T-SQL is emitted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/data_monitoring/monitors/column_numeric_monitors.sql`:
- Around line 64-67: The fabric adapter needs explicit implementations for
standard deviation and variance so adapter.dispatch finds them; add macro
definitions named fabric__standard_deviation and fabric__variance that either
call the existing sqlserver__standard_deviation and sqlserver__variance wrappers
or directly return T-SQL expressions using stdev(...) and var(cast(... as {{
elementary.edr_type_float() }})); ensure the new macros accept the same
parameter (column_name) and use the same casting helper
(elementary.edr_type_float) so Fabric resolves to the T-SQL behavior instead of
falling back to the default PostgreSQL-style stddev_pop/variance variants.

In `@macros/edr/materializations/test/failed_row_count.sql`:
- Around line 62-73: The temp table is dropped during macro rendering inside
sqlserver__get_failed_row_count_calc_query (the inline do run_query("drop table
if exists " ~ tmp_relation) runs before the returned SELECT is executed), so
move the cleanup to after the query result is fetched: have
sqlserver__get_failed_row_count_calc_query only create the temp table and return
the SELECT (without dropping), then perform run_query("drop table if exists " ~
tmp_relation) from the caller (get_failed_row_count) after calling
elementary.result_value(failed_row_count_query); alternatively change the macro
to return both the query string and a cleanup callable so the caller can execute
cleanup post-result.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 144d403f-e137-44d4-b5ba-429e01db83fb

📥 Commits

Reviewing files that changed from the base of the PR and between 319dee6 and ba6bece.

📒 Files selected for processing (5)
  • macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
  • macros/edr/materializations/test/failed_row_count.sql
  • macros/edr/materializations/test/test.sql
  • macros/edr/system/system_utils/buckets_cte.sql
  • models/edr/data_monitoring/anomaly_detection/metrics_anomaly_score.sql

devin-ai-integration bot and others added 2 commits March 4, 2026 17:05
- Renamed all sqlserver__ macros to fabric__ (primary for dbt-fabric production use)
- Added sqlserver__ wrapper macros that delegate to fabric__ (for dbt-sqlserver CI testing)
- Fixed temp table drop timing bug in failed_row_count.sql
- Added fabric__ variants for all T-SQL utility macros across 24 files

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…patch target

dbt-sqlserver declares dependencies=["fabric"] in its plugin config,
creating dispatch chain: sqlserver__ -> fabric__ -> default__.
This means fabric__ macros are automatically found by dbt-sqlserver
without explicit sqlserver__ wrappers (same pattern as redshift/postgres).

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
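
The dispatch chain described in the commit above follows dbt's standard `adapter.dispatch` resolution. A minimal sketch (not the exact elementary source):

```sql
{% macro edr_dateadd(date_part, number, date_expression) %}
    {{ return(adapter.dispatch("edr_dateadd", "elementary")(date_part, number, date_expression)) }}
{% endmacro %}

{# On dbt-sqlserver the search order is:
   sqlserver__edr_dateadd -> fabric__edr_dateadd (via dependencies=["fabric"])
   -> default__edr_dateadd, so a single fabric__ variant serves both adapters. #}
{% macro fabric__edr_dateadd(date_part, number, date_expression) %}
    dateadd({{ date_part }}, {{ number }}, {{ date_expression }})
{% endmacro %}
```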
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (5)
macros/utils/data_types/try_cast_column_to_timestamp.sql (1)

30-41: ⚠️ Potential issue | 🟠 Major

Filter WHERE clause to ensure only successfully castable rows are evaluated.

The try_cast() function returns NULL on cast failure. Without filtering for successful casts, TOP 1 may non-deterministically select an un-castable row, causing result_value() to return NULL and incorrectly report false even when valid timestamp rows exist.

Add and try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not null to the WHERE clause.

Proposed fix
 {% macro fabric__try_cast_column_to_timestamp(table_relation, timestamp_column) %}
     {%- set query %}
-        select top 1 try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) as timestamp_column
+        select top 1 1 as has_valid_timestamp
         from {{ table_relation }}
         where {{ timestamp_column }} is not null
+          and try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not null
     {%- endset %}

     {%- set result = elementary.result_value(query) %}
     {%- if result is not none %} {{ return(true) }} {%- endif %}
     {{ return(false) }}

 {% endmacro %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/data_types/try_cast_column_to_timestamp.sql` around lines 30 -
41, The WHERE clause in macro fabric__try_cast_column_to_timestamp currently
only checks that {{ timestamp_column }} is not null, so TOP 1 can pick a row
where try_cast(...) is null and cause elementary.result_value(query) to return
null; update the WHERE to also require try_cast({{ timestamp_column }} as {{
elementary.edr_type_timestamp() }}) is not null so only successfully castable
rows are considered, keeping the rest of the query (select top 1 ... as
timestamp_column from {{ table_relation }}) and the use of
elementary.result_value unchanged.
macros/edr/system/system_utils/buckets_cte.sql (1)

241-253: ⚠️ Potential issue | 🔴 Critical

Recursive CTE is still incompatible with Fabric runtime.

Line 241 introduces a recursive CTE, but Fabric Warehouse currently lists recursive queries as unsupported, so this macro can still fail on its target adapter despite fixing the alias and recursion hint. Please switch this macro to a non-recursive bucket generator (e.g., tally/numbers via VALUES cross joins) for fabric__.
Source: https://learn.microsoft.com/en-us/fabric/data-warehouse/tsql-surface-area

In the latest Microsoft Fabric Data Warehouse T-SQL surface area documentation, are recursive queries/recursive CTEs supported in Warehouse?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/system/system_utils/buckets_cte.sql` around lines 241 - 253, The
recursive CTE named "timestamps" (using {{ min_bucket_start_expr }}, {{
bucket_end_expr }}, {{ max_bucket_end_expr }}) is incompatible with the Fabric
runtime; replace the recursive generator with a non-recursive numbers/tally
approach for the fabric__ adapter: detect fabric__ and produce a bucket sequence
by cross joining a VALUES-derived numbers table (or repeated VALUES blocks)
multiplied to cover the max range, then compute edr_bucket_start and
edr_bucket_end from those generated offsets instead of the recursive CTE so the
macro avoids recursion and preserves the existing {{ ... }} expressions.
macros/utils/cross_db_utils/incremental_strategy.sql (1)

23-25: ⚠️ Potential issue | 🟠 Major

Line 24 hardcodes Fabric to merge, but the shared strategy contract should remain adapter-default (none) for non-Athena paths.

Please return none here unless Fabric has a documented exception with tests proving why it must diverge.

Suggested change
 {%- macro fabric__get_default_incremental_strategy() %}
-    {% do return("merge") %}
+    {% do return(none) %}
 {% endmacro %}

Based on learnings: “implement that incremental_strategy should return none (to use the adapter default) for all adapters except Athena, where get_append_only_incremental_strategy() is used.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/incremental_strategy.sql` around lines 23 - 25,
The macro fabric__get_default_incremental_strategy currently hardcodes "merge";
change it to return the adapter-default "none" for all adapters except Athena:
inside fabric__get_default_incremental_strategy() check the current adapter
(e.g., inspect target.adapter or target.type) and if it's Athena call/return
get_append_only_incremental_strategy() (or the Athena-specific strategy helper),
otherwise return "none" so non-Athena paths use the adapter default.
macros/utils/cross_db_utils/to_char.sql (1)

27-32: ⚠️ Potential issue | 🟠 Major

Line 29 injects format directly into CONVERT, which can break Fabric SQL generation.

fabric__edr_to_char currently renders CONVERT(varchar, <expr>, {{ format }}) with raw SQL at Line 29. If callers pass cross-db format patterns, this can fail at runtime or diverge from the macro contract used by other adapters. Please normalize this path (e.g., explicitly support one format contract and enforce it in the macro).

In SQL Server / Microsoft Fabric Warehouse T-SQL, what are valid values for the third argument of CONVERT(varchar, datetime_expr, style)? Does passing a pattern string like YYYY-MM-DD work, or must it be a style code?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/to_char.sql` around lines 27 - 32, The macro
fabric__edr_to_char currently injects the caller-supplied format directly into
CONVERT which can break Fabric SQL; update fabric__edr_to_char to enforce a
single format contract by accepting only integer T-SQL style codes (or mapping a
small set of common pattern strings to their style codes) and render the third
CONVERT argument as an integer literal; specifically, modify fabric__edr_to_char
to: validate if the provided format is an integer literal and output it
unquoted, else if it's a known pattern (e.g., 'YYYY-MM-DD', 'YYYY-MM-DD
HH24:MI:SS') map it to the corresponding style code and use that code, otherwise
raise a clear jinja/dbt error so callers cannot inject arbitrary pattern strings
into CONVERT.
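
The mapping approach suggested above could look like the following sketch. Style codes 23 and 120 are the documented T-SQL CONVERT styles for `yyyy-mm-dd` and `yyyy-mm-dd hh:mi:ss`; the pattern set shown here is an assumption, not the macro's current contract.

```sql
{% macro fabric__edr_to_char(column, format) %}
    {% set style_codes = {"YYYY-MM-DD": 23, "YYYY-MM-DD HH24:MI:SS": 120} %}
    {% if format is number %}
        {# Integer style codes pass through unquoted #}
        convert(varchar, {{ column }}, {{ format }})
    {% elif format in style_codes %}
        {# Known cross-db patterns map to their T-SQL style code #}
        convert(varchar, {{ column }}, {{ style_codes[format] }})
    {% else %}
        {% do exceptions.raise_compiler_error("fabric__edr_to_char: unsupported format '" ~ format ~ "'") %}
    {% endif %}
{% endmacro %}
```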
macros/utils/cross_db_utils/hour_of_week.sql (1)

81-85: ⚠️ Potential issue | 🟠 Major

Line 83 uses locale-dependent weekday names, making hour-of-week keys non-deterministic.

datename(weekday, ...) can return different strings across sessions/environments. For stable grouping keys, switch to a deterministic weekday mapping (independent of session language) before concatenating with the hour.

Suggested deterministic replacement
 {% macro fabric__edr_hour_of_week_expression(date_expr) %}
     concat(
-        cast(datename(weekday, {{ date_expr }}) as {{ elementary.edr_type_string() }}),
+        cast(
+            case (datediff(day, cast('19000101' as date), cast({{ date_expr }} as date)) % 7)
+                when 0 then 'Monday'
+                when 1 then 'Tuesday'
+                when 2 then 'Wednesday'
+                when 3 then 'Thursday'
+                when 4 then 'Friday'
+                when 5 then 'Saturday'
+                when 6 then 'Sunday'
+            end as {{ elementary.edr_type_string() }}
+        ),
         cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }})
     )
 {% endmacro %}
In SQL Server / Microsoft Fabric Warehouse, is DATENAME(weekday, date_expr) dependent on session language (SET LANGUAGE / @@LANGUAGE)? If yes, what deterministic alternatives are recommended for stable weekday keys?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/utils/cross_db_utils/hour_of_week.sql` around lines 81 - 85, The macro
fabric__edr_hour_of_week_expression uses DATENAME(weekday, {{ date_expr }})
which returns locale-dependent names; replace that with a deterministic numeric
weekday computed from a fixed epoch (e.g. DATEDIFF(day, 0, {{ date_expr }}) % 7)
so the key is stable across sessions, e.g. cast((DATEDIFF(day, 0, {{ date_expr
}}) % 7) as {{ elementary.edr_type_string() }}) concatenated with the existing
cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }}).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@macros/edr/system/system_utils/buckets_cte.sql`:
- Around line 241-253: The recursive CTE named "timestamps" (using {{
min_bucket_start_expr }}, {{ bucket_end_expr }}, {{ max_bucket_end_expr }}) is
incompatible with the Fabric runtime; replace the recursive generator with a
non-recursive numbers/tally approach for the fabric__ adapter: detect fabric__
and produce a bucket sequence by cross joining a VALUES-derived numbers table
(or repeated VALUES blocks) multiplied to cover the max range, then compute
edr_bucket_start and edr_bucket_end from those generated offsets instead of the
recursive CTE so the macro avoids recursion and preserves the existing {{ ... }}
expressions.

In `@macros/utils/cross_db_utils/hour_of_week.sql`:
- Around line 81-85: The macro fabric__edr_hour_of_week_expression uses
DATENAME(weekday, {{ date_expr }}) which returns locale-dependent names; replace
that with a deterministic numeric weekday computed from a fixed epoch (e.g.
DATEDIFF(day, 0, {{ date_expr }}) % 7) so the key is stable across sessions,
e.g. cast((DATEDIFF(day, 0, {{ date_expr }}) % 7) as {{
elementary.edr_type_string() }}) concatenated with the existing
cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }}).

In `@macros/utils/cross_db_utils/incremental_strategy.sql`:
- Around line 23-25: The macro fabric__get_default_incremental_strategy
currently hardcodes "merge"; change it to return the adapter-default "none" for
all adapters except Athena: inside fabric__get_default_incremental_strategy()
check the current adapter (e.g., inspect target.adapter or target.type) and if
it's Athena call/return get_append_only_incremental_strategy() (or the
Athena-specific strategy helper), otherwise return "none" so non-Athena paths
use the adapter default.

In `@macros/utils/cross_db_utils/to_char.sql`:
- Around line 27-32: The macro fabric__edr_to_char currently injects the
caller-supplied format directly into CONVERT which can break Fabric SQL; update
fabric__edr_to_char to enforce a single format contract by accepting only
integer T-SQL style codes (or mapping a small set of common pattern strings to
their style codes) and render the third CONVERT argument as an integer literal;
specifically, modify fabric__edr_to_char to: validate if the provided format is
an integer literal and output it unquoted, else if it's a known pattern (e.g.,
'YYYY-MM-DD', 'YYYY-MM-DD HH24:MI:SS') map it to the corresponding style code
and use that code, otherwise raise a clear jinja/dbt error so callers cannot
inject arbitrary pattern strings into CONVERT.

In `@macros/utils/data_types/try_cast_column_to_timestamp.sql`:
- Around line 30-41: The WHERE clause in macro
fabric__try_cast_column_to_timestamp currently only checks that {{
timestamp_column }} is not null, so TOP 1 can pick a row where try_cast(...) is
null and cause elementary.result_value(query) to return null; update the WHERE
to also require try_cast({{ timestamp_column }} as {{
elementary.edr_type_timestamp() }}) is not null so only successfully castable
rows are considered, keeping the rest of the query (select top 1 ... as
timestamp_column from {{ table_relation }}) and the use of
elementary.result_value unchanged.


Run ID: 3060fa55-2f7a-4ceb-823a-810bfc841aa9

📥 Commits

Reviewing files that changed from the base of the PR and between ba6bece and d489172.

📒 Files selected for processing (24)
  • macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
  • macros/edr/materializations/test/failed_row_count.sql
  • macros/edr/materializations/test/test.sql
  • macros/edr/system/system_utils/buckets_cte.sql
  • macros/edr/system/system_utils/empty_table.sql
  • macros/edr/system/system_utils/get_config_var.sql
  • macros/utils/cross_db_utils/current_timestamp.sql
  • macros/utils/cross_db_utils/date_trunc.sql
  • macros/utils/cross_db_utils/dateadd.sql
  • macros/utils/cross_db_utils/datediff.sql
  • macros/utils/cross_db_utils/day_of_week.sql
  • macros/utils/cross_db_utils/generate_elementary_profile_args.sql
  • macros/utils/cross_db_utils/hour_of_day.sql
  • macros/utils/cross_db_utils/hour_of_week.sql
  • macros/utils/cross_db_utils/incremental_strategy.sql
  • macros/utils/cross_db_utils/target_database.sql
  • macros/utils/cross_db_utils/time_trunc.sql
  • macros/utils/cross_db_utils/timeadd.sql
  • macros/utils/cross_db_utils/to_char.sql
  • macros/utils/data_types/data_type.sql
  • macros/utils/data_types/data_type_list.sql
  • macros/utils/data_types/get_normalized_data_type.sql
  • macros/utils/data_types/try_cast_column_to_timestamp.sql
  • macros/utils/table_operations/insert_rows.sql
🚧 Files skipped from review as they are similar to previous changes (7)
  • macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
  • macros/edr/system/system_utils/get_config_var.sql
  • macros/edr/system/system_utils/empty_table.sql
  • macros/utils/cross_db_utils/dateadd.sql
  • macros/utils/data_types/get_normalized_data_type.sql
  • macros/utils/cross_db_utils/generate_elementary_profile_args.sql
  • macros/utils/cross_db_utils/date_trunc.sql

devin-ai-integration bot and others added 6 commits March 4, 2026 17:24
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
- Add fabric__edr_type_long_string returning varchar(max) to avoid truncation
- Add fabric__event_freshness_metric_query with explicit GROUP BY columns
- Fix LIMIT→TOP in dimension_anomalies test query for T-SQL
- Fix boolean literals in column_boolean_monitors.sql
- Fix INSERT INTO with CTE pattern in insert_as_select.sql
- Fix CTE-in-subquery and CTE-in-IN-clause in get_anomaly_query.sql
- All 18 anomalyless integration tests pass locally against SQL Server

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…erflow in CI

SQL Server limits identifiers to 128 chars. When dbt-sqlserver creates
relations, the schema name (~60 chars in CI) can be prefixed onto the
table identifier. Reducing from 80 to 60 ensures the combined name
stays within the 128-char limit even with long CI schema names.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…MIT→TOP for Fabric/T-SQL

- Add fabric__create_temp_table using view-based approach to avoid CTE-in-subquery errors
- Force Fabric/sqlserver targets to use temp table path in test materialization
- Add fabric__edr_make_temp_relation to handle dict model nodes
- Add 'string' to fabric data_type_list for proper normalization
- Add STRING→VARCHAR mapping in fabric__get_normalized_data_type
- Convert hardcoded LIMIT 1 to TOP 1 in integration test SQL templates
- Apply pre-commit formatting (black, sqlfmt)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…n INSERT VALUES

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
devin-ai-integration bot and others added 6 commits March 4, 2026 22:53
…edr_boolean_literal

- get_columns_snapshot_query: replace inline target.type checks with edr_boolean_literal()
- get_columns_changes_query: extract to dispatched macro (fabric__/default__get_column_changes_from_baseline_cur), use edr_boolean_literal, restore dbt_utils.group_by(9) to fix test_schema_changes failures
- column_monitoring_query: extract GROUP BY to dispatched column_monitoring_group_by macro
- insert_as_select: convert to dispatched macro (fabric__/default__insert_as_select)
- get_anomaly_query: convert both macros to dispatched (fabric__/default__ variants)
- create_temp_table: simplify DROP VIEW IF EXISTS syntax
- Rename CI target from 'fabric' to 'sqlserver' (honest about what we test)
- Add sqlserver__generate_elementary_profile_args (separate from fabric__)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
… compatibility

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…nges (fixes Redshift)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ting

- Cast datetime2 to explicit precision (6) to avoid unsupported datetime2(7)
- Cast sysutcdatetime()/getdate() to datetime2(6) in current_timestamp macros
- Rewrite fabric__insert_as_select to use temp view pattern (Fabric rejects INSERT...EXEC)
- Replace CONCAT() with + operator in full_names macros to avoid nvarchar return type
- Cast PARSENAME results to varchar in fabric__full_name_split
- Cast datename() result to varchar in fabric__edr_day_of_week_expression
- Add fabric__generate_surrogate_key using + operator instead of dbt.concat()
- Add fabric__schema_change_description_column with varchar cast
- Add fabric__edr_list_schemas and fabric__edr_schema_exists using sys.schemas

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add fabric target to profiles.yml.j2 with ServicePrincipal auth
- Add fabric to cloud targets matrix in test-all-warehouses.yml
- Install ODBC driver for fabric target in test-warehouse.yml
- Include fabric in post-test schema cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
devin-ai-integration bot and others added 6 commits March 7, 2026 14:03
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ric__ counterparts

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…s instead of SQL text

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…turning non-string values

Fixes 'unsupported operand type(s) for -: str and int' in fabric__edr_make_temp_relation
when called via sqlserver__edr_make_temp_relation. The {{ }} pattern stringifies return
values (e.g. integers, Relations), breaking downstream arithmetic and method calls.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
dbt-sqlserver declares dependencies=['fabric'], so the dispatch chain
sqlserver__ -> fabric__ -> default__ handles everything automatically.

Only sqlserver__ macros with genuinely different implementations are kept:
- sqlserver__target_database (needed by CLI)
- sqlserver__create_temp_table (T-SQL CTE workaround)
- sqlserver__generate_elementary_profile_args (different connection params)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>