You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Hello, we are trying to use dbt cloud with Redshfit Spectrum.
When we first build our incremental models with dbt run everything works fine, however when we run it again without the --full-refresh flag, we get the error Predicates not yet supported in nested s3 scan. This same error happens if we run the compiled incremental query on redshift too, suggesting i's not supported by Redshift Spectrum (see Redshift docs - Nested-data-restrictions).
Is there another way to make this work?
Expected Behavior
We expected to update the table with the latest modifications by date as summarised in the dbt docs for incremental models
Steps To Reproduce
In dbt cloud, using as an example the following model:
{{ config(
materialized='incremental'
)
}}
with A as (
SELECTu.entity.id as id,
CAST(u.last_updatedastimestamp) as last_updated,
u.entity.colA,
u.entity.colB,
u.entity.colC,
FROM {{ source('test', 'test_table') }} u
)
SELECT*FROM A
{% if is_incremental() %}
where last_updated > (selectmax(last_updated) from {{ this }})
{% endif %}
Relevant log output
Predicates not yet supported in nested s3 scan.
Environment
dbt cloud
Additional Context
No response
The text was updated successfully, but these errors were encountered:
github-actionsbot
changed the title
[Bug] Incremental models updates failing due to nested data limitation in Redshift Spectrum
[CT-1704] [Bug] Incremental models updates failing due to nested data limitation in Redshift Spectrum
Dec 21, 2022
I agree with your assessment that this error message is specific to Redshift Spectrum and not something that dbt is doing.
Some ideas, some or all of which may be impractical in your specific use case:
Collaborate with the producer of the object in S3 to add last_updated as a top-level attribute (rather than nested)
Add a transformation step that writes the entire object to a regular table with unnested attributes first, then do a second step that performs the incremental logic
I haven't used Redshift Spectrum personally, so there might be some other approaches that I can't think of!
I'd suggest asking your question in dbt Discourse -- it's best place to get help & advice from the community of dbt practitioners:
If you go with idea 2), it sounds like you wouldn't be the first to resort to a similar strategy [1]:
Because of the query execution inefficiency for nested fields, ETL pipelines were set up at LinkedIn to copy the nested columns as a set of top-level columns corresponding to subfields
I'm going to close this because I don't think we can do anything about it within dbt-redshift. But if anyone discovers we can do something about it, we'll be glad to re-open it.
Sorry for necroposting, but since I had this exact same issue (incremental model on Redshift Spectrum), I wanted to share the work-around that I came up with, as I know I won't remember the next time I need it and hopefully will find my own comment here to remind me, and maybe it'll help others who wind up here as well, as it was the first result in Google for Predicates not yet supported in nested s3 scan.
My solution, tl;dr:
{% set last_timestamp_query %}
select max(event_timestamp) from {{ this }}
{% endset %}
{% set last_timestamp = dbt_utils.get_single_value(last_timestamp_query, default=0) if is_incremental() else 0 %}
select *
from {{ source('some_schema', 'some_table') }}
where 1 = 1
{% if is_incremental() %}
and event_timestamp > {{ last_timestamp }}
{% endif %}
The implementation will need to be customized to your specific object names, but this is the "skeleton" of the approach that solves the problem for me.
Is this a new bug in dbt-redshift?
Current Behavior
Hello, we are trying to use dbt cloud with Redshfit Spectrum.
When we first build our incremental models with
dbt run
everything works fine, however when we run it again without the--full-refresh
flag, we get the errorPredicates not yet supported in nested s3 scan.
This same error happens if we run the compiled incremental query on redshift too, suggesting i's not supported by Redshift Spectrum (see Redshift docs - Nested-data-restrictions).Is there another way to make this work?
Expected Behavior
We expected to update the table with the latest modifications by date as summarised in the dbt docs for incremental models
Steps To Reproduce
In dbt cloud, using as an example the following model:
Relevant log output
Environment
Additional Context
No response
The text was updated successfully, but these errors were encountered: