[Regression] DBT 1.7.3 violates Redshift transactions. #766
Comments
hi @maksym-dibrov-solidgate thank you for writing this up. I agree that there's something seriously wonky about how dbt-redshift handles transactions in recent versions. The predominant reason we cannot support user-defined transactions is a limitation of the Redshift driver; see aws/amazon-redshift-python-driver#162. Ultimately, I feel that until that limitation is addressed, we cannot confidently address transactions in dbt without significant additional investment. In the meantime, I'd be very interested to know more about how your project is configured such that it fails after one second. More importantly, can you help me create a minimal reproducible example of the behavior? That is, starting with vanilla jaffle_shop, what's the least I'd have to modify to reproduce the behavior you are seeing? Last, I'm curious whether the behavior is different if you specify …
@dataders thank you for replying to this issue! I will try to reproduce it on vanilla jaffle_shop and come back to this thread when I succeed.
We see in the slack thread linked here that this is fixed in 1.7.7, closing. :D
@martynydbt the thread says it is fixed in 1.7.5. I updated to 1.7.5, but it still throws the same error, so these two issues should be unrelated.
Tried with dbt-redshift==1.7.7; still the same error.
@maksym-dibrov-solidgate (on slack):
I figured out what the issue was and fixed it. Here is what happened: we had two separate processes that both threw the same error.

In the second case, it looks like dbt renames and drops tables in two separate transactions when called with …

To me this looks like an issue with transactionality, and considering that the transactionality of operations is determined by dbt, it seems it could be fixed without intervention from the AWS Redshift devs.
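For context, a minimal sketch of the kind of table swap dbt's table materialization performs on a full refresh. The relation names and the exact statement boundaries are illustrative assumptions, not dbt's actual generated DDL:

```sql
-- Illustrative sketch of a full-refresh table swap on Redshift.
-- Names (analytics.my_model) are hypothetical; dbt's macros generate the real DDL.
create table analytics.my_model__dbt_tmp as (
    select 1 as id  -- placeholder for the model's compiled SQL
);
alter table analytics.my_model rename to my_model__dbt_backup;
alter table analytics.my_model__dbt_tmp rename to my_model;
-- If the renames commit here and the drop runs in a second transaction,
-- a concurrent reader that already resolved the old relation can fail
-- with a "could not open relation with OID" style error.
drop table analytics.my_model__dbt_backup cascade;
```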
Thanks for clarifying @maksym-dibrov-solidgate! Discussed this with the team; this behavior is due to dbt not supporting parallel runs.
Unfortunately, this just isn't functionality that we can support, and we have no plans to do so for now.
Thanks @colin-rogers-dbt! One additional detail: we strongly encourage folks to use late-binding views on Redshift, which should avoid this issue of concurrent cascade-dropping.
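In dbt-redshift, late-binding views are enabled with the `bind` config; a minimal sketch (the model and `ref` names are hypothetical):

```sql
-- models/my_view.sql: a late-binding view on Redshift (hypothetical model)
{{ config(materialized='view', bind=False) }}

select * from {{ ref('some_upstream_model') }}
```

The same can be set project-wide in `dbt_project.yml` with `+bind: false` under `models:`, so views don't take hard dependencies on the tables being swapped.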
This is still a problem in 1.8.4. I'm not sure if this was closed because a spike was opened or by misinformation, but it is still a problem: the same query and the same error as listed at the start of this issue. Here is one of my errors from today:

```sql
/* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "dev", "target_name": "env", "connection_name": "list_db_schema"} */
select
    table_catalog as database,
    table_name as name,
    table_schema as schema,
    'table' as type
from information_schema.tables
where table_schema ilike ''
  and table_type = 'BASE TABLE'
union all
select
    table_catalog as database,
    table_name as name,
    table_schema as schema,
    case
        when view_definition ilike '%create materialized view%'
            then 'materialized_view'
        else 'view'
    end as type
from information_schema.views
where table_schema ilike ''
```

It is ALWAYS the same type of query: `SELECT ... FROM information_schema.tables ...`. We are seeing this a lot. We run in Airflow with a bash operator starting `dbt run ... --select model.*`, across ~100 Airflow DAGs, multiplied by, let's say, 5 tasks per DAG. Each task runs 1-3 models, and some DAGs run hourly/daily/weekly, so we have lots of concurrent `dbt run` executions at any one point in time.

I actually think dbt-redshift knew about this error in 1.6 but for some reason did not fix it: https://github.com/dbt-labs/dbt-redshift/pull/569/files#diff-4ab8ac7de10abb4128d1534093662382aaeb4e8a99ed2f3f880a4d9720717d00. That test is a reproduction of the issue.

Maybe there is a way I, as a dbt user, can just add a bunch of retry logic to the information_schema queries until redshift-connector is fixed? I think this is related to locking/transaction concurrency. It never occurs in our pre-prod environment, which is scheduled ad hoc with less concurrency. In Airflow, if I want to reduce this error on dbt-redshift==1.8.4, I need to reduce the concurrent tasks and reduce the threads in the dbt profile. Once I make those changes the error occurs less frequently intra-DAG (but I still have multiple DAGs running at one time).

I can see why late-binding views are an option, since that would reduce the information_schema queries performed by dbt and thus avoid this concurrency/locking issue. But I am using this in production today and receiving 100 alerts a day due to this behavior. Is another solution to pin dbt-redshift to 1.4.0, the version prior to adding redshift-connector? This conversation seems to suggest that fixing dbt-redshift first requires waiting for redshift-connector to fix some things.
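For what it's worth, the dbt-redshift profile does accept a `retries` setting. A sketch with placeholder values; note my assumption is that `retries` covers failed connection attempts, and it may not retry mid-query failures like the one above:

```yaml
# profiles.yml sketch -- host/user/dbname are placeholders.
dev:
  target: env
  outputs:
    env:
      type: redshift
      host: example-cluster.abc123.us-east-1.redshift.amazonaws.com
      user: dbt_user
      password: "{{ env_var('REDSHIFT_PASSWORD') }}"
      dbname: analytics
      schema: public
      port: 5439
      threads: 1   # fewer threads made the error less frequent for us
      retries: 3   # retries failed connection attempts (assumption: not mid-query errors)
```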
Update: I was just notified that there is a forthcoming Redshift patch, P184, to be applied in Q3 2024 to fix the `could not open OID` error.
Curious, for those who are still seeing this issue: have you opened an issue with AWS? This is the best way for us (dbt Labs) to advocate to the Redshift team that we need fixes for the Redshift driver.
@amychen1776 I haven't opened an issue with AWS. If you open one, I will be happy to give it a thumbs-up.
@maksym-dibrov-solidgate I recommend doing it through AWS support and opening up your own ticket (so a thumbs-up wouldn't be possible). In working with the Redshift team, that is how they have recommended flagging issues like this.
Is this a regression in a recent version of dbt-core?
Current Behavior
After updating DBT from 1.1.0 to 1.7.3, we started receiving errors either when running `dbt run` or when reading tables at the moment DBT is fully refreshing them. This behaviour never appeared on DBT==1.1.0, so the issue seems to be with the newer version of DBT rather than the DB itself. The error is:

Expected/Previous Behavior
The issue never appeared on DBT==1.1.0, and we were able to read tables that were being fully refreshed without problems.
Steps To Reproduce
Pipfile
dbt-packages
```yaml
models:
  our_dbt_project:
    +enabled: true
    +full_refresh: false
    +meta:
      alerts_config:
        alert_fields: ["table", "description", "result_message", "test_parameters", "test_query", "test_results_sample"]
```
```shell
pipenv run dbt run --debug --profiles-dir our_profile --project-dir our_project --models +tag:our_tag
```
Relevant log output
Environment
Which database adapter are you using with dbt?
redshift
Additional Context
Started a thread on this issue in Slack's dbt-community: https://getdbt.slack.com/archives/CJARVS0RY/p1712935236552749