Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] --empty doesn't work with relation that are already named and might conflict #124

Closed
2 tasks done
github-christophe-oudar opened this issue Mar 9, 2024 · 13 comments
Assignees
Labels
type:bug Something isn't working as documented

Comments

@github-christophe-oudar
Copy link
Contributor

github-christophe-oudar commented Mar 9, 2024

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

This a follow up from my comment on the original PR.

If any is using code like FROM {{ source('schema', 'table') }} my_table_alias while using the --empty alias.
The resulting SQL is the following: FROM (select * from `db`.`schema`.`table` where false limit 0) _dbt_limit_subq my_table_alias

The problem is linked to the alias that's already set up here:

return f"(select * from {rendered} where false limit 0) _dbt_limit_subq"
else:

which prevents from aliasing by ourselves. Also if there are multiple references, I guess the tables will all be aliased to _dbt_limit_subq resulting in an error.

Expected Behavior

The simplest expected would be that it results in FROM (select * from `db`.`schema`.`table` where false limit 0) my_table_alias (practically that no table name is suffixed automatically). However if the user didn't name its table, it will fail... which is another problem. Yet we could look into alternatives:

  • finding a way to detect the table is suffixed
    • it looks like complex based on how dbt works
  • enabling setting the alias through the kwargs from the source/ref macros
    • it might be seems reasonable but requires modifications on the end users

Steps To Reproduce

  • Create a simple model my_model.sql with an SQL like:
    SELECT * FROM {{ source('schema', 'table') }} my_table_alias
  • run dbt command dbt run -s my_model --empty

Relevant log output

No response

Environment

- OS: MacOS 13.2.1
- Python: 3.11
- dbt-adapter: 1.8.0b1

Additional Context

No response

@github-christophe-oudar github-christophe-oudar added type:bug Something isn't working as documented triage:product In Product's queue labels Mar 9, 2024
@martynydbt martynydbt added High Severity and removed triage:product In Product's queue labels Mar 15, 2024
@dwreeves
Copy link
Contributor

dwreeves commented Mar 21, 2024

set no alias (but then I think some DBs requires an alias and then it would fail for them with default setup)

Yes, some do (e.g. Postgres) but some don't (e.g. Snowflake).

This could be an attribute of the dialect's adapter, e.g. requires_aliases_for_subqueries: bool.

For those that do not required, aliasing shouldn't be used at all. Then the problem is solved arbitrarily, no other fixes required.

For those that do require aliases, that is where it gets trickier. One of the issues I see with something like, select * from {{ ref("foo", alias="foo") }} or select * from {{ ref("foo", alias=True) }} (pick your poison for an API) is that this isn't any less complex in terms of the pre-compiled code than just doing select * from {{ ref("foo", alias=True) }} as foo.

So, based on that, and based on the fact that most (all?) OLAP systems (i.e. what typical users are using dbt with) don't require aliasing subqueries, I actually think there is a strong argument for not aliasing these subqueries at all: OLAP people don't need aliases; Postgres people do but the code isn't made simpler by forcing it into the Jinja instead of the SQL. (In that case, you also wouldn't even need a requires_aliases_for_subqueries: bool for the adapter.)

@khaledh
Copy link

khaledh commented Apr 18, 2024

Just want to add +1 as we're having the same issue (we're using dbt-bigquery).

@dcarvalhofernandes
Copy link

+1 on this, using dbt-snowflake

@colin-rogers-dbt
Copy link
Contributor

Working on a PR for this, approach is going to be to make the aliasing configurable for each adapter so we can keep it for postgres or mysql but drop for platforms that don't need it (i.e. bigquery / snowflake). This should at least fix this case for those platforms.

Worth noting the workaround, in the meantime, is to move the ref into a CTE.
So instead of:
SELECT * FROM {{ source('schema', 'table') }} my_table_alias
this should work...

WITH my_table_alias as (SELECT * FROM {{ source('schema', 'table') }})

SELECT * FROM my_table_alias

@dwreeves
Copy link
Contributor

dwreeves commented Apr 18, 2024

Users may still be using aliases for tables in Postgres, e.g.

select a.*
from {{ ref("base_table") }} a
left join {{ ref("foreign_entity") }} b
on a.foreign_id = b.id

The above is valid Postgres SQL, after rendering. So it seems like adding aliases can still end up breaking workflows for some Postgres users, unless I'm missing something.

@colin-rogers-dbt
Copy link
Contributor

You're not missing something and we still need to come up with a better long term solution here (other than asking people to write their SQL differently if they want to use --empty). Essentially I'm going to get a fast-ish fix to reduce the scope of this case to just platforms that require aliasing subqueries.
Currently there isn't a way for us to interpret the context of the ref to know if we should or should not add an alias

@dwreeves
Copy link
Contributor

dwreeves commented Apr 18, 2024

I guess my point is, for SQL for dialects requiring aliased subqueries, the path to making all queries working arbitrarily with --empty is:

  • If an alias is added by dbt: Replace aliased refs {{ ref("table") }} as x with (select * from {{ ref("table") }}) as x
  • If an alias is not added by dbt: Replace un-aliased refs {{ ref("table") }} with {{ ref("table") }} as x

In both cases, SQL rewrites are required (one set of users has to make changes), and between the two I think people would rather write as x than wrap in a subquery.

Furthermore, unless someone can chime in with expertise on the Postgres query planner, I'd not rule out the possibility that replacing from table with from (select * from table) could cause performance troubles for non-empty queries. In my limited testing with Postgres on some queries we use in prod, it doesn't seem like replacing tbl with (select * from tbl) impacted the explains, but I don't know enough about Postgres, and don't have enough real world test cases, to rule out that possibility. (Could anyone who knows more than me chime in and assure it never impacts what the query planner does? 😅)

@dwreeves
Copy link
Contributor

dwreeves commented Apr 18, 2024

Ah, it seems like it does not ever impact the query planner to replace from tbl with from (select * from tbl)! https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-SUBQUERIES OK, that is one of my concerns alleviated. 😄

One more note on Postgres, specifically, for what it's worth, since apparently my information is a little out of date: Postgres ≤13 requires aliases, but ≥14 does not.

@colin-rogers-dbt
Copy link
Contributor

@dwreeves It may be true that most users have those preferences but changing this behavior in dbt-postgres would amount to a breaking change for the set of users for whom this is working as opposed to preserving the present behavior where it's always been broken for this case.

Ultimately we need to find a better long term approach for supporting --empty and making these kind of dynamic sql manipulation.

@colin-rogers-dbt
Copy link
Contributor

closing, tracking the long term fix here: #199

@benesch
Copy link

benesch commented Jun 7, 2024

One more note on Postgres, specifically, for what it's worth, since apparently my information is a little out of date: Postgres ≤13 requires aliases, but ≥14 does not.

For posterity: aliases became optional in v16, not v14! See: postgres/postgres@bcedd8f

bobbyiliev added a commit to MaterializeInc/materialize that referenced this issue Jun 7, 2024
<!--
Describe the contents of the PR briefly but completely.

If you write detailed commit messages, it is acceptable to copy/paste
them
here, or write "see commit messages for details." If there is only one
commit
in the PR, GitHub will have already added its commit message above.
-->

### Motivation

As per this bug
[here](dbt-labs/dbt-adapters#124), the
`--empty` flag doesn't work with relation that are already named and
fails.

A quick repro:
* Create a simple model with: `SELECT * FROM {{
ref('my_upstream_model')}} some_alias`
* The generated SQL when using the `--empty` flag, looks like this:
```sql
as (
  SELECT * FROM (
    select * from "my_upstream_model" where false limit 0) _dbt_limit_subq_my_upstream_model alias
);
```

<!--
Which of the following best describes the motivation behind this PR?

  * This PR fixes a recognized bug.

    [Ensure issue is linked somewhere.]

  * This PR adds a known-desirable feature.

    [Ensure issue is linked somewhere.]

  * This PR fixes a previously unreported bug.

    [Describe the bug in detail, as if you were filing a bug report.]

  * This PR adds a feature that has not yet been specified.

[Write a brief specification for the feature, including justification
for its inclusion in Materialize, as if you were writing the original
     feature specification.]

   * This PR refactors existing code.

[Describe what was wrong with the existing code, if it is not obvious.]
-->

### Tips for reviewer

<!--
Leave some tips for your reviewer, like:

    * The diff is much smaller if viewed with whitespace hidden.
    * [Some function/module/file] deserves extra attention.
* [Some function/module/file] is pure code movement and only needs a
skim.

Delete this section if no tips.
-->

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [ ] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [ ] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [ ] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [ ] This PR includes the following [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note):
- <!-- Add release notes here or explicitly state that there are no
user-facing behavior changes. -->
@dwreeves
Copy link
Contributor

dwreeves commented Jun 7, 2024

One more note on Postgres, specifically, for what it's worth, since apparently my information is a little out of date: Postgres ≤13 requires aliases, but ≥14 does not.

For posterity: aliases became optional in v16, not v14! See: postgres/postgres@bcedd8f

Thanks! Seems I have poor reading comprehension, or confused myself clicking through the docs versions 😞

@mvzakharov
Copy link

Hello everyone. I use the adapter dbt-vertica
I want to add there is the same problem.
It is not possible to use the key --empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working as documented
Projects
None yet
Development

No branches or pull requests

10 participants