feat(dbt): emit table schema metadata #19548
Closed
rexledesma
wants to merge
1
commit into
rl/add-dagster-metadata-dbt-test-project-pt2
from
rl/add-dbt-table-schema-metadata
This was referenced Feb 1, 2024
Current dependencies on/for this PR:
This stack of pull requests is managed by Graphite.
10a539f
to
d8010ab
Compare
d98d04d
to
f3ae9fa
Compare
sryza
reviewed
Feb 2, 2024
sryza
reviewed
Feb 2, 2024
...odules/libraries/dagster-dbt/dagster_dbt/include/macros/dagster__log_columns_in_relation.sql
d8010ab
to
9ec13e3
Compare
This was referenced Feb 6, 2024
f3ae9fa
to
00765c5
Compare
rexledesma
added a commit
that referenced
this pull request
Feb 14, 2024
## Summary & Motivation

Formalize #19548 by pulling the new macro logic into a separate dbt package. Separated this out into a separate PR since I wanted to discuss where this package should live in the monorepo.

### Assumptions

[Although dbt Hub packages are recommended](https://docs.getdbt.com/docs/build/packages#hub-packages-recommended), I believe we should instead prioritize the [`git` installation experience](https://docs.getdbt.com/docs/build/packages#git-packages).

- This will give us a better experience while developing this package. Users will be able to install via revision if we need to push out hotfixes.
- Users still have the option to install via named tags/revisions (e.g. `1.6.0`)

Note that we will still have the option to formally move to dbt Hub in the future, so this isn't a binding choice.

Here are some examples of what `dependencies.yml` would look like for a user:

**Named Revision**

```yaml
packages:
  - git: "https://github.com/dagster-io/dagster.git"
    subdirectory: "/python_modules/libraries/dagster-dbt/dbt_packages/dagster"
    revision: 1.6.0
```

**Hash Revision**

```yaml
packages:
  - git: "https://github.com/dagster-io/dagster.git"
    subdirectory: "/python_modules/libraries/dagster-dbt/dbt_packages/dagster"
    revision: fd13563
```

### Open Questions

We could put this package directory at the top level of our monorepo, just to make `subdirectory` a bit more ergonomic. I don't think it's a big deal since this is a one-time copy-paste anyway.

```yaml
packages:
  - git: "https://github.com/dagster-io/dagster.git"
    subdirectory: "dbt_packages/dagster"
    revision: fd13563
```

## How I Tested These Changes

follow up PRs
jmsanders
pushed a commit
that referenced
this pull request
Feb 14, 2024
Same commit message as above (cherry picked from commit 1a445a2)
Summary & Motivation

Rather than relying on the user to define their table schema upfront in their dbt model properties, we can just retrieve it at run time.

On behalf of the user, we create a macro `dagster__log_columns_in_relation` and invoke it on the dbt project's models/seeds/snapshots. This macro emits a log with table schema information by leveraging the dbt adapter function `get_columns_in_relation` to retrieve column name/column data type information.

To enable this feature, the user needs to configure their `@dbt_assets` using their `DagsterDbtTranslator`. Here's a small example:

[example code not preserved in this copy]

In the future:

- `+meta` config on individual dbt models/seeds/snapshots to prevent column information from being emitted.
- `dbt-dagster` package for users to install.

How I Tested These Changes

pytest, local

Demo

schema-demo.mov
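To make the log-based approach from the summary more concrete, here is a minimal Python sketch of how column information could be recovered from dbt log lines. It is not the code from this PR: the JSON payload shape (`relation_name`, `columns`, `data_type`) and the helper names `parse_column_log` and `collect_table_schemas` are assumptions made purely for illustration, since the actual log format emitted by `dagster__log_columns_in_relation` is not shown here.

```python
import json
from typing import Dict, Iterable, Optional, Tuple

# ASSUMPTION: the macro logs one JSON object per relation, shaped like
#   {"relation_name": "...", "columns": {"<col>": {"data_type": "<type>"}}}
# This shape is hypothetical; the PR does not document the real format.


def parse_column_log(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (relation_name, {column: data_type}) for a schema log line, else None."""
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        # Ordinary dbt log output is not JSON in this sketch, so skip it.
        return None
    if not isinstance(payload, dict):
        return None

    relation = payload.get("relation_name")
    columns = payload.get("columns")
    if not isinstance(relation, str) or not isinstance(columns, dict):
        return None

    schema = {
        name: str(info.get("data_type", "unknown"))
        for name, info in columns.items()
        if isinstance(info, dict)
    }
    return relation, schema


def collect_table_schemas(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Gather the schema of every relation that appears in the log lines."""
    schemas: Dict[str, Dict[str, str]] = {}
    for line in lines:
        parsed = parse_column_log(line)
        if parsed is not None:
            relation, schema = parsed
            schemas[relation] = schema
    return schemas


if __name__ == "__main__":
    sample_logs = [
        "Running with dbt=1.7.0",
        json.dumps(
            {
                "relation_name": "db.main.customers",
                "columns": {
                    "id": {"data_type": "integer"},
                    "name": {"data_type": "text"},
                },
            }
        ),
    ]
    print(collect_table_schemas(sample_logs))
```

The appeal of this design is that the schema is observed when dbt runs, so it reflects the table that was actually built rather than whatever the user declared in their model properties.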