feat(dbt): emit table schema metadata #19548

@rexledesma rexledesma commented Feb 1, 2024

Summary & Motivation

Rather than relying on the user to define their table schema upfront in their dbt model properties, we can just retrieve it at run time.

On behalf of the user, we create a macro, dagster__log_columns_in_relation, and invoke it on the dbt project's models, seeds, and snapshots. The macro calls the dbt adapter function adapter.get_columns_in_relation to retrieve each column's name and data type, then emits a log containing that table schema information.
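To make the log-based flow concrete, here is a minimal sketch of how a consumer might turn such a log message into column metadata. The JSON payload shape (`relation_name`, `columns`, `name`, `data_type`) and the `parse_column_log` helper are assumptions made for illustration; the actual format emitted by dagster__log_columns_in_relation may differ.

```python
import json
from typing import Dict, Optional


def parse_column_log(message: str) -> Optional[Dict[str, str]]:
    """Return {column name: data type} from a hypothetical JSON log message.

    Returns None when the message is not a column-schema log.
    """
    try:
        payload = json.loads(message)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or "columns" not in payload:
        return None
    return {col["name"]: col["data_type"] for col in payload["columns"]}


# Hypothetical log message; the real macro's output format may differ.
example_message = json.dumps(
    {
        "relation_name": "analytics.customers",
        "columns": [
            {"name": "customer_id", "data_type": "integer"},
            {"name": "first_name", "data_type": "text"},
        ],
    }
)
column_types = parse_column_log(example_message)
```

Messages that are not column-schema logs return None, so ordinary dbt log output can pass through the same parser without special handling.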

To enable this feature, the user needs to configure their @dbt_assets using their DagsterDbtTranslator. Here's a small example:

from dagster import AssetExecutionContext
from dagster_dbt import (
    DagsterDbtTranslator,
    DagsterDbtTranslatorSettings,
    DbtCliResource,
    dbt_assets,
)

from .constants import dbt_manifest_path


@dbt_assets(
    manifest=dbt_manifest_path,
    dagster_dbt_translator=DagsterDbtTranslator(
        settings=DagsterDbtTranslatorSettings(enable_table_schema_metadata=True)
    ),
)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()

In the future:

  • We can add +meta config on individual dbt models/seeds/snapshots to prevent column information from being emitted.
  • We can refactor this functionality as a separate dbt-dagster package for users to install.

How I Tested These Changes

pytest, local

Demo

schema-demo.mov

@rexledesma rexledesma force-pushed the rl/add-dagster-metadata-dbt-test-project-pt2 branch from d8010ab to 9ec13e3 Compare February 6, 2024 16:06
@rexledesma rexledesma force-pushed the rl/add-dbt-table-schema-metadata branch from f3ae9fa to 00765c5 Compare February 6, 2024 16:39
@rexledesma rexledesma closed this Feb 6, 2024
rexledesma added a commit that referenced this pull request Feb 14, 2024
## Summary & Motivation
Formalize #19548 by pulling
the new macro logic into a separate dbt package.

Separated this out into a separate PR since I wanted to discuss where
this package should live in the monorepo.

### Assumptions

[Although dbt Hub packages are
recommended](https://docs.getdbt.com/docs/build/packages#hub-packages-recommended),
I believe we should instead prioritize the [`git` installation
experience](https://docs.getdbt.com/docs/build/packages#git-packages).
- This will give us a better experience while developing this package.
Users will be able to install via revision if we need to push out
hotfixes.
- Users still have the option to install via named tags/revisions (e.g.
`1.6.0`)

Note that we will still have the option to formally move to dbt Hub in
the future, so this isn't a binding choice.

Here are some examples of what `dependencies.yml` would look like for a user:

**Named Revision**
```yaml
packages:
  - git: "https://github.com/dagster-io/dagster.git"
    subdirectory: "/python_modules/libraries/dagster-dbt/dbt_packages/dagster"
    revision: 1.6.0
```

**Hash Revision**
```yaml
packages:
  - git: "https://github.com/dagster-io/dagster.git"
    subdirectory: "/python_modules/libraries/dagster-dbt/dbt_packages/dagster"
    revision: fd13563
```
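Since the two entries above differ only in `revision`, the entry can be generated from a single template. Here is a minimal sketch; `render_packages_entry` is a hypothetical helper, not part of dagster-dbt, and it only formats text, so it does not check that the revision exists.

```python
GIT_URL = "https://github.com/dagster-io/dagster.git"
SUBDIRECTORY = "/python_modules/libraries/dagster-dbt/dbt_packages/dagster"


def render_packages_entry(revision: str) -> str:
    """Render a dependencies.yml `packages` block pinned to `revision`."""
    return (
        "packages:\n"
        f'  - git: "{GIT_URL}"\n'
        f'    subdirectory: "{SUBDIRECTORY}"\n'
        f"    revision: {revision}\n"
    )


# A named tag and a commit hash go through the same code path.
named = render_packages_entry("1.6.0")
hashed = render_packages_entry("fd13563")
```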

### Open Questions

We could put this package directory at the top level of our monorepo,
just to make `subdirectory` a bit more ergonomic. I don't think it's a
big deal, since this is a one-time copy-paste anyway.

```yaml
packages:
  - git: "https://github.com/dagster-io/dagster.git"
    subdirectory: "dbt_packages/dagster"
    revision: fd13563
```

## How I Tested These Changes
follow up PRs
rexledesma added a commit that referenced this pull request Feb 14, 2024
…19631)

## Summary & Motivation
Rework #19548 on top of
#19623.

## How I Tested These Changes
pytest
jmsanders pushed a commit that referenced this pull request Feb 14, 2024
(cherry picked from commit 1a445a2)
jmsanders pushed a commit that referenced this pull request Feb 14, 2024
…19631)

(cherry picked from commit 22f5a27)
@rexledesma rexledesma deleted the rl/add-dbt-table-schema-metadata branch February 21, 2024 01:40
PedramNavid pushed a commit that referenced this pull request Mar 28, 2024
…19631)
