
[Feature] support inline session python submission method #478

Closed
3 tasks done
dkruh1 opened this issue Jul 17, 2024 · 2 comments
Labels
pkg:dbt-spark Issue affects dbt-spark type:enhancement New feature request

Comments


dkruh1 commented Jul 17, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-spark functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Summary:
Introduce an option to run Python models within an existing session, similar to the session option available for SQL models.

Description:
Currently, users must choose between an all-purpose cluster or a job cluster to run Python models (see docs). This requirement limits the ability to execute dbt models inline within an existing notebook, forcing model execution to be triggered outside of Databricks.

In contrast, SQL models in dbt can leverage the session connection method, allowing them to be executed as part of an existing session. This separation of model logic from job-cluster definitions lets orchestration systems provision clusters according to their own considerations (cost, workload size, scheduling).
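For context, the existing SQL-side behavior is configured in the dbt-spark profile. A minimal sketch (profile and schema names are illustrative):

```yaml
# profiles.yml -- existing session connection method for SQL models
my_profile:
  target: session
  outputs:
    session:
      type: spark
      method: session     # reuse the already-running Spark session
      schema: analytics   # illustrative schema name
      host: NA            # not used by the session method, but required by dbt-core
```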

Request:
We propose introducing a similar session option for Python models. This feature would allow users to submit Python models to be executed within a given session, thereby decoupling model definitions from job cluster specifications.
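As a rough sketch of the proposed interface (the `session` value for `submission_method` is the hypothetical new option; model and ref names are illustrative), a Python model could opt into the current session the same way it selects other submission methods today:

```python
# models/my_python_model.py -- hypothetical sketch;
# submission_method="session" is the PROPOSED option, it does not exist yet
def model(dbt, session):
    # Would reuse the Spark session dbt is already connected to,
    # instead of provisioning an all-purpose or job cluster
    dbt.config(materialized="table", submission_method="session")
    df = dbt.ref("upstream_model")  # illustrative upstream model name
    return df
```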

Describe alternatives you've considered

For job clusters, there isn't a viable alternative that offers the same Databricks API and cost profile. A possible, but problematic, option is to create an all-purpose cluster, provide the model with its cluster ID, and destroy the cluster after use. However, this approach is significantly more expensive (due to the cost difference between all-purpose clusters and job clusters) and disrupts the existing architecture that uses the session method to execute models within a job cluster.
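For reference, that workaround would look roughly like the following. This is a sketch only: it assumes the legacy Databricks CLI syntax (flags differ in newer CLI versions), a `cluster-spec.json` file, and a dbt profile that reads the cluster ID through an `env_var()` hook; all of those names are illustrative.

```shell
# Hypothetical sketch of the costly all-purpose-cluster workaround
# 1. Create an all-purpose cluster and capture its ID
CLUSTER_ID=$(databricks clusters create --json-file cluster-spec.json | jq -r '.cluster_id')

# 2. Point the dbt profile at it (profile reads DBT_CLUSTER_ID via env_var())
export DBT_CLUSTER_ID="$CLUSTER_ID"
dbt run --select my_python_model

# 3. Tear the cluster down to stop billing
databricks clusters permanent-delete --cluster-id "$CLUSTER_ID"
```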

Who will this benefit?

All dbt users currently leveraging the session method and considering adopting dbt Python models will benefit from this feature. Additionally, users who use third-party tools to define job cluster specifications based on AI or other methods will be able to decouple model logic from cluster spec configuration, allowing for greater flexibility and efficiency.

Are you interested in contributing this feature?

Yes, I'm preparing a pull request.

Anything else?

No response

@dkruh1 dkruh1 added type:enhancement New feature request triage:product In Product's queue labels Jul 17, 2024
@amychen1776 amychen1776 removed the triage:product In Product's queue label Aug 1, 2024
amychen1776 (Contributor) commented:

@dkruh1 are you using the adapter with Databricks? If so, is there a reason why you're not using the dbt-databricks adapter?

@mikealfare mikealfare added the pkg:dbt-spark Issue affects dbt-spark label Jan 13, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-spark Jan 13, 2025
amychen1776 (Contributor) commented:

@dkruh1 please recreate this issue with the dbt-databricks adapter since connection logic is on that adapter. Thank you!

@amychen1776 amychen1776 closed this as not planned Jan 16, 2025
3 participants