
Added serverless support to spark fixture #91


Merged
5 commits merged into main from serverless_support on Jan 31, 2025

Conversation

@mwojtyczka (Contributor) commented on Jan 30, 2025

Extend the spark fixture to support Serverless compute.
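A minimal usage sketch of what this enables (hedged: the spark fixture comes from this plugin; the serverless opt-in via DATABRICKS_SERVERLESS_COMPUTE_ID=auto follows the Databricks Connect docs linked later in this thread, and the test body is illustrative):

```python
# Hedged sketch: with DATABRICKS_SERVERLESS_COMPUTE_ID=auto set in the
# environment (or serverless_compute_id = auto in the config profile),
# the `spark` fixture should yield a serverless-backed Databricks Connect
# session instead of one attached to a named cluster.
def test_runs_on_serverless(spark):
    row = spark.sql("SELECT 1 AS one").collect()[0]
    assert row.one == 1
```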

Linked issues

Resolves #90

Tests

  • [x] manually tested
  • [ ] added unit tests
  • [x] added integration tests
  • [x] verified on staging environment (screenshot attached)

@mwojtyczka mwojtyczka requested a review from nfx as a code owner January 30, 2025 13:11
@mwojtyczka mwojtyczka requested review from JCZuurmond and alexott and removed request for nfx January 30, 2025 13:12

github-actions bot commented Jan 30, 2025

✅ 41/41 passed, 5 skipped, 5m42s total

Running from acceptance #169

@JCZuurmond (Contributor) left a comment


Hi @mwojtyczka, thanks for opening this PR. This "auto" serverless cluster id is new to me. What happens if that variable is set to something other than "auto"? Does that make sense for serverless?

@mwojtyczka (Contributor, Author) commented on Jan 30, 2025

> Hi @mwojtyczka, thanks for opening this PR. This "auto" serverless cluster id is new to me. What happens if that variable is set to something other than "auto"? Does that make sense for serverless?

You can set it to an actual cluster id, but that does not make sense for serverless, since most of the time the cluster is active only for a short period (due to autotermination). This is also the reason the documentation only mentions "auto":
https://docs.databricks.com/en/dev-tools/databricks-connect/cluster-config.html#configure-a-connection-to-serverless-compute

It works the same way in notebooks, for example: you cannot select a specific cluster, you just select Serverless.
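For reference, a minimal sketch of the serverless opt-in described on the linked page, assuming databricks-connect 15.1+ (where the serverless builder option and the DATABRICKS_SERVERLESS_COMPUTE_ID environment variable are supported):

```python
from databricks.connect import DatabricksSession

# Equivalent opt-ins, per the linked docs:
#   - config profile:       serverless_compute_id = auto
#   - environment variable: DATABRICKS_SERVERLESS_COMPUTE_ID=auto
#   - explicit builder flag, as below
spark = DatabricksSession.builder.serverless(True).getOrCreate()

# No cluster id is involved; the session runs on serverless compute.
spark.sql("SELECT 1").show()
```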

@JCZuurmond (Contributor) commented:
> Hi @mwojtyczka, thanks for opening this PR. This "auto" serverless cluster id is new to me. What happens if that variable is set to something other than "auto"? Does that make sense for serverless?

> You can set it to an actual cluster id, but that does not make sense for serverless, since most of the time the cluster is active only for a short period (due to autotermination). This is also the reason the documentation only mentions "auto": https://docs.databricks.com/en/dev-tools/databricks-connect/cluster-config.html#configure-a-connection-to-serverless-compute
>
> It works the same way in notebooks, for example: you cannot select a specific cluster, you just select Serverless.

Yeah, it makes sense to have no cluster id for serverless, kind of by definition. It confuses me that the workspace configuration has an attribute called serverless_compute_id. Please add the documentation link to the README and the docstring.

Also, should we add a check that explicitly tests that the value is "auto", and otherwise fail with a ValueError saying we only accept "auto"?

@mwojtyczka (Contributor, Author) commented on Jan 30, 2025

> Yeah, it makes sense to have no cluster id for serverless, kind of by definition. It confuses me that the workspace configuration has an attribute called serverless_compute_id. Please add the documentation link to the README and the docstring.
>
> Also, should we add a check that explicitly tests that the value is "auto", and otherwise fail with a ValueError saying we only accept "auto"?

Will do, but I'm not sure we need to check for "auto". I tested it, and it works if you give it the id of a running cluster. Maybe, if a specific cluster id is given, we should check that it exists and fail otherwise.
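A hedged sketch of what that existence check could look like (the helper name is hypothetical; ws.clusters.get and NotFound are from the Databricks SDK for Python):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound


def validate_serverless_compute_id(ws: WorkspaceClient, compute_id: str) -> None:
    """Accept "auto", or any cluster id that actually exists in the workspace."""
    if compute_id == "auto":
        return
    try:
        ws.clusters.get(compute_id)  # raises NotFound for an unknown cluster id
    except NotFound as e:
        raise ValueError(
            f"serverless_compute_id must be 'auto' or an existing cluster id, got {compute_id!r}"
        ) from e
```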

@mwojtyczka (Contributor, Author) commented on Jan 30, 2025

@JCZuurmond I had to update the SDK dependency because UCX is using a lower version of the SDK, and that was causing downstream jobs to fail. I don't really understand why the projects are coupled this way; it does not make much sense, since UCX and Remorph can pin to a specific version of pytester.

The project was also missing the databricks-connect dependency, so test_databricks_connect was previously always skipped. After adding it and enabling the test, it appears that a cluster incompatible with Databricks Connect is used to run the tests:

test_databricks_connect: pyspark.errors.exceptions.connect.SparkConnectGrpcException: BAD_REQUEST: SingleClusterComputeMode(DATABRICKS_CLUSTER_ID) is not Shared or Single User Cluster. (requestId=b64b9411-74d5-4e27-9566-5d1ed1fa72ce) (1.537s)

Can you please update the cluster ids DATABRICKS_CLUSTER_ID and TEST_DEFAULT_CLUSTER_ID in the vault_uri: ${{ secrets.VAULT_URI }} to the shared cluster 1114-152544-29g1w07e? The one that is currently used is neither a Shared nor a Single User cluster, and we cannot use databricks-connect with it.
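For context, Databricks Connect only works against clusters whose access mode is Shared or Single User; a hedged sketch of checking this with the Databricks SDK before pointing the tests at a cluster (the env var name is taken from the comment above):

```python
import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

ws = WorkspaceClient()
cluster = ws.clusters.get(os.environ["DATABRICKS_CLUSTER_ID"])

# Shared maps to USER_ISOLATION in the SDK; Single User maps to SINGLE_USER.
if cluster.data_security_mode not in (DataSecurityMode.USER_ISOLATION, DataSecurityMode.SINGLE_USER):
    raise RuntimeError(
        f"cluster {cluster.cluster_id} has access mode {cluster.data_security_mode}; "
        "databricks-connect needs a Shared or Single User cluster"
    )
```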

@JCZuurmond (Contributor) left a comment


@mwojtyczka: Approving as it LGTM. Please address the last two suggestions before merging.

@JCZuurmond (Contributor) commented:

> I don't really understand why the projects are coupled this way; it does not make much sense, since UCX and Remorph can pin to a specific version of pytester.
>
> Can you please update the cluster ids DATABRICKS_CLUSTER_ID and TEST_DEFAULT_CLUSTER_ID in the vault_uri: ${{ secrets.VAULT_URI }} to the shared cluster 1114-152544-29g1w07e?

@mwojtyczka: I cannot help you here. @FastLee has access to our testing infrastructure; I expect that he can update that value.

On the design pattern: I suspect it is intended as a sanity check by @nfx, a backstop if you will.

@mwojtyczka mwojtyczka removed the request for review from alexott January 31, 2025 10:44
@mwojtyczka mwojtyczka merged commit cefd79e into main Jan 31, 2025
7 checks passed
@mwojtyczka mwojtyczka deleted the serverless_support branch January 31, 2025 11:23
Development

Successfully merging this pull request may close these issues:
  • Support serverless clusters in spark fixture (#90)