
Add execution context API #455

Draft · wants to merge 1 commit into main from execution-context-api
Conversation

@fjakobs (Contributor) commented May 4, 2022

This PR adds the still-supported functionality from the REST API 1.2.

It allows users to create execution contexts and run Python, SQL, and R code on a cluster.

Note that this PR is still missing tests and can't be merged as is.
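
For context, a rough sketch of the raw 1.2 flow this PR wraps, based on the endpoints in the REST API 1.2 docs; the host, token, and cluster id are placeholders, and field names should be double-checked against the reference:

# Sketch only: endpoint paths and camelCase fields are taken from the 1.2 docs,
# but verify them before relying on this.
import time
import requests

HOST = 'https://<workspace>.cloud.databricks.com'  # placeholder
HEADERS = {'Authorization': 'Bearer <token>'}      # placeholder
CLUSTER = '<cluster-id>'                           # placeholder

# 1. Create an execution context on a running cluster.
ctx = requests.post(HOST + '/api/1.2/contexts/create', headers=HEADERS,
                    json={'clusterId': CLUSTER, 'language': 'python'}).json()

# 2. Run a command inside that context.
cmd = requests.post(HOST + '/api/1.2/commands/execute', headers=HEADERS,
                    json={'clusterId': CLUSTER, 'contextId': ctx['id'],
                          'language': 'python', 'command': 'print(1 + 1)'}).json()

# 3. Poll until the command reaches a terminal state, then read the result.
while True:
    status = requests.get(HOST + '/api/1.2/commands/status', headers=HEADERS,
                          params={'clusterId': CLUSTER, 'contextId': ctx['id'],
                                  'commandId': cmd['id']}).json()
    if status['status'] in ('Finished', 'Error', 'Cancelled'):
        break
    time.sleep(1)
print(status.get('results'))

# 4. Clean up the context when done.
requests.post(HOST + '/api/1.2/contexts/destroy', headers=HEADERS,
              json={'clusterId': CLUSTER, 'contextId': ctx['id']})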

@nfx (Contributor) left a comment

Command/Context API support was already added along with the Tunnels PoC: 6c429d4#diff-72d81b5e87625d187fbc145e395e1981ef31119974e19e2b34dfa3ab0a9b983bR32-R51

I'd also suggest moving .devcontainer into a separate PR.

@fjakobs force-pushed the execution-context-api branch 2 times, most recently from c6649d5 to d3cb6fb on May 4, 2022 12:02
@fjakobs (Contributor, Author) commented May 4, 2022

I moved the dev container code out into #450 and made sure the linter is happy.

@fjakobs force-pushed the execution-context-api branch 2 times, most recently from 4940954 to 1eaf336 on May 4, 2022 13:17
This is the still-supported functionality from the REST API 1.2.
https://docs.databricks.com/dev-tools/api/1.2/index.html
@nfx (Contributor) left a comment

Left a detailed review on the proposed changes.

// Update 'VARIANT' to pick a Python version: 3, 3.10, 3.9, 3.8, 3.7, 3.6
// Append -bullseye or -buster to pin to an OS version.
// Use -bullseye variants on local on arm64/Apple Silicon.
"VARIANT": "3.6-bullseye",
Contributor:

We should be pinning Python 3.8, I think. At least the minimum Python version in our LTS runtime.

Contributor Author:

tox is still using 3.6 and will fail with any other version. I think bumping the Python version should be a separate task.

# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.231.3/containers/python-3/.devcontainer/base.Dockerfile

# [Choice] Python version (use -bullseye variants on local arm64/Apple Silicon): 3, 3.10, 3.9, 3.8, 3.7, 3.6, 3-bullseye, 3.10-bullseye, 3.9-bullseye, 3.8-bullseye, 3.7-bullseye, 3.6-bullseye, 3-buster, 3.10-buster, 3.9-buster, 3.8-buster, 3.7-buster, 3.6-buster
ARG VARIANT="3.10-bullseye"
Contributor:

We should be pinning Python 3.8, I think. At least the minimum Python version in our LTS runtime.

self.id = None


class ExecutionContextApi(object):
Contributor:

Could you reuse the basic structure for the command/context APIs from the Tunnel CLI, so that it won't break once merged? :)
6c429d4

It's removed, so it probably has to be re-added:
0296117

@profile_option
@eat_exceptions
@provide_api_client
def create_context_cli(api_client, cluster_id, language): # NOQA
Contributor:

This command makes no sense in isolation within the CLI from the end-user perspective. What makes more sense from a practical perspective is "run this Python code on the cluster with this name and return its results".

Contributor Author:

It does make sense when you want to run multiple commands within the same context. What you describe is possible with command-execute-once.
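
To illustrate that journey, a hypothetical sketch of running several commands in one shared context; the import path and method names (create_context, execute_command, destroy_context) are guesses, not the PR's actual surface:

# Hypothetical sketch: only ApiClient(host=..., token=...) is the existing SDK
# entry point; the ExecutionContextApi path and method names are illustrative.
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.execution_context.api import ExecutionContextApi  # hypothetical path

client = ApiClient(host='https://<workspace>', token='<token>')  # placeholders
contexts = ExecutionContextApi(client)

# One context, several commands: state carries over like consecutive notebook cells.
ctx_id = contexts.create_context(cluster_id='<cluster-id>', language='python')
contexts.execute_command(ctx_id, 'df = spark.range(10)')
contexts.execute_command(ctx_id, 'print(df.count())')
contexts.destroy_context(ctx_id)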

Contributor:

It's just not clear to me which end-user journey it simplifies.

@profile_option
@eat_exceptions
@provide_api_client
def get_context_status_cli(api_client, cluster_id, context_id): # NOQA
Contributor:

This command makes no sense in isolation within the CLI from the end-user perspective.

@profile_option
@eat_exceptions
@provide_api_client
def cancel_command_cli(api_client, cluster_id, context_id, command_id): # NOQA
Contributor:

This command makes no sense in isolation within the CLI from the end-user perspective.

Contributor Author:

It does make sense when you start a command async.

Contributor:

execute_command_cli can fit into a "workspace initialization" journey with the implemented additions, as I see requests for something like that from time to time. The current status quo is to run-submit a notebook job, which is also cancellable.

My concern is keeping the new command surface to a minimum.

@profile_option
@eat_exceptions
@provide_api_client
def get_command_status_cli(api_client, cluster_id, context_id, command_id): # NOQA
Contributor:

This command makes no sense in isolation within the CLI from the end-user perspective.

Contributor Author:

It does make sense when you start a command async.

Contributor:

Why do people need to start an individual notebook command async if they have run-submit jobs, which also keep a snapshot of the notebook being executed?
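
For comparison, the status quo mentioned here, sketched against the 2.0 Jobs API; host, token, cluster id, and notebook path are placeholders:

# Sketch of the run-submit alternative: a one-off notebook run that is also
# cancellable, per the 2.0 Jobs API.
import requests

HOST = 'https://<workspace>.cloud.databricks.com'  # placeholder
HEADERS = {'Authorization': 'Bearer <token>'}      # placeholder

run = requests.post(HOST + '/api/2.0/jobs/runs/submit', headers=HEADERS,
                    json={'run_name': 'one-off command',
                          'existing_cluster_id': '<cluster-id>',
                          'notebook_task': {'notebook_path': '/Users/<me>/<notebook>'}}).json()

# The submitted run can be cancelled, covering the cancel use case discussed above.
requests.post(HOST + '/api/2.0/jobs/runs/cancel', headers=HEADERS,
              json={'run_id': run['run_id']})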

pass


execution_context_group.add_command(create_context_cli, name='create')
Contributor:

I think that logically this belongs in the cluster command group instead.

Contributor Author:

I could be convinced about that.

Contributor:

Because commands are possible only in the scope of a running interactive cluster.

@@ -105,6 +105,12 @@ def __init__(self, user=None, password=None, host=None, token=None,
self.api_version = api_version
self.jobs_api_version = jobs_api_version

def get_v1_client(self):
Contributor:

This will make the OAuth (and AAD, and OIDC) implementation fragile for long-running commands, where the token might have to be refreshed a couple of times to keep working. Could we have another way of propagating the /api/1.2 path prefix? A kwarg in the perform_query method, for example.

Contributor Author:

I copied this code from dbx: https://github.com/databrickslabs/dbx/blob/main/dbx/utils/v1_client.py#L12
There might be a better way, though.

Contributor Author:

I'll change it to use kwargs; that's indeed cleaner.
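
A minimal sketch of what that kwarg could look like; this is illustrative rather than the PR's final code, and the real perform_query has more concerns (auth refresh, error handling) that 1.2 calls would then share:

# Illustrative only: a per-call version override on a simplified client, so one
# auth path serves both /api/2.0 and /api/1.2 without a separate v1 client.
import requests

class ApiClientSketch(object):
    def __init__(self, host, token, api_version='2.0'):
        self.host = host
        self.token = token
        self.api_version = api_version

    def perform_query(self, method, path, data=None, headers=None, version=None):
        # The version kwarg overrides the client default for a single call.
        url = '{}/api/{}{}'.format(self.host, version or self.api_version, path)
        all_headers = {'Authorization': 'Bearer ' + self.token}
        all_headers.update(headers or {})
        resp = requests.request(method, url, json=data or {}, headers=all_headers)
        resp.raise_for_status()
        return resp.json()

# e.g.: client.perform_query('POST', '/contexts/create',
#                            data={'clusterId': '<cluster-id>', 'language': 'python'},
#                            version='1.2')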

@@ -1114,3 +1117,80 @@ def delete_repo(self, id, headers=None):
_data = {}

return self.client.perform_query('DELETE', '/repos/{id}'.format(id=id), data=_data, headers=headers)

class ExecutionContextService(object):
Contributor:

Could you reuse the basic structure for the command/context APIs from the Tunnel CLI, so that it won't break once merged? :)
6c429d4

It's removed, so it probably has to be re-added:
0296117

@fjakobs (Contributor, Author) commented May 5, 2022

@nfx what do you think about the following changes:

  1. Keep the execution-context group, but only keep the 1:1 mapping of commands to the API there. These are the low-level commands for advanced usage, e.g. when running multiple commands, potentially in different languages, in the same context. This allows us to simulate notebooks from the CLI.
  2. Add a single execute command to the cluster group that is smart and convenient and covers the 80% use case. This could look like what's described in "Add jobs run-script command" #341 (comment).

@nfx (Contributor) commented May 5, 2022

@fjakobs it's just that I've literally never heard of any customer asking for interactive notebook individual commands to be exposed in the CLI in any workflow.

It's quite the opposite for the SDK: yes, we need to expose a Python context manager for the command context in the SDK; it has plenty of real-world scenarios. Some of the applications are https://github.com/databrickslabs/jupyterlab-integration and https://github.com/databrickslabs/migrate

And of course, there were high-level workflow requests, like #122 (10 votes since 2018) / #341 (back then there were too many issues, so I followed the broken-window theory and created more).
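
For illustration, such a context manager could be as small as the following sketch; it assumes the hypothetical version kwarg from the perform_query discussion above, and endpoint/field names follow the 1.2 docs:

# Sketch only: assumes a client whose perform_query accepts a version kwarg,
# as proposed earlier in this review.
import contextlib

@contextlib.contextmanager
def command_context(client, cluster_id, language='python'):
    """Create an execution context, yield its id, and destroy it on exit."""
    ctx = client.perform_query('POST', '/contexts/create',
                               data={'clusterId': cluster_id, 'language': language},
                               version='1.2')
    try:
        yield ctx['id']
    finally:
        client.perform_query('POST', '/contexts/destroy',
                             data={'clusterId': cluster_id, 'contextId': ctx['id']},
                             version='1.2')

# Usage: every command inside the with-block shares state, like notebook cells.
# with command_context(client, '<cluster-id>') as ctx_id:
#     client.perform_query('POST', '/commands/execute',
#                          data={'clusterId': '<cluster-id>', 'contextId': ctx_id,
#                                'language': 'python', 'command': 'x = 1'},
#                          version='1.2')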
