Python Wheel Install Command #286

Status: Open — wants to merge 4 commits into base: main
Conversation

@nfx nfx commented Apr 1, 2020

Allows installation of wheels onto Databricks clusters using the standard Python setuptools framework, via a `distutils.command` entry point. E.g.

```
python setup.py databricks_install --cluster-id abcd --databricks-cli-profile staging
```

will do the following automatically:

  1. build the wheel
  2. use the `staging` profile from the Databricks CLI, or raise an error with instructions to configure it
  3. upload the wheel to a DBFS location (configurable as well)
  4. install it on cluster `abcd` as a `whl` library
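Such a command is implemented as a setuptools `Command` subclass. The following is a minimal sketch of the shape involved, not the code in this PR; option names follow the example command above:

```python
from setuptools import Command


class DatabricksInstall(Command):
    """Sketch of a custom command: build a wheel, then install it on a cluster."""

    description = "build a wheel and install it on a Databricks cluster"
    user_options = [
        ("cluster-id=", None, "id of the cluster to install the wheel on"),
        ("databricks-cli-profile=", None, "Databricks CLI profile name"),
    ]

    def initialize_options(self):
        # defaults before option parsing
        self.cluster_id = None
        self.databricks_cli_profile = "DEFAULT"

    def finalize_options(self):
        # validate after option parsing
        if not self.cluster_id:
            raise ValueError("--cluster-id is required")

    def run(self):
        # step 1: build the wheel via the stock bdist_wheel command
        self.run_command("bdist_wheel")
        # steps 2-4 (profile resolution, DBFS upload, library attach)
        # are elided in this sketch
```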

TODO:

  1. wait until the library is successfully installed, or raise an error
  2. install the library on a cluster by name
  3. install the library on clusters by tag (e.g. team tags)
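The first TODO item could be a polling loop. This is a sketch only: the terminal status names are an assumption based on the Databricks Libraries API, and `get_status` is a hypothetical callable that fetches the library's current status on the cluster:

```python
import time

# Assumed terminal states for a library installation
TERMINAL = {"INSTALLED", "FAILED", "SKIPPED"}


def wait_for_install(get_status, timeout=300, interval=10):
    """Poll `get_status` until the library reaches a terminal state.

    Raises RuntimeError on a failed install and TimeoutError when the
    deadline passes without reaching a terminal state.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status == "FAILED":
            raise RuntimeError("library installation failed")
        if status in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError("library was not installed within the timeout")
```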

References:
https://setuptools.readthedocs.io/en/latest/setuptools.html#adding-commands
https://books.google.nl/books?id=9G9zX_jf1f8C&pg=PT236&lpg=PT236&dq=entry_points+distutils.commands&source=bl&ots=_4deWhAJIf&sig=ACfU3U0LgNjqMdOVTc2zNbdwpWlNr43xkg&hl=en&sa=X&ved=2ahUKEwjzl9TFnsjoAhVNDewKHSlkDUgQ6AEwA3oECAsQKA#v=onepage&q=entry_points%20distutils.commands&f=false
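The registration mechanism described in these references boils down to an `entry_points` declaration in `setup.py`. A sketch, with hypothetical package and module names:

```python
from setuptools import setup

setup(
    name="my-library",
    version="0.1.0",
    packages=["my_library"],
    entry_points={
        # Makes `python setup.py databricks_install` available in any
        # environment where this package is installed.
        "distutils.commands": [
            "databricks_install = my_library.commands:DatabricksInstall",
        ],
    },
)
```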

nfx added 4 commits April 1, 2020 23:48
codecov-io commented Apr 2, 2020

Codecov Report

Merging #286 into master will decrease coverage by 0.08%.
The diff coverage is 78.18%.


```
@@            Coverage Diff             @@
##           master     #286      +/-   ##
==========================================
- Coverage   83.53%   83.45%   -0.09%
==========================================
  Files          33       34       +1
  Lines        2211     2266      +55
==========================================
+ Hits         1847     1891      +44
- Misses        364      375      +11
```
| Impacted Files | Coverage Δ |
|---|---|
| setup.py | 0.00% <ø> (ø) |
| databricks_cli/libraries/distutils.py | 77.77% <77.77%> (ø) |
| databricks_cli/utils.py | 98.00% <100.00%> (+2.08%) ⬆️ |

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f92a8c3...4a85751.

nfx commented Apr 3, 2020

Once this functionality is ready, we can change our https://docs.databricks.com/dev-tools/ci-cd.html doc from

```
  stage('Package') {
    sh """#!/bin/bash

          # Enable Conda environment for tests
          source ${CONDAPATH}/bin/activate ${CONDAENV}

          # Package Python library to wheel
          cd ${LIBRARYPATH}/python/dbxdemo
          python3 setup.py sdist bdist_wheel
       """
  }
  stage('Build Artifact') {
    sh """mkdir -p ${BUILDPATH}/Workspace
          mkdir -p ${BUILDPATH}/Libraries/python
          mkdir -p ${BUILDPATH}/Validation/Output
          # Get modified files
          git diff --name-only --diff-filter=AMR HEAD^1 HEAD | xargs -I '{}' cp --parents -r '{}' ${BUILDPATH}

          # Get packaged libs
          find ${LIBRARYPATH} -name '*.whl' | xargs -I '{}' cp '{}' ${BUILDPATH}/Libraries/python/

          # Generate artifact
          tar -czvf Builds/latest_build.tar.gz ${BUILDPATH}
       """
    archiveArtifacts artifacts: 'Builds/latest_build.tar.gz'
  }
  stage('Deploy') {
    sh """#!/bin/bash
          # Enable Conda environment for tests
          source ${CONDAPATH}/bin/activate ${CONDAENV}

          # Use Databricks CLI to deploy notebooks
          databricks workspace import_dir ${BUILDPATH}/Workspace ${WORKSPACEPATH}

          dbfs cp -r ${BUILDPATH}/Libraries/python ${DBFSPATH}
       """
    withCredentials([string(credentialsId: DBTOKEN, variable: 'TOKEN')]) {
        sh """#!/bin/bash

              # Get space-delimited list of libraries
              LIBS=\$(find ${BUILDPATH}/Libraries/python/ -name '*.whl' | sed 's#.*/##' | paste -sd " ")

              # Script to uninstall, reboot if needed & install library
              python3 ${SCRIPTPATH}/installWhlLibrary.py --workspace=${DBURL}\
                        --token=$TOKEN\
                        --clusterid=${CLUSTERID}\
                        --libs=\$LIBS\
                        --dbfspath=${DBFSPATH}
           """
    }
  }
```

to just `python3 setup.py sdist databricks_install $CLUSTERID`.

As mentioned by @ccstevens, this is for cluster-wide libraries and not for notebook-scoped libraries. CI/CD processes don't make sense for the latter.

```
('cluster-id=', None, "cluster id to distribute it", None),
('cluster-tag=', None, "cluster tag to install library", None),
('cluster-name=', None, "cluster name to distribute it", None),
('databricks-cli-profile=', None, "Databricks CLI profile name", None),
```

I think we should have an `overwrite` flag to be consistent with the `databricks fs` commands.

nfx replied:

To overwrite a wheel? Makes perfect sense.

What do you think about the profile flag: maybe shorten it to `--profile` as well? So the command would be something like `python setup.py databricks_install --cluster-id abcd --profile staging --overwrite`
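The discussed interface could be declared like this. A sketch following stock distutils conventions (3-tuples for `user_options`, `boolean_options` for valueless flags), not the PR's merged code:

```python
# Options that take a value end in "="; `overwrite` is a bare flag.
user_options = [
    ("cluster-id=", None, "cluster id to install the library on"),
    ("cluster-name=", None, "cluster name to install the library on"),
    ("cluster-tag=", None, "cluster tag to install the library on"),
    ("profile=", None, "Databricks CLI profile name"),
    ("overwrite", None, "overwrite a wheel already uploaded to DBFS"),
]
# distutils treats options listed here as booleans (no value expected)
boolean_options = ["overwrite"]
```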


```
def _install_library(self, artifact):
    api_client = self._configure_api()
    if self.cluster_tag:
```

Can we remove the options if they don't work?

nfx replied:

I'm about to add the logic in the next commit.
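The tag-based branch of that logic could be a pure filter over the parsed `clusters` array returned by the Databricks `clusters/list` API. A sketch, not the commit in question; the input shape is assumed from that API's response:

```python
def clusters_with_tag(clusters, tag_key, tag_value):
    """Return ids of clusters whose custom_tags contain the given key/value pair."""
    return [
        c["cluster_id"]
        for c in clusters
        if c.get("custom_tags", {}).get(tag_key) == tag_value
    ]
```

The wheel would then be installed on every cluster id the filter returns.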

@nfx nfx requested review from fjakobs and a team and removed request for fjakobs May 5, 2022 11:10
pietern commented May 9, 2022

@nfx Do you want to merge this, or functionality similar to this, or should I take the request for review as an FYI?

@mgyucht mgyucht removed the request for review from a team March 22, 2023 08:20