This repository has been archived by the owner on Nov 29, 2023. It is now read-only.

Bodywork ML Pipelines with CI/CD

This repository contains a Bodywork project that demonstrates how to configure an ML pipeline with CI/CD. The example ML pipeline has two stages:

  1. Run a batch job to train a model.
  2. Deploy the trained model as a service with a REST API.

To deploy this project manually, follow the steps below.

Get Access to a Kubernetes Cluster

In order to run this example project you will need access to a Kubernetes cluster. To set up a single-node test cluster on your local machine, you can use minikube or docker-for-desktop. Check your access to Kubernetes by running,

$ kubectl cluster-info

This should return the details of your cluster.

Install the Bodywork Python Package

$ pip install bodywork

Setup a Kubernetes Namespace for use with Bodywork

$ bodywork setup-namespace ml-pipeline

Run the ML Pipeline

To test the ML pipeline using a workflow-controller running on your local machine and interacting with your Kubernetes cluster, run,

$ bodywork deployment create \
    --name=initial-deployment \
    --namespace=ml-pipeline \
    --git-repo-url=https://github.com/bodywork-ml/bodywork-pipeline-with-cicd \
    --git-repo-branch=master \
    --local-workflow-controller

The workflow-controller logs will be streamed to your shell's standard output until the job has completed successfully.

Testing the Model-Scoring Service

Service deployments are accessible via HTTP from within the cluster - they are not exposed to the public internet unless you have installed an ingress controller in your cluster. The simplest way to test a service from your local machine is to use a local proxy server to enable access to your cluster. Start one by issuing the following command,

$ kubectl proxy

Then in a new shell, you can use the curl tool to test the service. For example,

$ curl http://localhost:8001/api/v1/namespaces/ml-pipeline/services/bodywork-pipeline--serve-model/proxy/iris/v1/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'

This should return,

{
    "species_prediction":"setosa",
    "probabilities":"setosa=1.0|versicolor=0.0|virginica=0.0",
    "model_info": "DecisionTreeClassifier(class_weight='balanced', random_state=42)"
}

The format of this payload is defined in the pipeline/serve_model.py module.
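The "probabilities" field in the response above is a pipe-delimited string rather than a JSON object. As a rough illustration (this helper is not part of the project), it can be parsed into a dictionary like so:

```python
def parse_probabilities(field: str) -> dict:
    """Parse a pipe-delimited probabilities string, e.g.
    "setosa=1.0|versicolor=0.0|virginica=0.0", into {class: probability}."""
    return {
        name: float(value)
        for name, value in (pair.split("=") for pair in field.split("|"))
    }

probs = parse_probabilities("setosa=1.0|versicolor=0.0|virginica=0.0")
# probs["setosa"] == 1.0
```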

If an ingress controller is operational in your cluster, then the service can be tested via the public internet using,

$ curl http://YOUR_CLUSTERS_EXTERNAL_IP/ml-pipeline/bodywork-pipeline--serve-model/iris/v1/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'

See here for instructions on how to retrieve YOUR_CLUSTERS_EXTERNAL_IP.
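If you prefer to test the service programmatically rather than with curl, a minimal Python client might look like the sketch below. It assumes the kubectl-proxy URL from the earlier example; swap in the ingress URL if your cluster has one. The helper names here are illustrative, not part of the project.

```python
import json
from urllib import request

# Assumed URL, matching the kubectl-proxy example above.
SERVICE_URL = (
    "http://localhost:8001/api/v1/namespaces/ml-pipeline/services/"
    "bodywork-pipeline--serve-model/proxy/iris/v1/score"
)

def build_request(features: dict, url: str = SERVICE_URL) -> request.Request:
    """Construct the POST request with a JSON body, mirroring the curl call."""
    return request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def score(features: dict) -> dict:
    """Send one scoring request and return the decoded JSON response."""
    with request.urlopen(build_request(features)) as resp:
        return json.load(resp)

# Example usage (requires kubectl proxy to be running):
# score({"sepal_length": 5.1, "sepal_width": 3.5,
#        "petal_length": 1.4, "petal_width": 0.2})
```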

Running the ML Pipeline on a Schedule

If you're happy with the test results, you can schedule the workflow-controller to operate remotely on the cluster on a pre-defined schedule. For example, to set up the workflow to run every hour, use the following command,

$ bodywork cronjob create \
    --namespace=ml-pipeline \
    --name=train-and-deploy \
    --schedule="0 * * * *" \
    --git-repo-url=https://github.com/bodywork-ml/bodywork-pipeline-with-cicd \
    --git-repo-branch=master
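The --schedule argument uses standard five-field cron syntax (minute, hour, day-of-month, month, day-of-week). A few common examples:

```
0 * * * *      every hour, on the hour (as above)
*/15 * * * *   every 15 minutes
0 0 * * *      every day at midnight
0 6 * * 1      every Monday at 06:00
```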

Each scheduled workflow will attempt to re-run the batch-job, as defined by the state of this repository's master branch at the time of execution.

To get the execution history for all train-and-deploy jobs use,

$ bodywork cronjob history \
    --namespace=ml-pipeline \
    --name=train-and-deploy

This should return output along the lines of,

JOB_NAME                                START_TIME                    COMPLETION_TIME               ACTIVE      SUCCEEDED       FAILED
train-and-deploy-1605214260             2020-11-12 20:51:04+00:00     2020-11-12 20:52:34+00:00     0           1               0

Then to stream the logs from any given cronjob run (e.g. to debug and/or monitor for errors), use,

$ bodywork cronjob logs \
    --namespace=ml-pipeline \
    --name=train-and-deploy-1605214260

Cleaning Up

To clean up the deployment in its entirety, delete the namespace using kubectl - e.g. by running,

$ kubectl delete ns ml-pipeline

Make this Project Your Own

This repository is a GitHub template repository that can be automatically copied into your own GitHub account by clicking the Use this template button above.

After you've cloned the template project, use the official Bodywork documentation to help modify the project to meet your own requirements.