This repository contains a Bodywork project that demonstrates how to configure an ML pipeline with CI/CD. The example ML pipeline has two stages:
- Run a batch job to train a model.
- Deploy the trained model as a service with a REST API.
To deploy this project manually, follow the steps below.
In order to run this example project you will need access to a Kubernetes cluster. To set up a single-node test cluster on your local machine you can use minikube or docker-for-desktop. Check your access to Kubernetes by running,
$ kubectl cluster-info
Which should return the details of your cluster.
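If you want to script this check, for example at the top of a deployment script, a minimal sketch could look like the following. The function name `check_cluster_access` is hypothetical and is not part of kubectl or Bodywork; it only wraps the `kubectl cluster-info` call shown above.

```shell
# Hypothetical helper (not part of kubectl or Bodywork): succeeds only if
# kubectl is installed and can reach a cluster.
check_cluster_access() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl was not found on your PATH" >&2
    return 1
  fi
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "kubectl could not reach a cluster; check your kubeconfig" >&2
    return 2
  fi
}
```

Calling `check_cluster_access || exit 1` before the Bodywork commands would stop a script early when the cluster is unreachable.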
$ pip install bodywork
$ bodywork setup-namespace ml-pipeline
To test the ML pipeline, using a workflow-controller running on your local machine and interacting with your Kubernetes cluster, run,
$ bodywork deployment create \
--name=initial-deployment \
--namespace=ml-pipeline \
--git-repo-url= \
--git-repo-branch=master \
The workflow-controller logs will be streamed to your shell's standard output until the job has been successfully completed.
Service deployments are accessible via HTTP from within the cluster - they are not exposed to the public internet unless you have installed an ingress controller in your cluster. The simplest way to test a service from your local machine is to use a local proxy server to enable access to your cluster. This can be achieved by issuing the following command,
$ kubectl proxy
Then in a new shell, you can use the curl tool to test the service. For example,
$ curl http://localhost:8001/api/v1/namespaces/ml-pipeline/services/bodywork-pipeline--serve-model/proxy/iris/v1/score \
--request POST \
--header "Content-Type: application/json" \
--data '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
Should return,
"model_info": "DecisionTreeClassifier(class_weight='balanced', random_state=42)"
According to how the payload has been defined in the pipeline/
If an ingress controller is operational in your cluster, then the service can be tested via the public internet using,
$ curl http://YOUR_CLUSTERS_EXTERNAL_IP/ml-pipeline/bodywork-pipeline--serve-model/iris/v1/score \
--request POST \
--header "Content-Type: application/json" \
--data '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
See here for instructions on how to retrieve YOUR_CLUSTERS_EXTERNAL_IP.
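The two curl examples above call the same route, /iris/v1/score, and differ only in the URL prefix. As a sketch of how the two URLs are assembled, the hypothetical shell helpers below build each one from the namespace, service name and route. `proxy_url` and `ingress_url` are illustrative names, not Bodywork or kubectl commands, and port 8001 is the one used in the kubectl proxy example above.

```shell
# Hypothetical helpers for illustration; neither is part of Bodywork or kubectl.

# URL for a service reached through `kubectl proxy` on localhost:8001.
proxy_url() {
  # $1 = namespace, $2 = service name, $3 = route (with leading slash)
  printf 'http://localhost:8001/api/v1/namespaces/%s/services/%s/proxy%s\n' "$1" "$2" "$3"
}

# URL for a service reached through an ingress controller.
ingress_url() {
  # $1 = external IP or hostname, $2 = namespace, $3 = service name, $4 = route
  printf 'http://%s/%s/%s%s\n' "$1" "$2" "$3" "$4"
}

proxy_url ml-pipeline bodywork-pipeline--serve-model /iris/v1/score
ingress_url YOUR_CLUSTERS_EXTERNAL_IP ml-pipeline bodywork-pipeline--serve-model /iris/v1/score
```

Either result can be passed to curl in place of the literal address, which keeps the namespace and service name in one place if you test several routes.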
If you're happy with the test results, you can schedule the workflow-controller to operate remotely on the cluster on a pre-defined schedule. For example, to set up the workflow to run every hour, use the following command,
$ bodywork cronjob create \
--namespace=ml-pipeline \
--name=train-and-deploy \
--schedule="0 * * * *" \
--git-repo-url= \
Each scheduled workflow will attempt to re-run the batch job, as defined by the state of this repository's master branch at the time of execution.
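The schedule is given as a five-field cron expression. As a small illustration, the hypothetical helper below (`describe_schedule` is not a Bodywork command) labels each field of the expression used above, showing that "0 * * * *" fires at minute 0 of every hour.

```shell
# Hypothetical helper for illustration: label the five fields of a cron expression.
describe_schedule() (
  # The body runs in a subshell so the variables below stay local.
  # `read` splits on whitespace without glob-expanding the '*' fields.
  read -r minute hour day_of_month month day_of_week <<EOF
$1
EOF
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
)

describe_schedule "0 * * * *"
# prints: minute=0 hour=* day-of-month=* month=* day-of-week=*
```

Changing the first field to `30`, for instance, would move each run to half past the hour.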
To get the execution history for all train-and-deploy jobs use,
$ bodywork cronjob history \
--namespace=ml-pipeline \
Which should return output along the lines of,
train-and-deploy-1605214260 2020-11-12 20:51:04+00:00 2020-11-12 20:52:34+00:00 0 1 0
Then to stream the logs from any given cronjob run (e.g. to debug and/or monitor for errors), use,
$ bodywork cronjob logs \
--namespace=ml-pipeline \
To clean up the deployment in its entirety, delete the namespace using kubectl - e.g. by running,
$ kubectl delete ns ml-pipeline
This repository is a GitHub template repository that can be automatically copied into your own GitHub account by clicking the Use this template button above.
After you've cloned the template project, use the official Bodywork documentation to help modify the project to meet your own requirements.