This allows you to specify a model object via a URI (Uniform Resource Identifier) exposed over an http or https endpoint.
This storageUri option supports single-file models, like an sklearn model specified by a joblib file, as well as archives (e.g. tar or zip) containing all the necessary dependencies for other model types (e.g. tensorflow or pytorch). Here, we'll show examples of both.
- Your ~/.kube/config should point to a cluster with KFServing installed.
- Your cluster's Istio Ingress gateway must be network accessible.
- Your cluster's Istio Egress gateway must allow http / https traffic.
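A quick sanity check that KFServing is up (this assumes it was installed into the kfserving-system namespace):

kubectl get pods -n kfserving-system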
Here, we'll train a simple iris model. Please note that kfserving requires sklearn==0.20.3.
from sklearn import svm
from sklearn import datasets

import joblib

def train(X, y):
    # Fit a simple support-vector classifier on the iris data.
    clf = svm.SVC(gamma='auto')
    clf.fit(X, y)
    return clf

def freeze(clf, path='../frozen'):
    # Serialize the fitted model to a single joblib file.
    joblib.dump(clf, f'{path}/model.joblib')
    return True

if __name__ == '__main__':
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    clf = train(X, y)
    freeze(clf)
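A sketch of training and freezing the model from the shell (the script name train.py is an assumption; note the pinned sklearn version):

pip install scikit-learn==0.20.3 joblib
mkdir -p ../frozen
python train.py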
Now, you'll need to take that frozen model object and put it somewhere on the web to expose it. For instance, push the model.joblib file to some repo on GitHub and reference it with the ?raw=true query string so the raw file contents are returned.
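Before wiring the URI into an InferenceService, it's worth sanity-checking that the file is directly fetchable (the -L flag follows GitHub's redirect to the raw content):

curl -sL -o /tmp/model.joblib "https://github.com/tduffy000/kfserving-uri-examples/blob/master/sklearn/frozen/model.joblib?raw=true"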
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sklearn-from-uri
spec:
  default:
    predictor:
      sklearn:
        storageUri: https://github.com/tduffy000/kfserving-uri-examples/blob/master/sklearn/frozen/model.joblib?raw=true
Apply the CRD,
kubectl apply -f sklearn_uri.yaml
Expected Output
$ inferenceservice.serving.kubeflow.org/sklearn-from-uri created
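It can take a moment for the model to be downloaded and the service to become ready; you can keep an eye on it with:

kubectl get inferenceservice sklearn-from-uri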
The first step is to determine the ingress IP and ports and set INGRESS_HOST and INGRESS_PORT.
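On a cluster where the istio-ingressgateway is exposed as a LoadBalancer service, something along these lines should work (the jsonpath expressions are a sketch and may need adjusting for your environment); we'll also grab the SERVICE_HOSTNAME used by the curl call below:

INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-from-uri -o jsonpath='{.status.url}' | cut -d "/" -f 3)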
Now, if everything went according to plan, you should be able to hit the endpoint exposing the model we just uploaded.
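The request body follows the "instances" prediction format. An input.json along these lines (the feature values are illustrative, not the exact ones from the original example) will yield two predictions:

cat <<EOF > input.json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF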
MODEL_NAME=sklearn-from-uri
INPUT_PATH=@./input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
Expected Output
$ * Trying 10.0.1.16...
* TCP_NODELAY set
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 10.0.1.16 (10.0.1.16) port 30749 (#0)
> POST /v1/models/sklearn-from-uri:predict HTTP/1.1
> Host: sklearn-from-uri.kfserving-uri-storage.example.com
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Length: 86
> Content-Type: application/x-www-form-urlencoded
>
} [86 bytes data]
* upload completely sent off: 86 out of 86 bytes
< HTTP/1.1 200 OK
< content-length: 23
< content-type: application/json; charset=UTF-8
< date: Thu, 06 Aug 2020 23:13:42 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 7
<
{ [23 bytes data]
100 109 100 23 100 86 605 2263 --:--:-- --:--:-- --:--:-- 2868
* Connection #0 to host 10.0.1.16 left intact
{
  "predictions": [
    1,
    1
  ]
}
This will serve as an example of the ability to also pull in a tarball containing all of the required model artifacts; for instance, tensorflow requires multiple files in a strict directory structure in order to be servable.
from sklearn import datasets

import numpy as np
import tensorflow as tf

def _ohe(targets):
    # One-hot encode the integer class labels (3 iris classes).
    y = np.zeros((len(targets), 3))
    for i, label in enumerate(targets):
        y[i, label] = 1.0
    return y

def train(X, y, epochs, batch_size=16):
    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(4,)),
        tf.keras.layers.Dense(16, activation=tf.nn.relu),
        tf.keras.layers.Dense(16, activation=tf.nn.relu),
        tf.keras.layers.Dense(3, activation='softmax')
    ])
    model.compile(tf.keras.optimizers.RMSprop(learning_rate=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(X, y, epochs=epochs, batch_size=batch_size)
    return model

def freeze(model, path='../frozen'):
    # Save in the SavedModel format under a numbered version directory.
    model.save(f'{path}/0001')
    return True

if __name__ == '__main__':
    iris = datasets.load_iris()
    X, targets = iris.data, iris.target
    y = _ohe(targets)
    model = train(X, y, epochs=50)
    freeze(model)
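As before, a sketch of running this from the shell (the script name train.py is an assumption, and the package versions are left unpinned):

pip install numpy scikit-learn tensorflow
mkdir -p ../frozen
python train.py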
The post-training procedure here is a bit different. Instead of directly pushing the frozen output to some URI, we'll need to package the artifacts into a tarball. To do so,
cd ../frozen
tar -cvf artifacts.tar 0001/
gzip < artifacts.tar > artifacts.tgz
Where we assume the 0001/ directory has the structure:

0001/
|-- saved_model.pb
|-- variables/
    |-- variables.data-00000-of-00001
    |-- variables.index
Note that building the tarball from the directory specifying a version number is required for tensorflow.
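Equivalently, tar can create the compressed archive in a single step:

tar -czvf artifacts.tgz 0001/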
Now, you can push either the .tar or the .tgz file to some remote URI. And again, if everything went to plan, we should be able to pull down the tarball and expose the endpoint.
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: tensorflow-from-uri
spec:
  default:
    predictor:
      tensorflow:
        storageUri: https://raw.githubusercontent.com/tduffy000/kfserving-uri-examples/master/tensorflow/frozen/model_artifacts.tar.gz
Apply the CRD,
kubectl apply -f tensorflow_uri.yaml
Expected Output
$ inferenceservice.serving.kubeflow.org/tensorflow-from-uri created
Again, make sure to first determine the ingress IP and ports and set INGRESS_HOST and INGRESS_PORT.
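As before, we also need the SERVICE_HOSTNAME for this InferenceService (same jsonpath sketch as above):

SERVICE_HOSTNAME=$(kubectl get inferenceservice tensorflow-from-uri -o jsonpath='{.status.url}' | cut -d "/" -f 3)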
Now that our endpoint is up and running, we can get some predictions.
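The tensorflow predictor accepts the same four-feature rows. A minimal input.json (again with illustrative values) matching the two predictions shown below can be written with a heredoc:

cat <<EOF > input.json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF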
MODEL_NAME=tensorflow-from-uri
INPUT_PATH=@./input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
Expected Output
$ * Trying 10.0.1.16...
* TCP_NODELAY set
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 10.0.1.16 (10.0.1.16) port 30749 (#0)
> POST /v1/models/tensorflow-from-uri:predict HTTP/1.1
> Host: tensorflow-from-uri.default.example.com
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Length: 86
> Content-Type: application/x-www-form-urlencoded
>
} [86 bytes data]
* upload completely sent off: 86 out of 86 bytes
< HTTP/1.1 200 OK
< content-length: 112
< content-type: application/json
< date: Thu, 06 Aug 2020 23:21:19 GMT
< x-envoy-upstream-service-time: 151
< server: istio-envoy
<
{ [112 bytes data]
100 198 100 112 100 86 722 554 --:--:-- --:--:-- --:--:-- 1285
* Connection #0 to host 10.0.1.16 left intact
{
  "predictions": [
    [
      0.0204100646,
      0.680984616,
      0.298605353
    ],
    [
      0.0296604875,
      0.658412039,
      0.311927497
    ]
  ]
}