diff --git a/modules/chapter1/pages/section1.adoc b/modules/chapter1/pages/section1.adoc
index 888951a..d878608 100644
--- a/modules/chapter1/pages/section1.adoc
+++ b/modules/chapter1/pages/section1.adoc
@@ -47,7 +47,7 @@ A model server runtime is the execution environment or platform where a trained
 
 The model server runtime can be a part of a larger deployment framework or service that includes features such as scalability, versioning, monitoring, and security. Examples of model server runtimes include TensorFlow Serving, TorchServe, and ONNX Runtime. These runtimes support the deployment of models trained using popular machine learning frameworks and provide a standardized way to serve predictions over APIs (Application Programming Interfaces).
 
-=== Inference Engine:
+=== Inference Engine
 An inference engine is a component responsible for executing the forward pass of a machine learning model to generate predictions based on input data. It is a crucial part of the model server runtime and is specifically designed for performing inference tasks efficiently. The inference engine takes care of optimizations, such as hardware acceleration and parallelization, to ensure that predictions are made quickly and with minimal resource utilization.
 
 The inference engine may be integrated into the model server runtime or work alongside it, depending on the specific architecture. For example, TensorFlow Serving incorporates TensorFlow's inference engine, and ONNX Runtime serves as both a runtime and an inference engine for models in the Open Neural Network Exchange (ONNX) format.
@@ -55,15 +55,15 @@ The inference engine may be integrated into the model server runtime or work alo
 **Relationship**: In summary, the model server runtime provides the overall environment for hosting and managing machine learning models in production, while the inference engine is responsible for the actual computation of predictions during inference.
 The two work together to deliver a scalable, efficient, and reliable solution for serving machine learning models in real-world applications. The choice of model server runtime and inference engine depends on factors such as the machine learning framework used, deployment requirements, and the specific optimizations needed for the target hardware.
 
-=== Unravel The Runtime
+== Unravel The Runtime
 
 When deploying machine learning models, we need to deploy a container that serves a **Runtime** and uses a **Model** to perform predictions, consider the following example:
 
-==== Train a model
+=== Train a Model
 
 Using a RHOAI instance, let us train and deploy an example.
 
-. In a data science project, create a `Standard Data Science`workbench.
+. In a data science project, create a `Standard Data Science` workbench.
 Then, open the workbench to go to the JupyterLab interface.
 +
 image::workbench_options.png[Workbench Options]
@@ -102,10 +102,10 @@ There are different formats and libraries to export the model, in this case we a
 
 * Torch
 
-The use of either of those formats depend on the target server runtime, some of them are proven to be more efficient than others for certain type of training algorithms and model sizes.
+The use of either of those formats depends on the target server runtime, some of them are proven to be more efficient than others for certain type of training algorithms and model sizes.
 ====
 
-===== Use the model in another notebook
+=== Use the Model in Another Notebook
 
 The model can be deserialized in another notebook, and used to generate a prediction:
 
@@ -127,7 +127,7 @@ At this moment the model can be exported and imported in other projects for its
 
 For this section, you need Podman to create an image, and a registry to upload the resulting image.
 
-=== web application that uses the model
+=== Web application that uses the model
 
 The pickle model that we previously exported can be used in a Flask application.
 In this section we present an example Flask application that uses the model.
@@ -251,33 +251,39 @@ CMD ["app:app"]
 
 . Build and push the image to an image registry
 +
-```shell
-podman login quay.io
-podman build -t purchase-predictor:1.0 .
-podman tag purchase-predictor:1.0 quay.io/user_name/purchase-predictor:1.0
-podman push quay.io/user_name/purchase-predictor:1.0
-```
+
+[source,console]
+----
+$ podman login quay.io
+$ podman build -t purchase-predictor:1.0 .
+$ podman tag purchase-predictor:1.0 quay.io/user_name/purchase-predictor:1.0
+$ podman push quay.io/user_name/purchase-predictor:1.0
+----
 +
 After you push the image, open quay.io in your browser and make the image public.
 
 . Deploy the model image to **OpenShift**
 +
-```shell
-oc login api.cluster.example.com:6443
-oc new-project model-deploy
-oc new-app --name purchase-predictor quay.io/user_name/purchase-predictor:1.0
-oc expose service purchase-predictor
-```
+[source,console]
+----
+$ oc login api.cluster.example.com:6443
+$ oc new-project model-deploy
+$ oc new-app --name purchase-predictor quay.io/user_name/purchase-predictor:1.0
+$ oc expose service purchase-predictor
+----
 
 Now we can use the Flask application with some commands such as:
-```shell
-curl http://purchase-predictor-model-deploy.apps.cluster.example.com/health
-ok%
-curl http://purchase-predictor-model-deploy.apps.cluster.example.com/info
+[source,console]
+----
+$ curl http://purchase-predictor-model-deploy.apps.cluster.example.com/health
+ok
+$ curl http://purchase-predictor-model-deploy.apps.cluster.example.com/info
 {"name":"Time to purchase amount predictor","version":"v1.0.0"}
-curl -d '{"time":4}' -H "Content-Type: application/json" -X POST http://purchase-predictor-model-deploy.apps.cluster.example.com/predict
+$ curl -d '{"time":4}' -H "Content-Type: application/json" \
+> -X POST \
+> http://purchase-predictor-model-deploy.apps.cluster.example.com/predict
 {"prediction":34,"status":200}
-```
+----
 
 [IMPORTANT]
 ====
diff --git a/modules/chapter1/pages/section2.adoc b/modules/chapter1/pages/section2.adoc
index 42a5b50..83c4165 100644
--- a/modules/chapter1/pages/section2.adoc
+++ b/modules/chapter1/pages/section2.adoc
@@ -8,40 +8,44 @@ https://min.io[MinIO] is a high-performance, S3 compatible object store. It is b
 
 We will need an S3 solution to share the model from training to deploy, in this exercise we will prepare MinIO to be such S3 solution.
 
-. In OpenShift, create a new namespace with the name **object-datastore**
+. In OpenShift, create a new namespace with the name **object-datastore**.
 +
-```shell
-oc new-project object-datastore
-```
+[source,console]
+----
+$ oc new-project object-datastore
+----
 
-. Run the following yaml to install MinIO
+. Run the following yaml to install MinIO:
 +
-```shell
-curl https://raw.githubusercontent.com/RedHatQuickCourses/rhods-qc-apps/main/4.rhods-deploy/chapter2/minio.yml
-oc apply -f ./minio.yml -n object-datastore
-```
+[source,console]
+----
+$ curl https://raw.githubusercontent.com/RedHatQuickCourses/rhods-qc-apps/main/4.rhods-deploy/chapter2/minio.yml
+$ oc apply -f ./minio.yml -n object-datastore
+----
 
-. Get the route to the MinIO dashboard
+. Get the route to the MinIO dashboard.
 +
-```shell
-oc get routes -n object-datastore | grep minio-ui | awk '{print $2}'
-```
+[source,console]
+----
+$ oc get routes -n object-datastore | grep minio-ui | awk '{print $2}'
+----
 +
 [INFO]
 ====
 Use this route to navigate to the S3 dashboard using a browser. With the browser, you will be able to create buckets, upload files, and navigate the S3 contents.
 ====
 
-. Get the route to the MinIO API
+. Get the route to the MinIO API.
 +
-```shell
-oc get routes -n object-datastore | grep minio-api | awk '{print $2}'
-```
+[source,console]
+----
+$ oc get routes -n object-datastore | grep minio-api | awk '{print $2}'
+----
 +
 [INFO]
 ====
 Use this route as the S3 API endpoint. Basically, this is the URL that we will use when creating a data connection to the S3 in RHOAI.
-====
+====
 
 == Training The Model
 We will use the iris dataset model for this excercise.
@@ -55,9 +59,10 @@ It is recommended to use a workbench that was created with the **Standard Data S
 
 . Make sure that the workbench environment serves the required python packages for the notebook to run, for this to happen, open a terminal and run the following command to verify that the packages are already installed:
 +
-```shell
- pip install -r /opt/app-root/src/rhods-qc-apps/4.rhods-deploy/chapter2/requirements.txt
-```
+[source,console]
+----
+$ pip install -r /opt/app-root/src/rhods-qc-apps/4.rhods-deploy/chapter2/requirements.txt
+----
 
 [TIP]
 ====
@@ -95,7 +100,7 @@ Make sure to create a new path in your bucket, and upload to such path, not to r
 
 . In the RHOAI dashboard, create a project named **iris-project**.
 
-. In the **Data Connections** section, create a Data Connection to your S3
+. In the **Data Connections** section, create a Data Connection to your S3.
 +
 image::add-minio-iris-data-connection.png[Add iris data connection from minio]
 +
@@ -104,6 +109,7 @@ image::add-minio-iris-data-connection.png[Add iris data connection from minio]
 - The credentials (Access Key/Secret Key) are `minio`/`minio123`.
 - Make sure to use the API route, not the UI route (`oc get routes -n object-datastore | grep minio-api | awk '{print $2}'`).
 - The region is not important when using MinIO, this is a property that has effects when using AWS S3.
+However, you must enter a non-empty value to prevent problems with model serving.
 - Mind typos for the bucket name.
 - You don't have to select a workbench to attach this data connection to.
 ====
@@ -160,7 +166,14 @@ s3.download_file(bucket_name, s3_data_path, "my/local/path/dataset.csv")
 +
 image::add-server-button.png[add server]
 
-. Fill the form with the example values:
+. Fill the form with the following values:
++
+--
+* Server name: `iris-model-server`.
+* Serving runtime: `OpenVINO Model Server`.
+* Select the checkboxes to expose the models through an external route, and to enable token authentication.
+Enter `iris-serviceaccount` as the service account name.
+--
 +
 image::add-server-form-example.png[Add Server Form]
 +
@@ -198,7 +211,14 @@ image::model-server-with-token.png[Model Server with token]
 +
 image::deploy-model-button.png[Deploy Model button]
 
-. Fill the **Deploy Model** from as in the example:
+. Fill the **Deploy Model** form.
++
+--
+* Model name: `iris-model`
+* Model framework: `onnx - 1`
+* Model location data connection: `iris-data-connection`
+* Model location path: `iris`
+--
 +
 image::deploy-model-form.png[Deploy Model form]
 
@@ -208,11 +228,12 @@ image::deploy-model-success.png[Deploy model success]
 
 . Observe and monitor the assets created in your OpenShift **iris-project** namespace.
 +
-```shell
-oc get routes -n iris-project
-oc get secrets -n iris-project | grep iris-model
-oc get events -n iris-project
-```
+[source,console]
+----
+$ oc get routes -n iris-project
+$ oc get secrets -n iris-project | grep iris-model
+$ oc get events -n iris-project
+----
 +
 image::iris-project-events.png[Iris project events]
 +
@@ -225,23 +246,28 @@ Deploying a **Model Server** triggers a **ReplicaSet** with **ModelMesh**, which
 
 Now that the model is ready to use, we can make an inference using the REST API
 
-. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands
+. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands.
 +
-```shell
-export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-model | awk '{print $2}')
-```
+[source,console]
+----
+$ export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-model | awk '{print $2}')
+----
 
-. Assign an authentication token to an environment variable in your local machine
+. Assign an authentication token to an environment variable in your local machine.
 +
-```shell
-export TOKEN=$(oc whoami -t)
-```
+[source,console]
+----
+$ export TOKEN=$(oc whoami -t)
+----
 
-. Request an inference with the REST API
+. Request an inference with the REST API.
 +
-```shell
-curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-model/infer -X POST --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}],"outputs" : [{"name" : "output0"}]}'
-```
+[source,console]
+----
+$ curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-model/infer \
+ -X POST \
+ --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}],"outputs" : [{"name" : "output0"}]}'
+----
 
 The result of using the inference service looks like the following output:
 ```json
diff --git a/modules/chapter1/pages/section3.adoc b/modules/chapter1/pages/section3.adoc
index ca65424..619a00e 100644
--- a/modules/chapter1/pages/section3.adoc
+++ b/modules/chapter1/pages/section3.adoc
@@ -137,7 +137,15 @@ image::runtimes-list.png[Runtimes List]
 +
 image::add-custom-model-server.png[Add server]
 
-. Fill up the form as in the following example, notice how **Triton runtime 23.05** is one of the available options for the **Serving runtime** dropdown.
+. Create the model server with the following values:
++
+--
+* Server name: `iris-custom-server`.
+* Serving runtime: `Triton runtime 23.05`.
+This is the newly added runtime.
+* Activate the external route and the authentication.
+Use `custom-server-sa` as the service account name.
+--
 +
 image:custom-model-server-form.png[Add model server form]
 
@@ -151,7 +159,14 @@ image::custom-runtime.png[Iris custom server]
 +
 image::iris-custom-deploy-model.png[Deploy Model]
 
-. Fill up the **Deploy Model** form as in the following example:
+. Fill up the **Deploy Model** form:
++
+--
+* Model name: `iris-custom-model`
+* Model framework: `onnx - 1`
+* Model location data connection: `iris-data-connection`
+* Model location path: `iris`
+--
 +
 image::iris-custom-deploy-model-form.png[Deploy model form]
 +
@@ -169,23 +184,28 @@ image::triton-server-running.png[Triton server running]
 
 Now that the model is ready to use, we can make an inference using the REST API
 
-. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands
+. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands.
 +
-```shell
-export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-custom-model | awk '{print $2}')
-```
+[source,console]
+----
+$ export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-custom-model | awk '{print $2}')
+----
 
-. Assign an authentication token to an environment variable in your local machine
+. Assign an authentication token to an environment variable in your local machine.
 +
-```shell
-export TOKEN=$(oc whoami -t)
-```
+[source,console]
+----
+$ export TOKEN=$(oc whoami -t)
+----
 
-. Request an inference with the REST API
+. Request an inference with the REST API.
 +
-```shell
-curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-custom-model/infer -X POST --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}]}'
-```
+[source,console]
+----
+$ curl -H "Authorization: Bearer $TOKEN" \
+ $IRIS_ROUTE/v2/models/iris-custom-model/infer -X POST \
+ --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}]}'
+----
 
 . The result received from the inference service looks like the following:
 +