Migrating Python model REST protocol test on Triton for Kserve (UI -> API) (#2133)
Raghul-M authored Jan 10, 2025
1 parent 4d10c7f commit 7a3ef57
Showing 5 changed files with 207 additions and 2 deletions.
61 changes: 60 additions & 1 deletion ods_ci/tests/Resources/CLI/ModelServing/llm.resource
@@ -27,6 +27,7 @@ ${SERVICEMESH_CR_NS}= istio-system
... vllm-runtime=${LLM_RESOURCES_DIRPATH}/serving_runtimes/vllm_servingruntime_{{protocol}}.yaml
... ovms-runtime=${LLM_RESOURCES_DIRPATH}/serving_runtimes/ovms_servingruntime_{{protocol}}.yaml
... caikit-standalone-runtime=${LLM_RESOURCES_DIRPATH}/serving_runtimes/caikit_standalone_servingruntime_{{protocol}}.yaml # robocop: disable
... triton-kserve-runtime=${LLM_RESOURCES_DIRPATH}/serving_runtimes/triton_servingruntime_{{protocol}}.yaml # robocop: disable
${DOWNLOAD_PVC_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_model_in_pvc.yaml
${DOWNLOAD_PVC_FILLED_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_model_in_pvc_filled.yaml

@@ -140,7 +141,7 @@ Compile Inference Service YAML
[Arguments] ${isvc_name} ${model_storage_uri} ${model_format}=caikit ${serving_runtime}=caikit-tgis-runtime
... ${kserve_mode}=${NONE} ${sa_name}=${DEFAULT_BUCKET_SA_NAME} ${canaryTrafficPercent}=${EMPTY} ${min_replicas}=1
... ${scaleTarget}=1 ${scaleMetric}=concurrency ${auto_scale}=${NONE}
... ${requests_dict}=&{EMPTY} ${limits_dict}=&{EMPTY} ${overlays}=${EMPTY}
... ${requests_dict}=&{EMPTY} ${limits_dict}=&{EMPTY} ${overlays}=${EMPTY} ${version}=${EMPTY}
IF '${auto_scale}' == '${NONE}'
${scaleTarget}= Set Variable ${EMPTY}
${scaleMetric}= Set Variable ${EMPTY}
@@ -153,6 +154,7 @@ Compile Inference Service YAML
Set Test Variable ${scaleMetric}
Set Test Variable ${canaryTrafficPercent}
Set Test Variable ${model_format}
Set Test Variable ${version}
Set Test Variable ${serving_runtime}
IF len($overlays) > 0
FOR ${index} ${overlay} IN ENUMERATE @{overlays}
@@ -414,6 +416,46 @@ Query Model Multiple Times
END
END

Setup Test Variables # robocop: off=too-many-calls-in-keyword
[Documentation] Sets up variables for the Suite
[Arguments] ${model_name} ${kserve_mode}=Serverless ${use_pvc}=${FALSE} ${use_gpu}=${FALSE}
... ${model_path}=${model_name}
Set Test Variable ${model_name}
${models_names}= Create List ${model_name}
Set Test Variable ${models_names}
Set Test Variable ${model_path}
Set Test Variable ${test_namespace} ${TEST_NS}-${model_name}
IF ${use_pvc}
Set Test Variable ${storage_uri} pvc://${model_name}-claim/${model_path}
ELSE
Set Test Variable ${storage_uri} s3://${S3.BUCKET_1.NAME}/${model_path}
END
IF ${use_gpu}
${supported_gpu_type}= Convert To Lowercase ${GPU_TYPE}
Set Runtime Image ${supported_gpu_type}
IF "${supported_gpu_type}" == "nvidia"
${limits}= Create Dictionary nvidia.com/gpu=1
ELSE IF "${supported_gpu_type}" == "amd"
${limits}= Create Dictionary amd.com/gpu=1
ELSE
FAIL msg=Provided GPU type is not yet supported. Only nvidia and amd GPU types are supported
END
Set Test Variable ${limits}
ELSE
Set Test Variable ${limits} &{EMPTY}
END
IF "${KSERVE_MODE}" == "RawDeployment" # robocop: off=inconsistent-variable-name
Set Test Variable ${use_port_forwarding} ${TRUE}
ELSE
Set Test Variable ${use_port_forwarding} ${FALSE}
END
Set Log Level NONE
Set Test Variable ${access_key_id} ${S3.AWS_ACCESS_KEY_ID}
Set Test Variable ${access_key} ${S3.AWS_SECRET_ACCESS_KEY}
Set Test Variable ${endpoint} ${MODELS_BUCKET.ENDPOINT}
Set Test Variable ${region} ${MODELS_BUCKET.REGION}
Set Log Level INFO

Compile Deploy And Query LLM model
[Documentation] Group together the test steps for preparing, deploying
... and querying a model
@@ -909,3 +951,20 @@ Remove Model Mount Path From Runtime
${rc} ${out}= Run And Return Rc And Output
... oc patch servingruntime ${runtime} -n ${namespace} --type='json' -p='[{"op": "remove", "path": "/spec/containers/0/args/1"}]'
Should Be Equal As Integers ${rc} ${0} msg=${out}


Set Runtime Image
[Documentation] Sets the runtime image variable for the Suite based on the provided GPU type
[Arguments] ${gpu_type}
IF "${RUNTIME_IMAGE}" == "${EMPTY}"
IF "${gpu_type}" == "nvidia"
Set Test Variable ${runtime_image} quay.io/modh/vllm@sha256:c86ff1e89c86bc9821b75d7f2bbc170b3c13e3ccf538bf543b1110f23e056316
ELSE IF "${gpu_type}" == "amd"
Set Test Variable ${runtime_image} quay.io/modh/vllm@sha256:10f09eeca822ebe77e127aad7eca2571f859a5536a6023a1baffc6764bcadc6e
ELSE
FAIL msg=Provided GPU type is not yet supported. Only nvidia and amd GPU types are supported
END
ELSE
Log To Console msg=Using the image provided from the terminal
END

@@ -20,6 +20,7 @@ spec:
volumeMounts: []
modelFormat:
name: ${model_format}
version: ${version}
runtime: ${serving_runtime}
storageUri: ${model_storage_uri}
volumes: []
@@ -0,0 +1,55 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
name: triton-kserve-runtime
spec:
annotations:
prometheus.kserve.io/path: /metrics
prometheus.kserve.io/port: "8002"
containers:
- args:
- tritonserver
- --model-store=/mnt/models
- --grpc-port=9000
- --http-port=8080
- --allow-grpc=true
- --allow-http=true
image: nvcr.io/nvidia/tritonserver:23.05-py3
name: kserve-container
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: "1"
memory: 2Gi
ports:
- containerPort: 8080
protocol: TCP
protocolVersions:
- v2
- grpc-v2
supportedModelFormats:
- autoSelect: true
name: tensorrt
version: "8"
- autoSelect: true
name: tensorflow
version: "1"
- autoSelect: true
name: tensorflow
version: "2"
- autoSelect: true
name: onnx
version: "1"
- name: pytorch
version: "1"
- autoSelect: true
name: triton
version: "2"
- autoSelect: true
name: xgboost
version: "1"
- autoSelect: true
name: python
version: "1"
@@ -339,7 +339,7 @@ Get Model Inference
${rc} ${url}= Run And Return Rc And Output
... oc get ksvc ${model_name}-predictor -n ${project_title} -o jsonpath='{.status.url}'
Should Be Equal As Integers ${rc} 0
${curl_cmd}= Set Variable curl -s ${url}${end_point} -d ${inference_input}
${curl_cmd}= Set Variable curl -s ${url}${end_point} -d ${inference_input} --cacert openshift_ca_istio_knative.crt
ELSE IF '${kserve_mode}' == 'RawDeployment'
${url}= Set Variable http://localhost:${service_port}${end_point}
${curl_cmd}= Set Variable curl -s ${url} -d ${inference_input} --cacert openshift_ca_istio_knative.crt
@@ -0,0 +1,90 @@
*** Settings ***
Documentation Suite of test cases for Triton in Kserve
Library OperatingSystem
Library ../../../../libs/Helpers.py
Resource ../../../Resources/Page/ODH/JupyterHub/HighAvailability.robot
Resource ../../../Resources/Page/ODH/ODHDashboard/ODHModelServing.resource
Resource ../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/Projects.resource
Resource ../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/DataConnections.resource
Resource ../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/ModelServer.resource
Resource ../../../Resources/Page/ODH/ODHDashboard/ODHDashboardSettingsRuntimes.resource
Resource ../../../Resources/Page/ODH/Monitoring/Monitoring.resource
Resource ../../../Resources/OCP.resource
Resource ../../../Resources/CLI/ModelServing/modelmesh.resource
Resource ../../../Resources/Common.robot
Resource ../../../Resources/CLI/ModelServing/llm.resource
Suite Setup Suite Setup
Suite Teardown Suite Teardown
Test Tags Kserve

*** Variables ***
${PYTHON_MODEL_NAME}= python
${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}= {"model_name":"python","model_version":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":[4],"data":[0.921442985534668,0.6223347187042236,0.8059385418891907,1.2578542232513428]},{"name":"OUTPUT1","datatype":"FP32","shape":[4],"data":[0.49091365933418274,-0.027157962322235107,-0.5641784071922302,0.6906309723854065]}]}
${INFERENCE_REST_INPUT_PYTHON}= @tests/Resources/Files/triton/kserve-triton-python-rest-input.json
${KSERVE_MODE}= Serverless # Serverless
${PROTOCOL}= http
${TEST_NS}= tritonmodel
${DOWNLOAD_IN_PVC}= ${FALSE}
${MODELS_BUCKET}= ${S3.BUCKET_1}
${LLM_RESOURCES_DIRPATH}= tests/Resources/Files/llm
${INFERENCESERVICE_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/serving_runtimes/base/isvc.yaml
${INFERENCESERVICE_FILEPATH_NEW}= ${LLM_RESOURCES_DIRPATH}/serving_runtimes/isvc
${INFERENCESERVICE_FILLED_FILEPATH}= ${INFERENCESERVICE_FILEPATH_NEW}/isvc_filled.yaml
${KSERVE_RUNTIME_REST_NAME}= triton-kserve-runtime


*** Test Cases ***
Test Python Model Rest Inference Via API (Triton on Kserve) # robocop: off=too-long-test-case
[Documentation] Test the deployment of a python model in Kserve using Triton
[Tags] Tier2 RHOAIENG-16912
Setup Test Variables model_name=${PYTHON_MODEL_NAME} use_pvc=${FALSE} use_gpu=${FALSE}
... kserve_mode=${KSERVE_MODE} model_path=triton/model_repository/
Set Project And Runtime runtime=${KSERVE_RUNTIME_REST_NAME} protocol=${PROTOCOL} namespace=${test_namespace}
... download_in_pvc=${DOWNLOAD_IN_PVC} model_name=${PYTHON_MODEL_NAME}
... storage_size=100Mi memory_request=100Mi
${requests}= Create Dictionary memory=1Gi
Compile Inference Service YAML isvc_name=${PYTHON_MODEL_NAME}
... sa_name=models-bucket-sa
... model_storage_uri=${storage_uri}
... model_format=python serving_runtime=${KSERVE_RUNTIME_REST_NAME}
... version="1"
... limits_dict=${limits} requests_dict=${requests} kserve_mode=${KSERVE_MODE}
Deploy Model Via CLI isvc_filepath=${INFERENCESERVICE_FILLED_FILEPATH}
... namespace=${test_namespace}
# File is not needed anymore after applying
Remove File ${INFERENCESERVICE_FILLED_FILEPATH}
Wait For Pods To Be Ready label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
... namespace=${test_namespace}
${pod_name}= Get Pod Name namespace=${test_namespace}
... label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
${service_port}= Extract Service Port service_name=${PYTHON_MODEL_NAME}-predictor protocol=TCP
... namespace=${test_namespace}
IF "${KSERVE_MODE}"=="RawDeployment"
Start Port-forwarding namespace=${test_namespace} pod_name=${pod_name} local_port=${service_port}
... remote_port=${service_port} process_alias=triton-process
END
Verify Model Inference With Retries model_name=${PYTHON_MODEL_NAME} inference_input=${INFERENCE_REST_INPUT_PYTHON}
... expected_inference_output=${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON} project_title=${test_namespace}
... deployment_mode=Cli kserve_mode=${KSERVE_MODE} service_port=${service_port}
... end_point=/v2/models/${model_name}/infer retries=3
[Teardown] Run Keywords
... Clean Up Test Project test_ns=${test_namespace}
... isvc_names=${models_names} wait_prj_deletion=${FALSE} kserve_mode=${KSERVE_MODE}
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process triton-process kill=true


*** Keywords ***
Suite Setup
[Documentation] Suite setup keyword
Set Library Search Order SeleniumLibrary
Skip If Component Is Not Enabled kserve
RHOSi Setup
Load Expected Responses
Set Default Storage Class In GCP default=ssd-csi

Suite Teardown
[Documentation] Suite teardown keyword
Set Default Storage Class In GCP default=standard-csi
RHOSi Teardown
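
For reference, the query step of the test case above boils down to resolving the predictor endpoint (a ksvc URL in Serverless mode, or a port-forwarded localhost port in RawDeployment mode) and POSTing the JSON input to /v2/models/<model>/infer. Below is a stand-alone Python sketch of the RawDeployment path, assuming an active oc login and reusing the namespace, label selector, and input file from the test; the port number and the sleep are assumptions, not values taken from this PR.

# Hypothetical stand-alone version of the RawDeployment query flow used above:
# find the predictor pod, port-forward it, then POST the v2 payload locally.
import json
import subprocess
import time

import requests

namespace = "tritonmodel-python"          # ${TEST_NS}-${model_name} from the test
pod = subprocess.check_output(
    ["oc", "get", "pod", "-n", namespace,
     "-l", "serving.kserve.io/inferenceservice=python",
     "-o", "jsonpath={.items[0].metadata.name}"],
    text=True,
).strip()

port = 8080                               # assumed predictor HTTP port
pf = subprocess.Popen(["oc", "port-forward", f"pod/{pod}", f"{port}:{port}", "-n", namespace])
time.sleep(5)                             # crude wait for the tunnel to come up

try:
    with open("tests/Resources/Files/triton/kserve-triton-python-rest-input.json") as f:
        payload = json.load(f)
    resp = requests.post(f"http://localhost:{port}/v2/models/python/infer", json=payload, timeout=30)
    print(resp.status_code, resp.json())  # compare against the expected REST output
finally:
    pf.terminate()
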
