Migration of Python model kserve grpc testcase UI -> API #2155

Merged (12 commits, Jan 13, 2025)
ods_ci/tests/Resources/CLI/ModelServing/llm.resource (5 changes: 1 addition & 4 deletions)
@@ -30,7 +30,6 @@
... triton-kserve-runtime=${LLM_RESOURCES_DIRPATH}/serving_runtimes/triton_servingruntime_{{protocol}}.yaml # robocop: disable
${DOWNLOAD_PVC_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_model_in_pvc.yaml
${DOWNLOAD_PVC_FILLED_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_model_in_pvc_filled.yaml

${DOWNLOAD_PROMPTS_PVC_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_prompts_in_pvc.yaml
${DOWNLOAD_PROMPTS_PVC_FILLED_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_prompts_in_pvc_filled.yaml
${MATCHING_RATIO}= ${60}
@@ -141,8 +140,8 @@
[Arguments] ${isvc_name} ${model_storage_uri} ${model_format}=caikit ${serving_runtime}=caikit-tgis-runtime
... ${kserve_mode}=${NONE} ${sa_name}=${DEFAULT_BUCKET_SA_NAME} ${canaryTrafficPercent}=${EMPTY} ${min_replicas}=1
... ${scaleTarget}=1 ${scaleMetric}=concurrency ${auto_scale}=${NONE}
... ${requests_dict}=&{EMPTY} ${limits_dict}=&{EMPTY} ${overlays}=${EMPTY} ${version}=${EMPTY}
... ${requests_dict}=&{EMPTY} ${limits_dict}=&{EMPTY} ${overlays}=${EMPTY} ${version}=${EMPTY}
IF '${auto_scale}' == '${NONE}'

[Robocop notice] Too many arguments per continuation line (4 / 1)
${scaleTarget}= Set Variable ${EMPTY}
${scaleMetric}= Set Variable ${EMPTY}
END
@@ -199,7 +198,6 @@
Log message=Using defaultDeploymentMode set in the DSC: ${mode}
END


Model Response Should Match The Expectation
[Documentation] Checks that the actual model response matches the expected answer.
... The goals are:
@@ -952,7 +950,6 @@
... oc patch servingruntime ${runtime} -n ${namespace} --type='json' -p='[{"op": "remove", "path": "/spec/containers/0/args/1"}]'
Should Be Equal As Integers ${rc} ${0} msg=${out}


Set Runtime Image
[Documentation] Sets up runtime variables for the Suite
[Arguments] ${gpu_type}
@@ -0,0 +1,56 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
name: triton-kserve-runtime
spec:
annotations:
prometheus.kserve.io/path: /metrics
prometheus.kserve.io/port: "8002"
containers:
- args:
- tritonserver
- --model-store=/mnt/models
- --grpc-port=9000
- --http-port=8080
- --allow-grpc=true
- --allow-http=true
image: nvcr.io/nvidia/tritonserver:24.10-py3
name: kserve-container
ports:
- containerPort: 9000
name: h2c
protocol: TCP
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: "1"
memory: 2Gi
protocolVersions:
- v2
- grpc-v2
supportedModelFormats:
- autoSelect: true
name: tensorrt
version: "8"
- autoSelect: true
name: tensorflow
version: "1"
- autoSelect: true
name: tensorflow
version: "2"
- autoSelect: true
name: onnx
version: "1"
- name: pytorch
version: "1"
- autoSelect: true
name: triton
version: "2"
- autoSelect: true
name: xgboost
version: "1"
- autoSelect: true
name: python
version: "1"
@@ -19,6 +19,10 @@

*** Variables ***
${PYTHON_MODEL_NAME}= python
${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON}= {"modelName":"python","modelVersion":"1","id":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":["4"]},{"name":"OUTPUT1","datatype":"FP32","shape":["4"]}],"rawOutputContents":["AgAAAAAAAAAAAAAAAAAAAA==","AAQAAAAAAAAAAAAAAAAAAA=="]}

[Robocop warning] Line is too long (288/120)
${INFERENCE_GRPC_INPUT_PYTHONFILE}= tests/Resources/Files/triton/kserve-triton-python-grpc-input.json
${KSERVE_MODE}= Serverless # Serverless
${PROTOCOL_GRPC}= grpc
${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}= {"model_name":"python","model_version":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":[4],"data":[0.921442985534668,0.6223347187042236,0.8059385418891907,1.2578542232513428]},{"name":"OUTPUT1","datatype":"FP32","shape":[4],"data":[0.49091365933418274,-0.027157962322235107,-0.5641784071922302,0.6906309723854065]}]}
${INFERENCE_REST_INPUT_PYTHON}= @tests/Resources/Files/triton/kserve-triton-python-rest-input.json
${KSERVE_MODE}= Serverless # Serverless
@@ -31,13 +35,15 @@
${INFERENCESERVICE_FILEPATH_NEW}= ${LLM_RESOURCES_DIRPATH}/serving_runtimes/isvc
${INFERENCESERVICE_FILLED_FILEPATH}= ${INFERENCESERVICE_FILEPATH_NEW}/isvc_filled.yaml
${KSERVE_RUNTIME_REST_NAME}= triton-kserve-runtime
${PATTERN}= https:\/\/([^\/:]+)
${PROTOBUFF_FILE}= tests/Resources/Files/triton/grpc_predict_v2.proto
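As an aside, the rawOutputContents entries in ${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON} above are base64-encoded raw tensor buffers from the V2 gRPC protocol, interpreted here as little-endian FP32. A minimal Python sketch (not part of this PR) of how one entry decodes:

```python
# Decode one rawOutputContents entry from the expected gRPC response above:
# 24 base64 chars -> 16 bytes -> 4 little-endian FP32 values (shape ["4"]).
import base64
import struct

raw_output0 = "AgAAAAAAAAAAAAAAAAAAAA=="
buf = base64.b64decode(raw_output0)
values = struct.unpack("<4f", buf)
print(values)
```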

[Robocop warning] Test case 'Test Python Model Grpc Inference Via API (Triton on Kserve)' has too many keywords inside (18/10)

*** Test Cases ***
Test Python Model Rest Inference Via API (Triton on Kserve) # robocop: off=too-long-test-case
[Documentation] Test the deployment of a Python model in KServe using Triton
[Tags] Tier2 RHOAIENG-16912
Setup Test Variables model_name=${PYTHON_MODEL_NAME} use_pvc=${FALSE} use_gpu=${FALSE}

[Robocop warning] Line is too long (128/120)
... kserve_mode=${KSERVE_MODE} model_path=triton/model_repository/
Set Project And Runtime runtime=${KSERVE_RUNTIME_REST_NAME} protocol=${PROTOCOL} namespace=${test_namespace}
... download_in_pvc=${DOWNLOAD_IN_PVC} model_name=${PYTHON_MODEL_NAME}
@@ -55,7 +61,7 @@
Remove File ${INFERENCESERVICE_FILLED_FILEPATH}
Wait For Pods To Be Ready label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
... namespace=${test_namespace}
${pod_name}= Get Pod Name namespace=${test_namespace}

[Robocop warning] Line is too long (129/120)
... label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
${service_port}= Extract Service Port service_name=${PYTHON_MODEL_NAME}-predictor protocol=TCP
... namespace=${test_namespace}
@@ -73,6 +79,49 @@
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process triton-process kill=true

Test Python Model Grpc Inference Via API (Triton on Kserve) # robocop: off=too-long-test-case
[Documentation] Test the deployment of a Python model in KServe using Triton
[Tags] Tier2 RHOAIENG-16912

Setup Test Variables model_name=${PYTHON_MODEL_NAME} use_pvc=${FALSE} use_gpu=${FALSE}
... kserve_mode=${KSERVE_MODE} model_path=triton/model_repository/
Set Project And Runtime runtime=${KSERVE_RUNTIME_REST_NAME} protocol=${PROTOCOL_GRPC} namespace=${test_namespace}
... download_in_pvc=${DOWNLOAD_IN_PVC} model_name=${PYTHON_MODEL_NAME}
... storage_size=100Mi memory_request=100Mi
${requests}= Create Dictionary memory=1Gi

[Robocop notice] Create Dictionary can be replaced with VAR
Compile Inference Service YAML isvc_name=${PYTHON_MODEL_NAME}
... sa_name=models-bucket-sa
... model_storage_uri=${storage_uri}
... model_format=python serving_runtime=${KSERVE_RUNTIME_REST_NAME}
... version="1"
... limits_dict=${limits} requests_dict=${requests} kserve_mode=${KSERVE_MODE}
Deploy Model Via CLI isvc_filepath=${INFERENCESERVICE_FILLED_FILEPATH}
... namespace=${test_namespace}
# File is not needed anymore after applying
Remove File ${INFERENCESERVICE_FILLED_FILEPATH}
Wait For Pods To Be Ready label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
... namespace=${test_namespace}
${pod_name}= Get Pod Name namespace=${test_namespace}

[Robocop notice] Variable '${pod_name}' is assigned but not used
... label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
${valued} ${host}= Run And Return Rc And Output oc get ksvc ${PYTHON_MODEL_NAME}-predictor -o jsonpath='{.status.url}'
Log ${valued}
${host}= Evaluate re.search(r"${PATTERN}", r"${host}").group(1) re
Log ${host}
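For readability, the Evaluate step above is equivalent to the following plain-Python extraction; the URL shown is hypothetical, standing in for the real ksvc status URL:

```python
# Mirror of the Evaluate step: pull the bare hostname out of the ksvc URL.
import re

PATTERN = r"https:\/\/([^\/:]+)"  # same pattern as ${PATTERN}
url = "https://python-predictor-test-ns.apps.example.com"  # hypothetical URL
host = re.search(PATTERN, url).group(1)
print(host)  # -> python-predictor-test-ns.apps.example.com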
${inference_output}= Query Model With GRPCURL host=${host} port=443
... endpoint=inference.GRPCInferenceService/ModelInfer
... json_body=@ input_filepath=${INFERENCE_GRPC_INPUT_PYTHONFILE}
Contributor: why json_body=@?

Contributor Author: The @ is needed for passing the input file to grpcurl.

Contributor (@bdattoma, Jan 13, 2025): I think you should be able to run this keyword with json_body=${INFERENCE_GRPC_INPUT_PYTHONFILE} and add the "@" inside INFERENCE_GRPC_INPUT_PYTHONFILE, without input_filepath.

Contributor Author: Yeah, I already tried that approach, but it didn't work. That's why I'm using this method. We already discussed this in another PR earlier.

Contributor: Hm, that's very strange. The thing is, as written this keyword call does not make much sense with json_body=@. I can approve it since I don't think it's hurting or causing trouble, but I'd suggest reviewing the code and trying to find the culprit.
... insecure=${True} protobuf_file=${PROTOBUFF_FILE} json_header=${NONE}
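To make the json_body=@ discussion above concrete, here is a rough Python sketch of what this keyword call presumably does under the hood. grpcurl reads the request body from stdin when its -d flag is given "@", which is why json_body=@ is paired with input_filepath. The host value and the wrapper behavior are assumptions; only the grpcurl flags shown are standard options.

```python
# Hypothetical sketch (not the actual Query Model With GRPCURL implementation):
# with "-d @", grpcurl reads the JSON request body from stdin, so the input
# file is streamed to the process instead of being inlined on the command line.
import subprocess

host = "python-predictor-test-ns.apps.example.com"  # assumed ksvc host
cmd = [
    "grpcurl", "-insecure",
    "-proto", "tests/Resources/Files/triton/grpc_predict_v2.proto",
    "-d", "@",  # body comes from stdin
    f"{host}:443",
    "inference.GRPCInferenceService/ModelInfer",
]
with open("tests/Resources/Files/triton/kserve-triton-python-grpc-input.json") as body:
    result = subprocess.run(cmd, stdin=body, capture_output=True, text=True, check=True)
print(result.stdout)  # V2 ModelInfer response as JSON
```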
${inference_output}= Evaluate json.dumps(${inference_output})
Log ${inference_output}
${result} ${list}= Inference Comparison ${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON} ${inference_output}
Log ${result}
Log ${list}
[Teardown] Run Keywords
... Clean Up Test Project test_ns=${test_namespace}
... isvc_names=${models_names} wait_prj_deletion=${FALSE} kserve_mode=${KSERVE_MODE}
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process triton-process kill=true


*** Keywords ***
Suite Setup