Migration of Python model kserve grpc testcase UI -> API #2155
@@ -0,0 +1,56 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-kserve-runtime
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: "8002"
  containers:
    - args:
        - tritonserver
        - --model-store=/mnt/models
        - --grpc-port=9000
        - --http-port=8080
        - --allow-grpc=true
        - --allow-http=true
      image: nvcr.io/nvidia/tritonserver:24.10-py3
      name: kserve-container
      ports:
        - containerPort: 9000
          name: h2c
          protocol: TCP
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: "1"
          memory: 2Gi
  protocolVersions:
    - v2
    - grpc-v2
  supportedModelFormats:
    - autoSelect: true
      name: tensorrt
      version: "8"
    - autoSelect: true
      name: tensorflow
      version: "1"
    - autoSelect: true
      name: tensorflow
      version: "2"
    - autoSelect: true
      name: onnx
      version: "1"
    - name: pytorch
      version: "1"
    - autoSelect: true
      name: triton
      version: "2"
    - autoSelect: true
      name: xgboost
      version: "1"
    - autoSelect: true
      name: python
      version: "1"
@@ -19,6 +19,10 @@
*** Variables ***
${PYTHON_MODEL_NAME}=    python
${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON}=    {"modelName":"python","modelVersion":"1","id":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":["4"]},{"name":"OUTPUT1","datatype":"FP32","shape":["4"]}],"rawOutputContents":["AgAAAAAAAAAAAAAAAAAAAA==","AAQAAAAAAAAAAAAAAAAAAA=="]}
${INFERENCE_GRPC_INPUT_PYTHONFILE}=    tests/Resources/Files/triton/kserve-triton-python-grpc-input.json
${KSERVE_MODE}=    Serverless    # Serverless
${PROTOCOL_GRPC}=    grpc
${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}=    {"model_name":"python","model_version":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":[4],"data":[0.921442985534668,0.6223347187042236,0.8059385418891907,1.2578542232513428]},{"name":"OUTPUT1","datatype":"FP32","shape":[4],"data":[0.49091365933418274,-0.027157962322235107,-0.5641784071922302,0.6906309723854065]}]}
${INFERENCE_REST_INPUT_PYTHON}=    @tests/Resources/Files/triton/kserve-triton-python-rest-input.json
${KSERVE_MODE}=    Serverless    # Serverless
@@ -31,13 +35,15 @@
${INFERENCESERVICE_FILEPATH_NEW}=    ${LLM_RESOURCES_DIRPATH}/serving_runtimes/isvc
${INFERENCESERVICE_FILLED_FILEPATH}=    ${INFERENCESERVICE_FILEPATH_NEW}/isvc_filled.yaml
${KSERVE_RUNTIME_REST_NAME}=    triton-kserve-runtime
${PATTERN}=    https:\/\/([^\/:]+)
${PROTOBUFF_FILE}=    tests/Resources/Files/triton/grpc_predict_v2.proto
*** Test Cases ***
Test Python Model Rest Inference Via API (Triton on Kserve)    # robocop: off=too-long-test-case
    [Documentation]    Test the deployment of a Python model in KServe using Triton
    [Tags]    Tier2    RHOAIENG-16912
    Setup Test Variables    model_name=${PYTHON_MODEL_NAME}    use_pvc=${FALSE}    use_gpu=${FALSE}
    ...    kserve_mode=${KSERVE_MODE}    model_path=triton/model_repository/
    Set Project And Runtime    runtime=${KSERVE_RUNTIME_REST_NAME}    protocol=${PROTOCOL}    namespace=${test_namespace}
    ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=${PYTHON_MODEL_NAME}
@@ -55,7 +61,7 @@
    Remove File    ${INFERENCESERVICE_FILLED_FILEPATH}
    Wait For Pods To Be Ready    label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
    ...    namespace=${test_namespace}
    ${pod_name}=    Get Pod Name    namespace=${test_namespace}
    ...    label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
    ${service_port}=    Extract Service Port    service_name=${PYTHON_MODEL_NAME}-predictor    protocol=TCP
    ...    namespace=${test_namespace}
@@ -73,6 +79,49 @@
    ...    AND
    ...    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"    Terminate Process    triton-process    kill=true

Test Python Model Grpc Inference Via API (Triton on Kserve)    # robocop: off=too-long-test-case
    [Documentation]    Test the deployment of a Python model in KServe using Triton over gRPC
    [Tags]    Tier2    RHOAIENG-16912
    Setup Test Variables    model_name=${PYTHON_MODEL_NAME}    use_pvc=${FALSE}    use_gpu=${FALSE}
    ...    kserve_mode=${KSERVE_MODE}    model_path=triton/model_repository/
    Set Project And Runtime    runtime=${KSERVE_RUNTIME_REST_NAME}    protocol=${PROTOCOL_GRPC}    namespace=${test_namespace}
    ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=${PYTHON_MODEL_NAME}
    ...    storage_size=100Mi    memory_request=100Mi
    ${requests}=    Create Dictionary    memory=1Gi
    Compile Inference Service YAML    isvc_name=${PYTHON_MODEL_NAME}
    ...    sa_name=models-bucket-sa
    ...    model_storage_uri=${storage_uri}
    ...    model_format=python    serving_runtime=${KSERVE_RUNTIME_REST_NAME}
    ...    version="1"
    ...    limits_dict=${limits}    requests_dict=${requests}    kserve_mode=${KSERVE_MODE}
    Deploy Model Via CLI    isvc_filepath=${INFERENCESERVICE_FILLED_FILEPATH}
    ...    namespace=${test_namespace}
    # File is no longer needed after it has been applied
    Remove File    ${INFERENCESERVICE_FILLED_FILEPATH}
    Wait For Pods To Be Ready    label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
    ...    namespace=${test_namespace}
    ${pod_name}=    Get Pod Name    namespace=${test_namespace}
    ...    label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
    ${valued}    ${host}=    Run And Return Rc And Output    oc get ksvc ${PYTHON_MODEL_NAME}-predictor -o jsonpath='{.status.url}'
    Log    ${valued}
    ${host}=    Evaluate    re.search(r"${PATTERN}", r"${host}").group(1)    re
    Log    ${host}
    ${inference_output}=    Query Model With GRPCURL    host=${host}    port=443
    ...    endpoint=inference.GRPCInferenceService/ModelInfer
    ...    json_body=@    input_filepath=${INFERENCE_GRPC_INPUT_PYTHONFILE}
Review conversation on the json_body=@ argument:
Reviewer: why
Author: needed @ for passing the input file for grpcurl
Reviewer: i think you should be able to run this kw with …
Author: Yeah, I already tried that approach, but it didn't work. That's why I'm using this method. We already discussed this in another PR earlier.
Reviewer: hm, that's very strange. Thing is, as it is, this keyword call does not make much sense with "json_body=@". (See the grpcurl sketch after this test case.)
    ...    insecure=${True}    protobuf_file=${PROTOBUFF_FILE}    json_header=${NONE}
    ${inference_output}=    Evaluate    json.dumps(${inference_output})
    Log    ${inference_output}
    ${result}    ${list}=    Inference Comparison    ${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON}    ${inference_output}
    Log    ${result}
    Log    ${list}
    [Teardown]    Run Keywords
    ...    Clean Up Test Project    test_ns=${test_namespace}
    ...    isvc_names=${models_names}    wait_prj_deletion=${FALSE}    kserve_mode=${KSERVE_MODE}
    ...    AND
    ...    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"    Terminate Process    triton-process    kill=true
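Regarding the json_body=@ question raised in the review thread above: grpcurl accepts -d @ to mean "read the request body from stdin", which is presumably what Query Model With GRPCURL relies on when json_body=@ and input_filepath are passed together. A minimal sketch of the equivalent manual call is below; the exact flags the keyword builds internally are an assumption, while the ksvc name, port, proto file, and input file come from the test and variables above.

    # Sketch of the grpcurl call the keyword is expected to issue.
    # '-d @' makes grpcurl read the JSON request body from stdin,
    # so the input file is piped in rather than passed as a flag value.
    HOST=$(oc get ksvc python-predictor -o jsonpath='{.status.url}' | sed -E 's|https://([^/:]+).*|\1|')
    grpcurl -insecure \
        -proto tests/Resources/Files/triton/grpc_predict_v2.proto \
        -d @ \
        "${HOST}:443" inference.GRPCInferenceService/ModelInfer \
        < tests/Resources/Files/triton/kserve-triton-python-grpc-input.json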

*** Keywords ***
Suite Setup