
tests_l2: Added gpu hwinfo workload #155

Merged 1 commit on Oct 10, 2023

Conversation

vbedida79
Contributor

vbedida79 commented Oct 10, 2023

Added a GPU hwinfo ubi9-based workload for OCP 4.13, based on driver verification.

Signed-off-by: vbedida79 <veenadhari.bedida@intel.com>
@hershpa
Contributor

hershpa commented Oct 10, 2023

@vbedida79 great work, this looks good. If we plan to convert this to an automated test case in the future, what is the expected behavior of the container?

Ideally, we want something that we can programmatically check at runtime, i.e. it would run `hwinfo --display` and then we can check whether it succeeds or fails based on pod status or some exit code.

@vbedida79
Contributor Author

> Ideally, we want something that we can programmatically check at runtime, i.e. it would run `hwinfo --display` and then we can check if it succeeds or fails based on pod status or some exit code.

This would be a good case for automation. Yes, we can check the pod status and the count of the GPU resource, and if possible some part of the output too.
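To make the "check pod status / exit code" idea concrete, here is a minimal sketch of how such a check could be wired up. This is a hypothetical example, not the manifest from this PR: the pod name, image placeholder, and the `gpu.intel.com/i915` resource key are assumptions. The idea is to run `hwinfo --display` once with `restartPolicy: Never`, so the pod phase settles to `Succeeded` or `Failed` and an automation harness can read it.

```yaml
# Hypothetical sketch -- names, image, and resource key are assumptions,
# not taken from this PR.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-hwinfo-check
spec:
  restartPolicy: Never          # pod phase becomes Succeeded/Failed after one run
  containers:
    - name: hwinfo
      image: <gpu-hwinfo-ubi9-image>   # the ubi9-based workload image
      command: ["hwinfo", "--display"]
      resources:
        limits:
          gpu.intel.com/i915: 1        # claim one GPU via the device plugin
```

A test harness could then poll something like `oc get pod gpu-hwinfo-check -o jsonpath='{.status.phase}'` and treat `Succeeded` as a pass, optionally also grepping the pod logs for expected `hwinfo` output.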

@chaitanya1731
Contributor

@vbedida79 Can you also add the steps to run and verify this workload to the README? @uMartinXu @hershpa do you think we should create a README per feature instead of a common generic one? As we keep adding new workloads, the contents will grow and eventually make the README difficult to navigate; for example, three separate READMEs for the respective device plugin directories.

@vbedida79
Contributor Author

vbedida79 commented Oct 10, 2023

> @vbedida79 Can you also add the steps in the README to run and verify this workload? @uMartinXu @hershpa do you think we should create readme for specific feature instead of a common generic one?

Yes, will submit that next. I plan to add it in the GPU section of the tests; if not, we can add it in the device plugin README. Either works.

@chaitanya1731
Contributor

> if not we can add in device plugin readme, either works.

Sorry for the confusion; by "device plugin readme" I meant something like l2/dgpu/README.md, l2/qat/README.md, and so on.

@vbedida79
Contributor Author

vbedida79 commented Oct 10, 2023

> if not we can add in device plugin readme, either works.
>
> Sorry for the confusion about device plugin readme I meant something like l2/dgpu/README.md, l2/qat/README.md and so on..

No problem, good idea. For now it's added in #156; we can change it to separate READMEs later.

@uMartinXu
Contributor

> @vbedida79 Can you also add the steps in the README to run and verify this workload? @uMartinXu @hershpa do you think we should create readme for specific feature instead of a common generic one?

This is a good question. A specific README for a single feature is a good idea, and I think in the long run we should do that. For now, let's continue with the current README schema, and at the same time listen to users for feedback.

@uMartinXu
Contributor

uMartinXu commented Oct 10, 2023

This PR looks good to me. BTW, should we also have an L1 test case for the dGPU OOT driver testing?

@vbedida79
Contributor Author

vbedida79 commented Oct 10, 2023

> BTW should we also have an L1 test case for the dGPU OOT driver testing?

What kind of tests are we looking at? I think we can use clinfo/hwinfo for that too.

@uMartinXu
Contributor

> BTW should we also have an L1 test case for the dGPU OOT driver testing?
>
> what kind of tests are we looking at? I think we can use clinfo/hwinfo for that too

You are right, we should use clinfo/hwinfo. But for L1 testing we have no provisioning stack on the cluster, so we cannot claim the i915 resources.

uMartinXu merged commit f6265cd into intel:main on Oct 10, 2023
1 check failed
@vbedida79
Contributor Author

vbedida79 commented Oct 10, 2023

> You are right we should use clinfo/hwinfo. But since for L1 testing, we have no provisioning stack there on cluster, so we can not claim the i915 resources.

How about we run it as a DaemonSet on all KMM-labelled nodes, i.e. nodes where the driver has loaded? What do you think?
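One way the DaemonSet idea could look, as a hedged sketch: KMM labels a node once a module's driver is loaded, and the DaemonSet's `nodeSelector` keys off that label so the workload only lands where the driver is present. The label key shown here (namespace and module name) and the image placeholder are assumptions for illustration; the real key depends on how the module is deployed.

```yaml
# Hypothetical sketch -- the module label key and image are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-hwinfo-ds
spec:
  selector:
    matchLabels:
      app: gpu-hwinfo
  template:
    metadata:
      labels:
        app: gpu-hwinfo
    spec:
      nodeSelector:
        # KMM labels nodes when a module's driver is loaded; the exact key
        # depends on the module namespace/name (assumed here).
        kmm.node.kubernetes.io/openshift-kmm.intel-dgpu.ready: ""
      containers:
        - name: hwinfo
          image: <gpu-hwinfo-ubi9-image>
          command: ["sh", "-c", "hwinfo --display && sleep infinity"]
```

Note that DaemonSet pods are restarted on failure, so unlike a one-shot pod the pass/fail signal would come from the pod logs or readiness rather than a terminal `Succeeded`/`Failed` phase.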
