Add KFTO_MNIST training operator tests #2159

Merged 1 commit on Jan 7, 2025
@@ -10,7 +10,7 @@

*** Test Cases ***
Run Training operator KFTO test with NVIDIA CUDA image
- [Documentation] Run Go KFTO tests for Training operator using PyTorch job with NVIDIA CUDA image
+ [Documentation] Run Go KFTO test for Training operator using PyTorch job with NVIDIA CUDA image
[Tags] Resources-GPU NVIDIA-GPUs
... RHOAIENG-16035
... Tier1
@@ -20,7 +20,7 @@
Run Training Operator KFTO Test TestPyTorchJobWithCuda ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO test with AMD ROCm image
- [Documentation] Run Go KFTO tests for Training operator using PyTorch job with AMD ROCm image
+ [Documentation] Run Go KFTO test for Training operator using PyTorch job with AMD ROCm image
[Tags] Resources-GPU AMD-GPUs ROCm
... RHOAIENG-16035
... Tier1
@@ -30,7 +30,7 @@
Run Training Operator KFTO Test TestPyTorchJobWithROCm ${ROCM_TRAINING_IMAGE}

Run Training operator KFTO error handling test with NVIDIA CUDA image
- [Documentation] Run Go KFTO error handling tests for Training operator using PyTorch job with NVIDIA CUDA image
+ [Documentation] Run Go KFTO error handling test for Training operator using PyTorch job with NVIDIA CUDA image
[Tags] RHOAIENG-14542
... Tier1
... DistributedWorkloads
@@ -39,10 +39,39 @@
Run Training Operator KFTO Test TestPyTorchJobFailureWithCuda ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO error handling test with AMD ROCm image
- [Documentation] Run Go KFTO error handling tests for Training operator using PyTorch job with AMD ROCm image
+ [Documentation] Run Go KFTO error handling test for Training operator using PyTorch job with AMD ROCm image
[Tags] RHOAIENG-14542
... Tier1
... DistributedWorkloads
... Training
... TrainingOperator
Run Training Operator KFTO Test TestPyTorchJobFailureWithROCm ${ROCM_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node CPU test with NVIDIA CUDA image
[Documentation] Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image
Code scanning / Robocop warning: Line is too long (123/120)
[Tags] RHOAIENG-16556
... Sanity
... DistributedWorkloads
... Training
... TrainingOperator
Run Training Operator KFTO Test TestPyTorchJobMnistCpu ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node test with NVIDIA CUDA image
[Documentation] Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image
[Tags] Resources-GPU NVIDIA-GPUs
... RHOAIENG-16556
... Tier1
... DistributedWorkloads
... Training
... TrainingOperator
Run Training Operator KFTO Test TestPyTorchJobMnistWithCuda ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node test with AMD ROCm image
[Documentation] Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image
[Tags] Resources-GPU AMD-GPUs ROCm
... RHOAIENG-16556
... Tier1
... DistributedWorkloads
... Training
... TrainingOperator
Run Training Operator KFTO Test TestPyTorchJobMnistWithROCm ${ROCM_TRAINING_IMAGE}
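
The Robocop warning reported above (line too long, 123/120) applies to the [Documentation] line of the KFTO_MNIST multi-node CPU test. One possible way to stay within the limit, shown only as a sketch and not as part of the merged change, is to split the value with Robot Framework's `...` continuation marker. The four-space separators are an assumption, since the original indentation is not preserved in this view, and how continuation rows are joined in the rendered documentation may differ between Robot Framework versions.

```robotframework
*** Test Cases ***
Run Training operator KFTO_MNIST multi-node CPU test with NVIDIA CUDA image
    # Documentation value split across two rows to stay under the 120-character limit
    [Documentation]    Run Go KFTO_MNIST multi-node CPU test for Training operator
    ...    using PyTorch job with NVIDIA CUDA image
    [Tags]    RHOAIENG-16556
    ...    Sanity
    ...    DistributedWorkloads
    ...    Training
    ...    TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobMnistCpu    ${CUDA_TRAINING_IMAGE}
```

The test name, tags, keyword, and arguments are copied from the diff; only the [Documentation] row layout changes.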