Update KFTO multi-node test names according to recent updates in orig… #2164

Merged
Changes from 1 commit
Add KFTO pytorch multi-node multi-gpu tests for GPUs with AMD ROCm and NVIDIA Cuda
abhijeet-dhumal committed Jan 22, 2025

Verified: this commit was created on GitHub.com and signed with GitHub’s verified signature.
commit c098ea021a74e73d35347ce0bbb6a9bc72906232
@@ -47,31 +47,54 @@
    ...       TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobFailureWithROCm    ${ROCM_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node CPU test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 CPUs each
Run Training operator KFTO_MNIST multi-node single-CPU test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node single-CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with at least 1 CPUs each

Check warning: Code scanning / Robocop: Line is too long (186/120)
    [Tags]    RHOAIENG-16556
    ...       Sanity
    ...       DistributedWorkloads
    ...       Training
    ...       TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeCpu    ${CUDA_TRAINING_IMAGE}
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeSingleCpu    ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 1 GPUs each
Run Training operator KFTO_MNIST multi-node multi-CPU test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node multi-CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 CPUs each

Check warning: Code scanning / Robocop: Line is too long (176/120)
    [Tags]    RHOAIENG-16556
    ...       Tier1
    ...       DistributedWorkloads
    ...       Training
    ...       TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiCpu    ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node single-GPU test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node single-GPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 1 GPU each

Check warning: Code scanning / Robocop: Line is too long (176/120)
    [Tags]    Resources-GPU    NVIDIA-GPUs
    ...       RHOAIENG-16556
    ...       Tier1
    ...       DistributedWorkloads
    ...       Training
    ...       TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeWithCuda    ${CUDA_TRAINING_IMAGE}
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeSingleGpuWithCuda    ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node test with AMD ROCm image
    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster-nodes with 1 GPUs each
Run Training operator KFTO_MNIST multi-node single-GPU test with AMD ROCm image
    [Documentation]    Run Go KFTO_MNIST multi-node single-GPU test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster-nodes with 1 GPU each

Check warning: Code scanning / Robocop: Line is too long (174/120)
    [Tags]    Resources-GPU    AMD-GPUs    ROCm
    ...       RHOAIENG-16556
    ...       Tier1
    ...       DistributedWorkloads
    ...       Training
    ...       TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeWithROCm    ${ROCM_TRAINING_IMAGE}
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeSingleGpuWithROCm    ${ROCM_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node multi-gpu test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 GPUs each

Check warning: Code scanning / Robocop: Line is too long (176/120)
    [Tags]    Kfto-MultiNodeMultiGpu
    ...       Training
    ...       TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithCuda    ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node multi-gpu test with AMD ROCm image
    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster-nodes with 2 GPUs each

Check warning: Code scanning / Robocop: Line is too long (174/120)
    [Tags]    Kfto-MultiNodeMultiGpu
    ...       Training
    ...       TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithROCm    ${ROCM_TRAINING_IMAGE}