gpu-sharing

History

Name		Name	Last commit message	Last commit date
parent directory ..
ai-jobs		ai-jobs
images		images
README.md		README.md

README.md

Alnair GPU sharing feature demo

Purpose

Demostrate running multiple jobs on one GPU, i.e., GPU sharing feature, provided by Alnair platform improves GPU utilization and reduce overall job completion time.

Alnair vGPU device plugin supports both fractional gpu memory and compute to indiviual job/process, together with vGPU scheduler gurantees no cross-node scheduling.

Prerequisite

A two-node Kubernetes cluster. At least one node has one Nvidia GPU card.

Steps

The following steps were executed and verified on a v1.21.4 Kubernetes cluster. Nvidia GPU is V100 with 32GB RAM.

Clone Alnair Git repo

git clone https://github.com/CentaurusInfra/alnair.git
Install Alnair vGPU device plugin (for GPU sharing)

kubectl apply -f alnair/alnair-device-plugin/manifests/alnair-device-plugin.yaml
Install Alnair scheduler (for alnair-vgpu resource scheduling)

sudo alnair/autonomous-scheduler/vGPUScheduler/install.sh
Install Alnair profiler (for GPU utilization visualization)

kubectl apply -f alnair/alnair-profiler/profiler-all-in-one.yaml

Launch jobs sequentially, record each job’s completion time (without sharing)

kubectl apply -f alnair/demo/gpu-sharing/ai-jobs/giraffe-demo.yaml
# wait after the pod is completed, roughly 1 minute
kubectl apply -f alnair/demo/gpu-sharing/ai-jobs/gan-mnist-demo.yaml

Launch jobs at the same time on the same job, record each job’s completion time (with GPU sharing)

kubectl apply -f alnair/demo/gpu-sharing/ai-jobs/giraffe-demo.yaml && kubectl apply -f alnair/demo/gpu-sharing/ai-jobs/gan-mnist-demo.yaml
Compare GPU utilization, and Job completion time

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

gpu-sharing

gpu-sharing

README.md

Alnair GPU sharing feature demo

Purpose

Prerequisite

Steps

Results

Files

gpu-sharing

Directory actions

More options

Directory actions

More options

Latest commit

History

gpu-sharing

Folders and files

parent directory

README.md

Alnair GPU sharing feature demo

Purpose

Prerequisite

Steps

Results