# Dynamic MIG Support


## Introduction

We now support dynamic MIG, using `mig-parted` to adjust MIG devices on the fly. This includes:

- **Dynamic MIG instance management**: Users don't need to operate on the GPU node directly (e.g. running `nvidia-smi -i 0 -mig 1` or other commands) to manage MIG instances; it is all handled by the HAMi device plugin.

- **Dynamic MIG adjustment**: Each MIG device managed by HAMi dynamically adjusts its MIG template according to the tasks submitted, when necessary.

- **Device MIG observation**: Each MIG instance generated by HAMi is shown in the scheduler monitor, including task information, so users get a clear overview of MIG nodes.

- **Compatibility with HAMi-core nodes**: HAMi can manage a unified GPU pool across HAMi-core and MIG nodes. A task can be scheduled to either kind of node unless pinned manually with the `nvidia.com/vgpu-mode` annotation (see the example after this list).

- **Unified API with HAMi-core**: Zero extra work is needed to make a job compatible with the dynamic MIG feature.
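
For example, a pod can be pinned to one mode through that annotation. A minimal sketch: `"mig"` is demonstrated in the job example later in this document, and `"hami-core"` is assumed here as the counterpart value for HAMi-core nodes:

```yaml
metadata:
  annotations:
    # pin this pod to MIG nodes; "hami-core" (assumed) pins to HAMi-core nodes instead
    nvidia.com/vgpu-mode: "mig"
```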

## Prerequisites

- NVIDIA Blackwell, Hopper™, or Ampere devices
- HAMi > v2.5.0
- nvidia-container-toolkit

## Enabling Dynamic MIG Support

- Install the chart using Helm; see the 'enabling vGPU support in kubernetes' section here.

- In the device-plugin configMap, set the operating mode of MIG nodes to `mig`:

  ```
  kubectl describe cm hami-device-plugin -n kube-system
  ```

  ```json
  {
      "nodeconfig": [
          {
              "name": "MIG-NODE-A",
              "operatingmode": "mig",
              "filterdevices": {
                "uuid": [],
                "index": []
              }
          }
      ]
  }
  ```
- Restart the following pods for the change to take effect (a command sketch follows this list):
  - hami-scheduler
  - hami-device-plugin on 'MIG-NODE-A'
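
A minimal sketch of the flow above. The deployment name, label selector, and namespace are assumptions based on a default Helm install; adjust them to your deployment:

```bash
# Mark MIG nodes in the device-plugin configMap (see the JSON layout above)
kubectl edit cm hami-device-plugin -n kube-system

# Restart the scheduler so it picks up the new operating mode
kubectl -n kube-system rollout restart deploy/hami-scheduler

# The device plugin runs as a DaemonSet: delete its pod on MIG-NODE-A so it is
# recreated with the new configuration
# (label selector is an assumption; check yours with `kubectl get pods --show-labels`)
kubectl -n kube-system delete pod \
  -l app.kubernetes.io/component=hami-device-plugin \
  --field-selector spec.nodeName=MIG-NODE-A
```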

## Custom MIG configuration (Optional)

HAMi ships with a built-in MIG configuration.

You can customize it by following the steps below:

Change the content of `device-configmap.yaml` in `charts/hami/templates/scheduler` as follows:

```yaml
  nvidia:
    resourceCountName: {{ .Values.resourceName }}
    resourceMemoryName: {{ .Values.resourceMem }}
    resourceMemoryPercentageName: {{ .Values.resourceMemPercentage }}
    resourceCoreName: {{ .Values.resourceCores }}
    resourcePriorityName: {{ .Values.resourcePriority }}
    overwriteEnv: false
    defaultMemory: 0
    defaultCores: 0
    defaultGPUNum: 1
    deviceSplitCount: {{ .Values.devicePlugin.deviceSplitCount }}
    deviceMemoryScaling: {{ .Values.devicePlugin.deviceMemoryScaling }}
    deviceCoreScaling: {{ .Values.devicePlugin.deviceCoreScaling }}
    knownMigGeometries:
    - models: [ "A30" ]
      allowedGeometries:
        -
          - name: 1g.6gb
            memory: 6144
            count: 4
        -
          - name: 2g.12gb
            memory: 12288
            count: 2
        -
          - name: 4g.24gb
            memory: 24576
            count: 1
    - models: [ "A100-SXM4-40GB", "A100-40GB-PCIe", "A100-PCIE-40GB" ]
      allowedGeometries:
        -
          - name: 1g.5gb
            memory: 5120
            count: 7
        -
          - name: 2g.10gb
            memory: 10240
            count: 3
          - name: 1g.5gb
            memory: 5120
            count: 1
        -
          - name: 3g.20gb
            memory: 20480
            count: 2
        -
          - name: 7g.40gb
            memory: 40960
            count: 1
    - models: [ "A100-SXM4-80GB", "A100-80GB-PCIe", "A100-PCIE-80GB" ]
      allowedGeometries:
        -
          - name: 1g.10gb
            memory: 10240
            count: 7
        -
          - name: 2g.20gb
            memory: 20480
            count: 3
          - name: 1g.10gb
            memory: 10240
            count: 1
        -
          - name: 3g.40gb
            memory: 40960
            count: 2
        -
          - name: 7g.79gb
            memory: 80896
            count: 1
```

Note: Helm installs and upgrades are based on the configuration in this file, overwriting the built-in configuration.

Note: Be aware that HAMi finds and uses the first MIG template in this configMap that fits the task, scanning in order.
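
As a hypothetical walkthrough against the A30 geometries above, a request like the following selects `2g.12gb`:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
    nvidia.com/gpumem: 8000  # 1g.6gb (6144 MiB) is too small, so the first
                             # geometry that fits is 2g.12gb (12288 MiB)
```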

## Running MIG jobs

A MIG instance can now be requested by a container in the same way as with HAMi-core, simply by specifying the `nvidia.com/gpu` and `nvidia.com/gpumem` resource types:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    nvidia.com/vgpu-mode: "mig" # (optional) if not set, this pod can be assigned either a MIG instance or a HAMi-core instance
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 2
          nvidia.com/gpumem: 8000
```

In the example above, the task allocates two MIG instances, each with at least 8000 MiB of device memory.
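
To check the result from inside the pod, `nvidia-smi -L` should list the allocated MIG devices (a sketch; the exact device names depend on the geometry HAMi picked):

```bash
kubectl exec gpu-pod -- nvidia-smi -L
```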

## Monitoring MIG Instances

MIG instances managed by HAMi are displayed in the scheduler monitor (`<scheduler node IP>:31993/metrics`), as follows:

```
# HELP nodeGPUMigInstance GPU Sharing mode. 0 for hami-core, 1 for mig, 2 for mps
# TYPE nodeGPUMigInstance gauge
nodeGPUMigInstance{deviceidx="0",deviceuuid="GPU-936619fc-f6a1-74a8-0bc6-ecf6b3269313",migname="3g.20gb-0",nodeid="aio-node15",zone="vGPU"} 1
nodeGPUMigInstance{deviceidx="0",deviceuuid="GPU-936619fc-f6a1-74a8-0bc6-ecf6b3269313",migname="3g.20gb-1",nodeid="aio-node15",zone="vGPU"} 0
nodeGPUMigInstance{deviceidx="1",deviceuuid="GPU-30f90f49-43ab-0a78-bf5c-93ed41ef2da2",migname="3g.20gb-0",nodeid="aio-node15",zone="vGPU"} 1
nodeGPUMigInstance{deviceidx="1",deviceuuid="GPU-30f90f49-43ab-0a78-bf5c-93ed41ef2da2",migname="3g.20gb-1",nodeid="aio-node15",zone="vGPU"} 1
```

## Notes

1. You don't need to do anything on the MIG node; everything is managed by `mig-parted` inside the hami-device-plugin.

2. NVIDIA devices older than the Ampere architecture can't use `mig` mode.

3. You won't see any MIG resources (e.g. `nvidia.com/mig-1g.10gb`) on the node; HAMi uses a unified resource name for both `mig` and `hami-core` nodes.