
fix(helm): update gpu-operator ( v24.9.1 → v24.9.2 ) #544

Merged
jfroy merged 1 commit into main from renovate/gpu-operator-24.x on Jan 28, 2025

Conversation

renovate[bot] (Contributor) commented on Nov 1, 2024

This PR contains the following updates:

Package               | Update | Change
gpu-operator (source) | patch  | v24.9.1 -> v24.9.2

Release Notes

NVIDIA/gpu-operator (gpu-operator)

v24.9.2

Compare Source


Configuration

📅 Schedule: Branch creation - "* 0-4,22-23 * * 1-5,* * * * 0,6" in timezone America/Los_Angeles, Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.
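
For context, the value Renovate bumps in this PR is the chart version pinned in the repo's Flux HelmRelease for gpu-operator. A minimal sketch of that block, with names and versions taken from the diffs further down; the apiVersion is an assumption, since it is not visible in this PR:

apiVersion: helm.toolkit.fluxcd.io/v2  # assumption: the exact apiVersion is not shown in the diffs
kind: HelmRelease
metadata:
  name: gpu-operator
  namespace: gpu-operator
spec:
  chart:
    spec:
      chart: gpu-operator
      sourceRef:
        kind: HelmRepository
        name: nvidia
        namespace: flux-system
      version: v24.9.2  # the single field this PR changes, previously v24.9.1

Renovate's flux manager resolves the sourceRef to a Helm repository, checks it for newer chart releases, and rewrites only this version field; the larger rendered-manifest diffs below all follow from that one-line change.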


github-actions bot commented Nov 1, 2024

--- HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator

+++ HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator

@@ -52,27 +52,12 @@

   - update
   - patch
   - delete
 - apiGroups:
   - ''
   resources:
-  - events
-  - pods
-  - pods/eviction
-  - services
-  verbs:
-  - create
-  - get
-  - list
-  - watch
-  - update
-  - patch
-  - delete
-- apiGroups:
-  - ''
-  resources:
   - nodes
   verbs:
   - get
   - list
   - watch
   - update
@@ -86,39 +71,33 @@

   - list
   - create
   - watch
   - update
   - patch
 - apiGroups:
+  - ''
+  resources:
+  - events
+  - pods
+  - pods/eviction
+  verbs:
+  - create
+  - get
+  - list
+  - watch
+  - update
+  - patch
+  - delete
+- apiGroups:
   - apps
   resources:
   - daemonsets
   verbs:
   - get
   - list
   - watch
-- apiGroups:
-  - apps
-  resources:
-  - controllerrevisions
-  verbs:
-  - get
-  - list
-  - watch
-- apiGroups:
-  - monitoring.coreos.com
-  resources:
-  - servicemonitors
-  - prometheusrules
-  verbs:
-  - get
-  - list
-  - create
-  - watch
-  - update
-  - delete
 - apiGroups:
   - nvidia.com
   resources:
   - clusterpolicies
   - clusterpolicies/finalizers
   - clusterpolicies/status
@@ -141,24 +120,12 @@

   verbs:
   - get
   - list
   - watch
   - create
 - apiGroups:
-  - coordination.k8s.io
-  resources:
-  - leases
-  verbs:
-  - get
-  - list
-  - watch
-  - create
-  - update
-  - patch
-  - delete
-- apiGroups:
   - node.k8s.io
   resources:
   - runtimeclasses
   verbs:
   - get
   - list
--- HelmRelease: gpu-operator/gpu-operator Role: gpu-operator/gpu-operator

+++ HelmRelease: gpu-operator/gpu-operator Role: gpu-operator/gpu-operator

@@ -22,12 +22,20 @@

   - update
   - patch
   - delete
 - apiGroups:
   - apps
   resources:
+  - controllerrevisions
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - apps
+  resources:
   - daemonsets
   verbs:
   - create
   - get
   - list
   - watch
@@ -35,17 +43,46 @@

   - patch
   - delete
 - apiGroups:
   - ''
   resources:
   - configmaps
+  - endpoints
+  - pods
+  - pods/eviction
   - secrets
+  - services
+  - services/finalizers
   - serviceaccounts
   verbs:
   - create
   - get
   - list
   - watch
   - update
   - patch
   - delete
+- apiGroups:
+  - coordination.k8s.io
+  resources:
+  - leases
+  verbs:
+  - get
+  - list
+  - watch
+  - create
+  - update
+  - patch
+  - delete
+- apiGroups:
+  - monitoring.coreos.com
+  resources:
+  - servicemonitors
+  - prometheusrules
+  verbs:
+  - get
+  - list
+  - create
+  - watch
+  - update
+  - delete
 
--- HelmRelease: gpu-operator/gpu-operator Deployment: gpu-operator/gpu-operator

+++ HelmRelease: gpu-operator/gpu-operator Deployment: gpu-operator/gpu-operator

@@ -44,13 +44,13 @@

           value: ''
         - name: OPERATOR_NAMESPACE
           valueFrom:
             fieldRef:
               fieldPath: metadata.namespace
         - name: DRIVER_MANAGER_IMAGE
-          value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.6.10
+          value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.7.0
         volumeMounts:
         - name: host-os-release
           mountPath: /host-etc/os-release
           readOnly: true
         livenessProbe:
           httpGet:
--- HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy

+++ HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy

@@ -15,30 +15,30 @@

   operator:
     defaultRuntime: docker
     runtimeClass: nvidia
     initContainer:
       repository: nvcr.io/nvidia
       image: cuda
-      version: 12.6.1-base-ubi8
+      version: 12.6.3-base-ubi9
       imagePullPolicy: IfNotPresent
   daemonsets:
     labels:
-      helm.sh/chart: gpu-operator-v24.6.2
+      helm.sh/chart: gpu-operator-v24.9.1
       app.kubernetes.io/managed-by: gpu-operator
     tolerations:
     - effect: NoSchedule
       key: nvidia.com/gpu
       operator: Exists
     priorityClassName: system-node-critical
     updateStrategy: RollingUpdate
     rollingUpdate:
       maxUnavailable: '1'
   validator:
     repository: nvcr.io/nvidia/cloud-native
     image: gpu-operator-validator
-    version: v24.6.2
+    version: v24.9.1
     imagePullPolicy: IfNotPresent
     plugin:
       env:
       - name: WITH_WORKLOAD
         value: 'false'
   mig:
@@ -52,26 +52,26 @@

     enabled: false
     useNvidiaDriverCRD: false
     useOpenKernelModules: false
     usePrecompiled: false
     repository: nvcr.io/nvidia
     image: driver
-    version: 550.90.07
+    version: 550.127.08
     imagePullPolicy: IfNotPresent
     startupProbe:
       failureThreshold: 120
       initialDelaySeconds: 60
       periodSeconds: 10
       timeoutSeconds: 60
     rdma:
       enabled: false
       useHostMofed: false
     manager:
       repository: nvcr.io/nvidia/cloud-native
       image: k8s-driver-manager
-      version: v0.6.10
+      version: v0.7.0
       imagePullPolicy: IfNotPresent
       env:
       - name: ENABLE_GPU_POD_EVICTION
         value: 'true'
       - name: ENABLE_AUTO_DRAIN
         value: 'false'
@@ -113,13 +113,13 @@

     enabled: false
     image: vgpu-manager
     imagePullPolicy: IfNotPresent
     driverManager:
       repository: nvcr.io/nvidia/cloud-native
       image: k8s-driver-manager
-      version: v0.6.10
+      version: v0.7.0
       imagePullPolicy: IfNotPresent
       env:
       - name: ENABLE_GPU_POD_EVICTION
         value: 'false'
       - name: ENABLE_AUTO_DRAIN
         value: 'false'
@@ -138,35 +138,35 @@

           url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535.86.10-snp
         name: kata-nvidia-gpu-snp
         nodeSelector:
           nvidia.com/cc.capable: 'true'
     repository: nvcr.io/nvidia/cloud-native
     image: k8s-kata-manager
-    version: v0.2.1
+    version: v0.2.2
     imagePullPolicy: IfNotPresent
   vfioManager:
     enabled: true
     repository: nvcr.io/nvidia
     image: cuda
-    version: 12.6.1-base-ubi8
+    version: 12.6.3-base-ubi9
     imagePullPolicy: IfNotPresent
     driverManager:
       repository: nvcr.io/nvidia/cloud-native
       image: k8s-driver-manager
-      version: v0.6.10
+      version: v0.7.0
       imagePullPolicy: IfNotPresent
       env:
       - name: ENABLE_GPU_POD_EVICTION
         value: 'false'
       - name: ENABLE_AUTO_DRAIN
         value: 'false'
   vgpuDeviceManager:
     enabled: true
     repository: nvcr.io/nvidia/cloud-native
     image: vgpu-device-manager
-    version: v0.2.7
+    version: v0.2.8
     imagePullPolicy: IfNotPresent
     config:
       default: default
       name: ''
   ccManager:
     enabled: false
@@ -189,13 +189,13 @@

       value: none
     installDir: /var/nvidia
   devicePlugin:
     enabled: true
     repository: nvcr.io/nvidia
     image: k8s-device-plugin
-    version: v0.16.2-ubi8
+    version: v0.17.0
     imagePullPolicy: IfNotPresent
     env:
     - name: PASS_DEVICE_SPECS
       value: 'true'
     - name: FAIL_ON_INIT_ERROR
       value: 'true'
@@ -211,19 +211,19 @@

       name: time-slicing-config-all
       default: any
   dcgm:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: dcgm
-    version: 3.3.7-1-ubuntu22.04
+    version: 3.3.9-1-ubuntu22.04
     imagePullPolicy: IfNotPresent
   dcgmExporter:
     enabled: true
     repository: nvcr.io/nvidia/k8s
     image: dcgm-exporter
-    version: 3.3.7-3.5.0-ubuntu22.04
+    version: 3.3.9-3.6.1-ubuntu22.04
     imagePullPolicy: IfNotPresent
     env:
     - name: DCGM_EXPORTER_LISTEN
       value: :9400
     - name: DCGM_EXPORTER_KUBERNETES
       value: 'true'
@@ -236,24 +236,24 @@

       interval: 15s
       relabelings: []
   gfd:
     enabled: true
     repository: nvcr.io/nvidia
     image: k8s-device-plugin
-    version: v0.16.2-ubi8
+    version: v0.17.0
     imagePullPolicy: IfNotPresent
     env:
     - name: GFD_SLEEP_INTERVAL
       value: 60s
     - name: GFD_FAIL_ON_INIT_ERROR
       value: 'true'
   migManager:
     enabled: true
     repository: nvcr.io/nvidia/cloud-native
     image: k8s-mig-manager
-    version: v0.8.0-ubuntu20.04
+    version: v0.10.0-ubuntu20.04
     imagePullPolicy: IfNotPresent
     env:
     - name: WITH_REBOOT
       value: 'false'
     config:
       name: null
@@ -261,24 +261,24 @@

     gpuClientsConfig:
       name: ''
   nodeStatusExporter:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: gpu-operator-validator
-    version: v24.6.2
+    version: v24.9.1
     imagePullPolicy: IfNotPresent
   gdrcopy:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: gdrdrv
-    version: v2.4.1-1
+    version: v2.4.1-2
     imagePullPolicy: IfNotPresent
   sandboxWorkloads:
     enabled: false
     defaultWorkload: container
   sandboxDevicePlugin:
     enabled: true
     repository: nvcr.io/nvidia
     image: kubevirt-gpu-device-plugin
-    version: v1.2.9
+    version: v1.2.10
     imagePullPolicy: IfNotPresent
 
--- HelmRelease: gpu-operator/gpu-operator ServiceAccount: gpu-operator/gpu-operator-upgrade-crd-hook-sa

+++ HelmRelease: gpu-operator/gpu-operator ServiceAccount: gpu-operator/gpu-operator-upgrade-crd-hook-sa

@@ -0,0 +1,10 @@

+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: gpu-operator-upgrade-crd-hook-sa
+  annotations:
+    helm.sh/hook: pre-upgrade
+    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+    helm.sh/hook-weight: '0'
+
--- HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator-upgrade-crd-hook-role

+++ HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator-upgrade-crd-hook-role

@@ -0,0 +1,22 @@

+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: gpu-operator-upgrade-crd-hook-role
+  annotations:
+    helm.sh/hook: pre-upgrade
+    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+    helm.sh/hook-weight: '0'
+rules:
+- apiGroups:
+  - apiextensions.k8s.io
+  resources:
+  - customresourcedefinitions
+  verbs:
+  - create
+  - get
+  - list
+  - watch
+  - patch
+  - update
+
--- HelmRelease: gpu-operator/gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator-upgrade-crd-hook-binding

+++ HelmRelease: gpu-operator/gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator-upgrade-crd-hook-binding

@@ -0,0 +1,18 @@

+---
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: gpu-operator-upgrade-crd-hook-binding
+  annotations:
+    helm.sh/hook: pre-upgrade
+    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+    helm.sh/hook-weight: '0'
+subjects:
+- kind: ServiceAccount
+  name: gpu-operator-upgrade-crd-hook-sa
+  namespace: gpu-operator
+roleRef:
+  kind: ClusterRole
+  name: gpu-operator-upgrade-crd-hook-role
+  apiGroup: rbac.authorization.k8s.io
+
--- HelmRelease: gpu-operator/gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd

+++ HelmRelease: gpu-operator/gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd

@@ -0,0 +1,46 @@

+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: gpu-operator-upgrade-crd
+  namespace: gpu-operator
+  annotations:
+    helm.sh/hook: pre-upgrade
+    helm.sh/hook-weight: '1'
+    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+  labels:
+    app.kubernetes.io/name: gpu-operator
+    app.kubernetes.io/instance: gpu-operator
+    app.kubernetes.io/managed-by: Helm
+    app.kubernetes.io/component: gpu-operator
+spec:
+  template:
+    metadata:
+      name: gpu-operator-upgrade-crd
+      labels:
+        app.kubernetes.io/name: gpu-operator
+        app.kubernetes.io/instance: gpu-operator
+        app.kubernetes.io/managed-by: Helm
+        app.kubernetes.io/component: gpu-operator
+    spec:
+      serviceAccountName: gpu-operator-upgrade-crd-hook-sa
+      tolerations:
+      - effect: NoSchedule
+        key: node-role.kubernetes.io/master
+        operator: Equal
+        value: ''
+      - effect: NoSchedule
+        key: node-role.kubernetes.io/control-plane
+        operator: Equal
+        value: ''
+      containers:
+      - name: upgrade-crd
+        image: ghcr.io/jfroy/gpu-operator:v24.6.2-ubi8
+        imagePullPolicy: IfNotPresent
+        command:
+        - /bin/sh
+        - -c
+        - |
+          kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml;
+      restartPolicy: OnFailure
+


github-actions bot commented Nov 1, 2024

--- kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator

+++ kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator

@@ -13,13 +13,13 @@

     spec:
       chart: gpu-operator
       sourceRef:
         kind: HelmRepository
         name: nvidia
         namespace: flux-system
-      version: v24.6.2
+      version: v24.9.1
   driftDetection:
     mode: enabled
   install:
     crds: CreateReplace
     disableOpenAPIValidation: true
     remediation:
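
The sourceRef in the diff above points at a HelmRepository named nvidia in the flux-system namespace, which is where new gpu-operator chart versions are actually discovered. A minimal sketch of such a source, assuming NVIDIA's public NGC Helm repository; the URL, interval, and apiVersion are assumptions, and the object itself is not part of this PR:

apiVersion: source.toolkit.fluxcd.io/v1  # assumption: v1beta2 on older Flux releases
kind: HelmRepository
metadata:
  name: nvidia
  namespace: flux-system
spec:
  url: https://helm.ngc.nvidia.com/nvidia  # assumption: NVIDIA's published chart repository
  interval: 1h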

jfroy force-pushed the main branch 10 times, most recently from 44a8b71 to e2e1ece on November 7, 2024 18:10
renovate bot force-pushed the renovate/gpu-operator-24.x branch from fee98bb to 177fe6f on November 9, 2024 10:15
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 177fe6f to 57e2949 on November 10, 2024 03:52
jfroy force-pushed the main branch 2 times, most recently from 05848cb to 4f6fd94 on November 10, 2024 04:00
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 57e2949 to 5af188a on November 10, 2024 04:00
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 5af188a to c982c9b on November 10, 2024 04:02
renovate bot force-pushed the renovate/gpu-operator-24.x branch 2 times, most recently from 326afa0 to 65b089a on November 10, 2024 04:04
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 65b089a to d0604dc on November 10, 2024 04:16
jfroy force-pushed the main branch 2 times, most recently from 8522a8e to 2c1a094 on November 13, 2024 18:46
jfroy force-pushed the main branch 7 times, most recently from aab24c0 to 068747d on January 13, 2025 19:19
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 53d16eb to 15e86b9 on January 14, 2025 16:24

github-actions bot commented Jan 14, 2025

--- kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator

+++ kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator

@@ -13,13 +13,13 @@

     spec:
       chart: gpu-operator
       sourceRef:
         kind: HelmRepository
         name: nvidia
         namespace: flux-system
-      version: v24.9.1
+      version: v24.9.2
   driftDetection:
     mode: enabled
   install:
     crds: CreateReplace
     disableOpenAPIValidation: true
     remediation:


github-actions bot commented Jan 14, 2025

--- HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy

+++ HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy

@@ -19,26 +19,26 @@

       repository: nvcr.io/nvidia
       image: cuda
       version: 12.6.3-base-ubi9
       imagePullPolicy: IfNotPresent
   daemonsets:
     labels:
-      helm.sh/chart: gpu-operator-v24.9.1
+      helm.sh/chart: gpu-operator-v24.9.2
       app.kubernetes.io/managed-by: gpu-operator
     tolerations:
     - effect: NoSchedule
       key: nvidia.com/gpu
       operator: Exists
     priorityClassName: system-node-critical
     updateStrategy: RollingUpdate
     rollingUpdate:
       maxUnavailable: '1'
   validator:
     repository: nvcr.io/nvidia/cloud-native
     image: gpu-operator-validator
-    version: v24.9.1
+    version: v24.9.2
     imagePullPolicy: IfNotPresent
     plugin:
       env:
       - name: WITH_WORKLOAD
         value: 'false'
   mig:
@@ -52,13 +52,13 @@

     enabled: false
     useNvidiaDriverCRD: false
     useOpenKernelModules: false
     usePrecompiled: false
     repository: nvcr.io/nvidia
     image: driver
-    version: 550.127.08
+    version: 550.144.03
     imagePullPolicy: IfNotPresent
     startupProbe:
       failureThreshold: 120
       initialDelaySeconds: 60
       periodSeconds: 10
       timeoutSeconds: 60
@@ -267,13 +267,13 @@

     gpuClientsConfig:
       name: ''
   nodeStatusExporter:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: gpu-operator-validator
-    version: v24.9.1
+    version: v24.9.2
     imagePullPolicy: IfNotPresent
   gdrcopy:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: gdrdrv
     version: v2.4.1-2

renovate bot force-pushed the renovate/gpu-operator-24.x branch 2 times, most recently from e69c671 to a7c5865 on January 16, 2025 19:10
renovate bot changed the title from "feat(helm): update gpu-operator ( v24.6.2 → v24.9.1 )" to "feat(helm): update gpu-operator ( v24.6.2 → v24.9.1 ) - autoclosed" on Jan 21, 2025
renovate bot closed this on Jan 21, 2025
renovate bot deleted the renovate/gpu-operator-24.x branch on January 21, 2025 21:17
renovate bot changed the title from "feat(helm): update gpu-operator ( v24.6.2 → v24.9.1 ) - autoclosed" to "feat(helm): update gpu-operator ( v24.6.2 → v24.9.1 )" on Jan 28, 2025
renovate bot reopened this on Jan 28, 2025
renovate bot force-pushed the renovate/gpu-operator-24.x branch from f1930b9 to a7c5865 on January 28, 2025 07:06
renovate bot changed the title from "feat(helm): update gpu-operator ( v24.6.2 → v24.9.1 )" to "fix(helm): update gpu-operator ( v24.9.1 → v24.9.2 )" on Jan 28, 2025
renovate bot force-pushed the renovate/gpu-operator-24.x branch from a7c5865 to 8525c22 on January 28, 2025 07:24
jfroy force-pushed the main branch 2 times, most recently from 44d75b9 to aa3aa4f on January 28, 2025 07:36
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 8525c22 to 58239cf on January 28, 2025 07:36
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 58239cf to eb2b766 on January 28, 2025 07:49
renovate bot force-pushed the renovate/gpu-operator-24.x branch from eb2b766 to 7c90f99 on January 28, 2025 08:00
jfroy merged commit 792e5bf into main on Jan 28, 2025
4 checks passed