From abceea2621f401b5b2173fc47a3ad1c2199e4bb2 Mon Sep 17 00:00:00 2001
From: Kaarthikeyan Subramanian
Date: Wed, 29 Jan 2025 16:43:01 -0800
Subject: [PATCH] Blocked nodes label action

---
 articles/aks/upgrade-cluster.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/articles/aks/upgrade-cluster.md b/articles/aks/upgrade-cluster.md
index a0383c54a..9770707ea 100644
--- a/articles/aks/upgrade-cluster.md
+++ b/articles/aks/upgrade-cluster.md
@@ -88,16 +88,22 @@ kubectl get nodes --show-labels=true

 The blocked nodes are unscheduled for pods and marked with the label `"kubernetes.azure.com/upgrade-status: Quarantined"`. The maximum number of nodes that can be left blocked can't be more than the `Max-Surge` value.

-### How do I remove the blocked nodes?
+### What action can i do from here on?

-First resolve the issue causing the drain. The following example removes the responsible PDB:
+First resolve the underlying issue causing the drain. The following example removes the responsible PDB:

 ```bash
 kubectl delete pdb nginx-pdb
 poddisruptionbudget.policy "nginx-pdb" deleted.
 ```
+If you are confident the issue is now resolved , then you can go ahead and remove the label `"kubernetes.azure.com/upgrade-status: Quarantined"` placed on undrainable nodes. This can be done as follows:

-Then delete the blocked node using the `az aks nodepool delete-machines` command. This command is useful if you intend to reduce the node pool footprint by removing nodes left behind in older versions.
+```bash
+kubectl label nodes -
+```
+Any subsequent 'PUT' operation will attempt to reconcile the 'failed provisioning status' on the cluster to 'success' first. The quarantined nodes shall not be considered for any subsequent put or reconcile. You have to explicitly remove the labels as mentioned previously for any blocked nodes to be considered.
+
+You can also delete the blocked node using the `az aks nodepool delete-machines` command. This command is useful if you intend to reduce the node pool footprint by removing nodes left behind in older versions.

 ```azurecli-interactive
 az aks nodepool delete-machines --cluster-name MyCluster --machine-names aks-nodepool1-test123-vmss000000 --name nodepool1 --resource-group TestRG