From 1de200956a978e04299ee658f459c64aed723400 Mon Sep 17 00:00:00 2001 From: Zespre Chang Date: Thu, 5 Sep 2024 20:49:10 +0800 Subject: [PATCH] doc(upgrade): add known issue for v1.3.1 to v1.3.2 upgrades (#635) For two-node cluster upgrades, the RKE2's agent load balancer on the worker node will stop functioning when the management node's RKE2 is under upgrade. This commit describes how to triage and recover from this upgrade incident. Signed-off-by: Zespre Chang Co-authored-by: Jillian <67180770+jillian-maroket@users.noreply.github.com> --- docs/upgrade/v1-1-2-to-v1-2-0.md | 2 +- docs/upgrade/v1-2-0-to-v1-2-1.md | 2 +- docs/upgrade/v1-2-1-to-v1-2-2.md | 2 +- docs/upgrade/v1-2-2-to-v1-3-1.md | 2 +- docs/upgrade/v1-3-1-to-v1-3-2.md | 222 ++++++++++++++++++ .../version-v1.3/upgrade/v1-1-2-to-v1-2-0.md | 2 +- .../version-v1.3/upgrade/v1-2-0-to-v1-2-1.md | 2 +- .../version-v1.3/upgrade/v1-2-1-to-v1-2-2.md | 2 +- .../version-v1.3/upgrade/v1-2-2-to-v1-3-1.md | 2 +- .../version-v1.3/upgrade/v1-3-1-to-v1-3-2.md | 222 ++++++++++++++++++ 10 files changed, 452 insertions(+), 8 deletions(-) create mode 100644 docs/upgrade/v1-3-1-to-v1-3-2.md create mode 100644 versioned_docs/version-v1.3/upgrade/v1-3-1-to-v1-3-2.md diff --git a/docs/upgrade/v1-1-2-to-v1-2-0.md b/docs/upgrade/v1-1-2-to-v1-2-0.md index 934ab40e6b6..61f0bc1912f 100644 --- a/docs/upgrade/v1-1-2-to-v1-2-0.md +++ b/docs/upgrade/v1-1-2-to-v1-2-0.md @@ -1,5 +1,5 @@ --- -sidebar_position: 5 +sidebar_position: 6 sidebar_label: Upgrade from v1.1.2 to v1.2.0 (not recommended) title: "Upgrade from v1.1.2 to v1.2.0 (not recommended)" --- diff --git a/docs/upgrade/v1-2-0-to-v1-2-1.md b/docs/upgrade/v1-2-0-to-v1-2-1.md index a017ed68c4d..fe4bdaf36de 100644 --- a/docs/upgrade/v1-2-0-to-v1-2-1.md +++ b/docs/upgrade/v1-2-0-to-v1-2-1.md @@ -1,5 +1,5 @@ --- -sidebar_position: 4 +sidebar_position: 5 sidebar_label: Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1 title: "Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1" --- diff --git a/docs/upgrade/v1-2-1-to-v1-2-2.md b/docs/upgrade/v1-2-1-to-v1-2-2.md index 24d2f64524c..e3f01e9980d 100644 --- a/docs/upgrade/v1-2-1-to-v1-2-2.md +++ b/docs/upgrade/v1-2-1-to-v1-2-2.md @@ -1,5 +1,5 @@ --- -sidebar_position: 3 +sidebar_position: 4 sidebar_label: Upgrade from v1.2.1 to v1.2.2 title: "Upgrade from v1.2.1 to v1.2.2" --- diff --git a/docs/upgrade/v1-2-2-to-v1-3-1.md b/docs/upgrade/v1-2-2-to-v1-3-1.md index 9810d79eddf..5c4e5a0bbea 100644 --- a/docs/upgrade/v1-2-2-to-v1-3-1.md +++ b/docs/upgrade/v1-2-2-to-v1-3-1.md @@ -1,5 +1,5 @@ --- -sidebar_position: 2 +sidebar_position: 3 sidebar_label: Upgrade from v1.2.2/v1.3.0 to v1.3.1 title: "Upgrade from v1.2.2/v1.3.0 to v1.3.1" --- diff --git a/docs/upgrade/v1-3-1-to-v1-3-2.md b/docs/upgrade/v1-3-1-to-v1-3-2.md new file mode 100644 index 00000000000..106a7fd3333 --- /dev/null +++ b/docs/upgrade/v1-3-1-to-v1-3-2.md @@ -0,0 +1,222 @@ +--- +sidebar_position: 2 +sidebar_label: Upgrade from v1.3.1 to v1.3.2 +title: "Upgrade from v1.3.1 to v1.3.2" +--- + + + + + +## General information + +An **Upgrade** button appears on the **Dashboard** screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see [Start an upgrade](./automatic.md#start-an-upgrade). + +For air-gapped environments, see [Prepare an air-gapped upgrade](./automatic.md#prepare-an-air-gapped-upgrade). + + +## Known issues + +--- + +### 1. 
Two-node cluster upgrade stuck after the first node is pre-drained + +:::info important + +Shut down all workload VMs before upgrading **two-node clusters** to prevent data loss. + +::: + +The worker node can falsely transition to a not-ready state when RKE2 is upgraded on the management node. Consequently, the existing pods on the worker node are evicted and new pods cannot be scheduled on any nodes. These ultimately cause a chained failure in the whole cluster and prevent completion of the upgrade process. + +Check the cluster status when the following occur: + +- The upgrade process becomes stuck for some time. +- You are unable to access the Harvester UI and receive an HTTP 503 error. + +1. Check the conditions and node statuses of the latest `Upgrade` custom resource. + + Proceed to the next step if the following conditions are met: + + - `SystemServicesUpgraded` is set to `True`, indicating that the system services upgrade is completed. + - In `nodeStatuses`, the state of the management node is either `Pre-drained` or `Waiting Reboot`. + - In `nodeStatuses`, the state of the worker node is `Images preloaded`. + + Example: + + ``` + # Find out the latest Upgrade custom resource + $ kubectl -n harvester-system get upgrades.harvesterhci -l harvesterhci.io/latestUpgrade=true + NAME AGE + hvst-upgrade-szlg8 48m + + # Check the conditions and node statuses + $ kubectl -n harvester-system get upgrades hvst-upgrade-szlg8 -o yaml + apiVersion: harvesterhci.io/v1beta1 + kind: Upgrade + metadata: + ... + labels: + harvesterhci.io/latestUpgrade: "true" + harvesterhci.io/upgradeState: UpgradingNodes + name: hvst-upgrade-szlg8 + namespace: harvester-system + ... + spec: + image: "" + logEnabled: false + version: v1.3.2-rc2 + status: + conditions: + - status: Unknown + type: Completed + - lastUpdateTime: "2024-09-02T11:57:04Z" + message: Upgrade observability is administratively disabled + reason: Disabled + status: "False" + type: LogReady + - lastUpdateTime: "2024-09-02T11:58:01Z" + status: "True" + type: ImageReady + - lastUpdateTime: "2024-09-02T12:02:31Z" + status: "True" + type: RepoReady + - lastUpdateTime: "2024-09-02T12:18:44Z" + status: "True" + type: NodesPrepared + - lastUpdateTime: "2024-09-02T12:31:25Z" + status: "True" + type: SystemServicesUpgraded + - status: Unknown + type: NodesUpgraded + imageID: harvester-system/hvst-upgrade-szlg8 + nodeStatuses: + harvester-c6phd: + state: Pre-drained + harvester-jkqhq: + state: Images preloaded + previousVersion: v1.3.1 + ... + ``` + +2. Check the node status. + + Proceed to the next step if the following conditions are met: + + - The status of the worker node is `NotReady`. + - The status of the management node is `Ready,SchedulingDisabled`. + + Example: + + ``` + $ kubectl get nodes + NAME STATUS ROLES AGE VERSION + harvester-c6phd Ready,SchedulingDisabled control-plane,etcd,master 174m v1.28.12+rke2r1 + harvester-jkqhq NotReady 166m v1.27.13+rke2r1 + ``` + +3. Check the pods on the worker node. + + The issue exists in the cluster if the status of most pods is `Terminating`. 
+ + Example: + + ``` + # Assume harvester-jkqhq is the worker node + $ kubectl get pods -A --field-selector spec.nodeName=harvester-jkqhq + NAMESPACE NAME READY STATUS RESTARTS AGE + cattle-fleet-local-system fleet-agent-6779fb5dd9-dkpjz 1/1 Terminating 0 18m + cattle-fleet-system fleet-agent-86db8d9954-qgcpq 1/1 Terminating 2 (18m ago) 61m + cattle-fleet-system fleet-controller-696d4b8878-ddctd 1/1 Terminating 1 (19m ago) 29m + cattle-fleet-system gitjob-694dd97686-s4z68 1/1 Terminating 1 (19m ago) 29m + cattle-provisioning-capi-system capi-controller-manager-6f497d5574-wkrnf 1/1 Terminating 0 20m + cattle-system cattle-cluster-agent-76db9cf9fc-5hhsx 1/1 Terminating 0 20m + cattle-system cattle-cluster-agent-76db9cf9fc-dnr6m 1/1 Terminating 0 20m + cattle-system harvester-cluster-repo-7458c7c69d-p982g 1/1 Terminating 0 27m + cattle-system rancher-7d65df9bd4-77n7w 1/1 Terminating 0 31m + cattle-system rancher-webhook-cfc66d5d7-fd6gm 1/1 Terminating 0 28m + harvester-system harvester-85ff674986-wxkl4 1/1 Terminating 0 26m + harvester-system harvester-load-balancer-54cd9754dc-cwtxg 1/1 Terminating 0 20m + harvester-system harvester-load-balancer-webhook-c8699b786-x6clw 1/1 Terminating 0 20m + harvester-system harvester-network-controller-manager-b69bf6b69-9f99x 1/1 Terminating 0 178m + harvester-system harvester-network-controller-vs4jg 1/1 Running 0 178m + harvester-system harvester-network-webhook-7b98f8cd98-gjl8b 1/1 Terminating 0 20m + harvester-system harvester-node-disk-manager-tbh4b 1/1 Running 0 26m + harvester-system harvester-node-manager-7pqcp 1/1 Running 0 178m + harvester-system harvester-node-manager-webhook-9cfccc84c-68tgp 1/1 Running 0 20m + harvester-system harvester-node-manager-webhook-9cfccc84c-6bbvg 1/1 Running 0 20m + harvester-system harvester-webhook-565dc698b6-np89r 1/1 Terminating 0 26m + harvester-system hvst-upgrade-szlg8-apply-manifests-4rmjw 0/1 Completed 0 33m + harvester-system virt-api-6fb7d97b68-cbc5m 1/1 Terminating 0 20m + harvester-system virt-api-6fb7d97b68-gqg5c 1/1 Terminating 0 23m + harvester-system virt-controller-67d8b4c75c-5qz9x 1/1 Terminating 0 24m + harvester-system virt-controller-67d8b4c75c-bdf8w 1/1 Terminating 2 (18m ago) 23m + harvester-system virt-handler-xw98h 1/1 Running 0 24m + harvester-system virt-operator-6c98db546-brgnx 1/1 Terminating 2 (18m ago) 26m + kube-system harvester-snapshot-validation-webhook-b75f94bcb-95zlb 1/1 Terminating 0 20m + kube-system harvester-snapshot-validation-webhook-b75f94bcb-xfrmf 1/1 Terminating 0 20m + kube-system harvester-whereabouts-tdr5g 1/1 Running 1 (178m ago) 178m + kube-system helm-install-rke2-ingress-nginx-4wt4j 0/1 Terminating 0 15m + kube-system helm-install-rke2-metrics-server-jn58m 0/1 Terminating 0 15m + kube-system kube-proxy-harvester-jkqhq 1/1 Running 0 178m + kube-system rke2-canal-wfpch 2/2 Running 0 178m + kube-system rke2-coredns-rke2-coredns-864fbd7785-t7k6t 1/1 Terminating 0 178m + kube-system rke2-coredns-rke2-coredns-autoscaler-6c87968579-rg6g4 1/1 Terminating 0 20m + kube-system rke2-ingress-nginx-controller-d4h25 1/1 Running 0 178m + kube-system rke2-metrics-server-7f745dbddf-2mp5j 1/1 Terminating 0 20m + kube-system rke2-multus-fsp94 1/1 Running 0 178m + kube-system snapshot-controller-65d5f465d9-5b2sb 1/1 Terminating 0 20m + kube-system snapshot-controller-65d5f465d9-c264r 1/1 Terminating 0 20m + longhorn-system backing-image-manager-c16a-7c90 1/1 Terminating 0 54m + longhorn-system csi-attacher-5fbd66cf8-674vc 1/1 Terminating 0 20m + longhorn-system 
csi-attacher-5fbd66cf8-725mn 1/1 Terminating 0 20m + longhorn-system csi-attacher-5fbd66cf8-85k5d 1/1 Terminating 0 20m + longhorn-system csi-provisioner-5b6ff8f4d4-97wsf 1/1 Terminating 0 20m + longhorn-system csi-provisioner-5b6ff8f4d4-cbpm9 1/1 Terminating 0 20m + longhorn-system csi-provisioner-5b6ff8f4d4-q7z58 1/1 Terminating 0 19m + longhorn-system csi-resizer-74c5555748-6rmbf 1/1 Terminating 0 20m + longhorn-system csi-resizer-74c5555748-fw2cw 1/1 Terminating 0 20m + longhorn-system csi-resizer-74c5555748-p4nph 1/1 Terminating 0 20m + longhorn-system csi-snapshotter-6bc4bcf4c5-6858b 1/1 Terminating 0 20m + longhorn-system csi-snapshotter-6bc4bcf4c5-cqkbw 1/1 Terminating 0 20m + longhorn-system csi-snapshotter-6bc4bcf4c5-mkqtg 1/1 Terminating 0 20m + longhorn-system engine-image-ei-b0369a5d-2t4k4 1/1 Running 0 178m + longhorn-system instance-manager-a5bd20597b82bcf3ba9d314620b7e670 1/1 Terminating 0 178m + longhorn-system longhorn-csi-plugin-x6bdg 3/3 Running 0 178m + longhorn-system longhorn-driver-deployer-85cf4b4849-5lc52 1/1 Terminating 0 20m + longhorn-system longhorn-loop-device-cleaner-hhvgv 1/1 Running 0 178m + longhorn-system longhorn-manager-5h2zw 1/1 Running 0 178m + longhorn-system longhorn-ui-6b677889f8-hrg8j 1/1 Terminating 0 20m + longhorn-system longhorn-ui-6b677889f8-w5hng 1/1 Terminating 0 20m + ``` + +To resolve the issue, you must restart the `rke2-agent` service on the worker node. + +``` +# On the worker node +sudo systemctl restart rke2-agent.service +``` + +The upgrade should resume after the `rke2-agent` service is fully restarted. + +:::note + +This issue occurs because the agent load balancer on the worker node is unable to connect to the API server on the management node after the `rke2-server` service is restarted. Because the `rke2-server` service can be restarted multiple times when nodes are upgraded, the upgrade process is likely to become stuck again. You may need to restart the `rke2-agent` service multiple times. + +To determine if the agent load balancer is functioning, run the following commands: + +``` +# On the management node, check if the `rke2-server` service is running. +sudo systemctl status rke2-server.service + +# On the worker node, check if the agent load balancer is functioning. +sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig get nodes +``` + +If the kubectl command does not return a response, the kubelet is unable to access the API server via the agent load balancer. You must restart the `rke2-agent` service. + +::: + +For more information, see [Issue #6432](https://github.com/harvester/harvester/issues/6432#issuecomment-2325488465). 
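Because the `rke2-server` service can restart several times while the nodes are upgraded, you may need to repeat this recovery step more than once. The following is a minimal sketch, assuming you are willing to leave a shell loop running on the worker node, that combines the health check from the note above with the restart command; the 30-second polling interval and the `--request-timeout` value are illustrative choices rather than part of the documented procedure.

```
# On the worker node: restart rke2-agent whenever the kubelet loses access
# to the API server through the agent load balancer.
KUBECTL=/var/lib/rancher/rke2/bin/kubectl
KUBECONFIG_PATH=/var/lib/rancher/rke2/agent/kubelet.kubeconfig

while true; do
  if ! sudo "$KUBECTL" --kubeconfig="$KUBECONFIG_PATH" --request-timeout=10s get nodes >/dev/null 2>&1; then
    echo "$(date): API server unreachable through the agent load balancer; restarting rke2-agent"
    sudo systemctl restart rke2-agent.service
  fi
  sleep 30
done
```

Stop the loop after the upgrade completes.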
+ +--- diff --git a/versioned_docs/version-v1.3/upgrade/v1-1-2-to-v1-2-0.md b/versioned_docs/version-v1.3/upgrade/v1-1-2-to-v1-2-0.md index 934ab40e6b6..61f0bc1912f 100644 --- a/versioned_docs/version-v1.3/upgrade/v1-1-2-to-v1-2-0.md +++ b/versioned_docs/version-v1.3/upgrade/v1-1-2-to-v1-2-0.md @@ -1,5 +1,5 @@ --- -sidebar_position: 5 +sidebar_position: 6 sidebar_label: Upgrade from v1.1.2 to v1.2.0 (not recommended) title: "Upgrade from v1.1.2 to v1.2.0 (not recommended)" --- diff --git a/versioned_docs/version-v1.3/upgrade/v1-2-0-to-v1-2-1.md b/versioned_docs/version-v1.3/upgrade/v1-2-0-to-v1-2-1.md index a017ed68c4d..fe4bdaf36de 100644 --- a/versioned_docs/version-v1.3/upgrade/v1-2-0-to-v1-2-1.md +++ b/versioned_docs/version-v1.3/upgrade/v1-2-0-to-v1-2-1.md @@ -1,5 +1,5 @@ --- -sidebar_position: 4 +sidebar_position: 5 sidebar_label: Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1 title: "Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1" --- diff --git a/versioned_docs/version-v1.3/upgrade/v1-2-1-to-v1-2-2.md b/versioned_docs/version-v1.3/upgrade/v1-2-1-to-v1-2-2.md index 24d2f64524c..e3f01e9980d 100644 --- a/versioned_docs/version-v1.3/upgrade/v1-2-1-to-v1-2-2.md +++ b/versioned_docs/version-v1.3/upgrade/v1-2-1-to-v1-2-2.md @@ -1,5 +1,5 @@ --- -sidebar_position: 3 +sidebar_position: 4 sidebar_label: Upgrade from v1.2.1 to v1.2.2 title: "Upgrade from v1.2.1 to v1.2.2" --- diff --git a/versioned_docs/version-v1.3/upgrade/v1-2-2-to-v1-3-1.md b/versioned_docs/version-v1.3/upgrade/v1-2-2-to-v1-3-1.md index 9810d79eddf..5c4e5a0bbea 100644 --- a/versioned_docs/version-v1.3/upgrade/v1-2-2-to-v1-3-1.md +++ b/versioned_docs/version-v1.3/upgrade/v1-2-2-to-v1-3-1.md @@ -1,5 +1,5 @@ --- -sidebar_position: 2 +sidebar_position: 3 sidebar_label: Upgrade from v1.2.2/v1.3.0 to v1.3.1 title: "Upgrade from v1.2.2/v1.3.0 to v1.3.1" --- diff --git a/versioned_docs/version-v1.3/upgrade/v1-3-1-to-v1-3-2.md b/versioned_docs/version-v1.3/upgrade/v1-3-1-to-v1-3-2.md new file mode 100644 index 00000000000..106a7fd3333 --- /dev/null +++ b/versioned_docs/version-v1.3/upgrade/v1-3-1-to-v1-3-2.md @@ -0,0 +1,222 @@ +--- +sidebar_position: 2 +sidebar_label: Upgrade from v1.3.1 to v1.3.2 +title: "Upgrade from v1.3.1 to v1.3.2" +--- + + + + + +## General information + +An **Upgrade** button appears on the **Dashboard** screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see [Start an upgrade](./automatic.md#start-an-upgrade). + +For air-gapped environments, see [Prepare an air-gapped upgrade](./automatic.md#prepare-an-air-gapped-upgrade). + + +## Known issues + +--- + +### 1. Two-node cluster upgrade stuck after the first node is pre-drained + +:::info important + +Shut down all workload VMs before upgrading **two-node clusters** to prevent data loss. + +::: + +The worker node can falsely transition to a not-ready state when RKE2 is upgraded on the management node. Consequently, the existing pods on the worker node are evicted and new pods cannot be scheduled on any nodes. These ultimately cause a chained failure in the whole cluster and prevent completion of the upgrade process. + +Check the cluster status when the following occur: + +- The upgrade process becomes stuck for some time. +- You are unable to access the Harvester UI and receive an HTTP 503 error. + +1. Check the conditions and node statuses of the latest `Upgrade` custom resource. 
+ + Proceed to the next step if the following conditions are met: + + - `SystemServicesUpgraded` is set to `True`, indicating that the system services upgrade is completed. + - In `nodeStatuses`, the state of the management node is either `Pre-drained` or `Waiting Reboot`. + - In `nodeStatuses`, the state of the worker node is `Images preloaded`. + + Example: + + ``` + # Find out the latest Upgrade custom resource + $ kubectl -n harvester-system get upgrades.harvesterhci -l harvesterhci.io/latestUpgrade=true + NAME AGE + hvst-upgrade-szlg8 48m + + # Check the conditions and node statuses + $ kubectl -n harvester-system get upgrades hvst-upgrade-szlg8 -o yaml + apiVersion: harvesterhci.io/v1beta1 + kind: Upgrade + metadata: + ... + labels: + harvesterhci.io/latestUpgrade: "true" + harvesterhci.io/upgradeState: UpgradingNodes + name: hvst-upgrade-szlg8 + namespace: harvester-system + ... + spec: + image: "" + logEnabled: false + version: v1.3.2-rc2 + status: + conditions: + - status: Unknown + type: Completed + - lastUpdateTime: "2024-09-02T11:57:04Z" + message: Upgrade observability is administratively disabled + reason: Disabled + status: "False" + type: LogReady + - lastUpdateTime: "2024-09-02T11:58:01Z" + status: "True" + type: ImageReady + - lastUpdateTime: "2024-09-02T12:02:31Z" + status: "True" + type: RepoReady + - lastUpdateTime: "2024-09-02T12:18:44Z" + status: "True" + type: NodesPrepared + - lastUpdateTime: "2024-09-02T12:31:25Z" + status: "True" + type: SystemServicesUpgraded + - status: Unknown + type: NodesUpgraded + imageID: harvester-system/hvst-upgrade-szlg8 + nodeStatuses: + harvester-c6phd: + state: Pre-drained + harvester-jkqhq: + state: Images preloaded + previousVersion: v1.3.1 + ... + ``` + +2. Check the node status. + + Proceed to the next step if the following conditions are met: + + - The status of the worker node is `NotReady`. + - The status of the management node is `Ready,SchedulingDisabled`. + + Example: + + ``` + $ kubectl get nodes + NAME STATUS ROLES AGE VERSION + harvester-c6phd Ready,SchedulingDisabled control-plane,etcd,master 174m v1.28.12+rke2r1 + harvester-jkqhq NotReady 166m v1.27.13+rke2r1 + ``` + +3. Check the pods on the worker node. + + The issue exists in the cluster if the status of most pods is `Terminating`. 
+ + Example: + + ``` + # Assume harvester-jkqhq is the worker node + $ kubectl get pods -A --field-selector spec.nodeName=harvester-jkqhq + NAMESPACE NAME READY STATUS RESTARTS AGE + cattle-fleet-local-system fleet-agent-6779fb5dd9-dkpjz 1/1 Terminating 0 18m + cattle-fleet-system fleet-agent-86db8d9954-qgcpq 1/1 Terminating 2 (18m ago) 61m + cattle-fleet-system fleet-controller-696d4b8878-ddctd 1/1 Terminating 1 (19m ago) 29m + cattle-fleet-system gitjob-694dd97686-s4z68 1/1 Terminating 1 (19m ago) 29m + cattle-provisioning-capi-system capi-controller-manager-6f497d5574-wkrnf 1/1 Terminating 0 20m + cattle-system cattle-cluster-agent-76db9cf9fc-5hhsx 1/1 Terminating 0 20m + cattle-system cattle-cluster-agent-76db9cf9fc-dnr6m 1/1 Terminating 0 20m + cattle-system harvester-cluster-repo-7458c7c69d-p982g 1/1 Terminating 0 27m + cattle-system rancher-7d65df9bd4-77n7w 1/1 Terminating 0 31m + cattle-system rancher-webhook-cfc66d5d7-fd6gm 1/1 Terminating 0 28m + harvester-system harvester-85ff674986-wxkl4 1/1 Terminating 0 26m + harvester-system harvester-load-balancer-54cd9754dc-cwtxg 1/1 Terminating 0 20m + harvester-system harvester-load-balancer-webhook-c8699b786-x6clw 1/1 Terminating 0 20m + harvester-system harvester-network-controller-manager-b69bf6b69-9f99x 1/1 Terminating 0 178m + harvester-system harvester-network-controller-vs4jg 1/1 Running 0 178m + harvester-system harvester-network-webhook-7b98f8cd98-gjl8b 1/1 Terminating 0 20m + harvester-system harvester-node-disk-manager-tbh4b 1/1 Running 0 26m + harvester-system harvester-node-manager-7pqcp 1/1 Running 0 178m + harvester-system harvester-node-manager-webhook-9cfccc84c-68tgp 1/1 Running 0 20m + harvester-system harvester-node-manager-webhook-9cfccc84c-6bbvg 1/1 Running 0 20m + harvester-system harvester-webhook-565dc698b6-np89r 1/1 Terminating 0 26m + harvester-system hvst-upgrade-szlg8-apply-manifests-4rmjw 0/1 Completed 0 33m + harvester-system virt-api-6fb7d97b68-cbc5m 1/1 Terminating 0 20m + harvester-system virt-api-6fb7d97b68-gqg5c 1/1 Terminating 0 23m + harvester-system virt-controller-67d8b4c75c-5qz9x 1/1 Terminating 0 24m + harvester-system virt-controller-67d8b4c75c-bdf8w 1/1 Terminating 2 (18m ago) 23m + harvester-system virt-handler-xw98h 1/1 Running 0 24m + harvester-system virt-operator-6c98db546-brgnx 1/1 Terminating 2 (18m ago) 26m + kube-system harvester-snapshot-validation-webhook-b75f94bcb-95zlb 1/1 Terminating 0 20m + kube-system harvester-snapshot-validation-webhook-b75f94bcb-xfrmf 1/1 Terminating 0 20m + kube-system harvester-whereabouts-tdr5g 1/1 Running 1 (178m ago) 178m + kube-system helm-install-rke2-ingress-nginx-4wt4j 0/1 Terminating 0 15m + kube-system helm-install-rke2-metrics-server-jn58m 0/1 Terminating 0 15m + kube-system kube-proxy-harvester-jkqhq 1/1 Running 0 178m + kube-system rke2-canal-wfpch 2/2 Running 0 178m + kube-system rke2-coredns-rke2-coredns-864fbd7785-t7k6t 1/1 Terminating 0 178m + kube-system rke2-coredns-rke2-coredns-autoscaler-6c87968579-rg6g4 1/1 Terminating 0 20m + kube-system rke2-ingress-nginx-controller-d4h25 1/1 Running 0 178m + kube-system rke2-metrics-server-7f745dbddf-2mp5j 1/1 Terminating 0 20m + kube-system rke2-multus-fsp94 1/1 Running 0 178m + kube-system snapshot-controller-65d5f465d9-5b2sb 1/1 Terminating 0 20m + kube-system snapshot-controller-65d5f465d9-c264r 1/1 Terminating 0 20m + longhorn-system backing-image-manager-c16a-7c90 1/1 Terminating 0 54m + longhorn-system csi-attacher-5fbd66cf8-674vc 1/1 Terminating 0 20m + longhorn-system 
csi-attacher-5fbd66cf8-725mn 1/1 Terminating 0 20m + longhorn-system csi-attacher-5fbd66cf8-85k5d 1/1 Terminating 0 20m + longhorn-system csi-provisioner-5b6ff8f4d4-97wsf 1/1 Terminating 0 20m + longhorn-system csi-provisioner-5b6ff8f4d4-cbpm9 1/1 Terminating 0 20m + longhorn-system csi-provisioner-5b6ff8f4d4-q7z58 1/1 Terminating 0 19m + longhorn-system csi-resizer-74c5555748-6rmbf 1/1 Terminating 0 20m + longhorn-system csi-resizer-74c5555748-fw2cw 1/1 Terminating 0 20m + longhorn-system csi-resizer-74c5555748-p4nph 1/1 Terminating 0 20m + longhorn-system csi-snapshotter-6bc4bcf4c5-6858b 1/1 Terminating 0 20m + longhorn-system csi-snapshotter-6bc4bcf4c5-cqkbw 1/1 Terminating 0 20m + longhorn-system csi-snapshotter-6bc4bcf4c5-mkqtg 1/1 Terminating 0 20m + longhorn-system engine-image-ei-b0369a5d-2t4k4 1/1 Running 0 178m + longhorn-system instance-manager-a5bd20597b82bcf3ba9d314620b7e670 1/1 Terminating 0 178m + longhorn-system longhorn-csi-plugin-x6bdg 3/3 Running 0 178m + longhorn-system longhorn-driver-deployer-85cf4b4849-5lc52 1/1 Terminating 0 20m + longhorn-system longhorn-loop-device-cleaner-hhvgv 1/1 Running 0 178m + longhorn-system longhorn-manager-5h2zw 1/1 Running 0 178m + longhorn-system longhorn-ui-6b677889f8-hrg8j 1/1 Terminating 0 20m + longhorn-system longhorn-ui-6b677889f8-w5hng 1/1 Terminating 0 20m + ``` + +To resolve the issue, you must restart the `rke2-agent` service on the worker node. + +``` +# On the worker node +sudo systemctl restart rke2-agent.service +``` + +The upgrade should resume after the `rke2-agent` service is fully restarted. + +:::note + +This issue occurs because the agent load balancer on the worker node is unable to connect to the API server on the management node after the `rke2-server` service is restarted. Because the `rke2-server` service can be restarted multiple times when nodes are upgraded, the upgrade process is likely to become stuck again. You may need to restart the `rke2-agent` service multiple times. + +To determine if the agent load balancer is functioning, run the following commands: + +``` +# On the management node, check if the `rke2-server` service is running. +sudo systemctl status rke2-server.service + +# On the worker node, check if the agent load balancer is functioning. +sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig get nodes +``` + +If the kubectl command does not return a response, the kubelet is unable to access the API server via the agent load balancer. You must restart the `rke2-agent` service. + +::: + +For more information, see [Issue #6432](https://github.com/harvester/harvester/issues/6432#issuecomment-2325488465). + +---
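As a convenience for step 1 above, the following sketch extracts only the node statuses and the `SystemServicesUpgraded` condition from the latest `Upgrade` custom resource instead of dumping the full YAML. The jsonpath expressions are assumptions based on the field layout shown in the example output above, not a documented interface.

```
# Name of the latest Upgrade custom resource
UPGRADE=$(kubectl -n harvester-system get upgrades.harvesterhci \
  -l harvesterhci.io/latestUpgrade=true -o jsonpath='{.items[0].metadata.name}')

# Per-node upgrade states (for example, Pre-drained or Images preloaded)
kubectl -n harvester-system get upgrades.harvesterhci "$UPGRADE" \
  -o jsonpath='{.status.nodeStatuses}'

# Status of the SystemServicesUpgraded condition
kubectl -n harvester-system get upgrades.harvesterhci "$UPGRADE" \
  -o jsonpath='{.status.conditions[?(@.type=="SystemServicesUpgraded")].status}'
```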