// if stateless CNI fails to get the endpoint from CNS for any reason other than Endpoint Not found or CNS connection failure
// return a retriable error so the container runtime will retry this DEL later
- // the implementation of this function returns nil if the endpoint doens't exist, so
+ // the implementation of this function returns nil if the endpoint doesn't exist, so
// we don't have to check that here
if err != nil {
	switch {
	case errors.Is(err, network.ErrConnectionFailure):
		logger.Error("Failed to connect to CNS", zap.Error(err))
		logger.Info("Endpoint will be deleted from state file asynchronously", zap.String("containerID", args.ContainerID))
+		// In SwiftV2 Linux stateless CNI mode, if the plugin cannot connect to CNS,
+		// we asynchronously remove the secondary (delegated) interface from the pod’s network namespace in the absence of the endpoint state.
+		// This is necessary because leaving the delegated NIC in the pod netns can cause the kernel to block rtnetlink operations.
+		// When that happens, kubelet and containerd hang during sandbox creation or teardown.
+		// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
+		// triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
+		// This workaround mitigates the issue by removing the secondary NIC from the pod netns when CNS is unreachable during DEL to provide the endpoint state.
// for Stateful CNI when the endpoint is not created, but the ips are already allocated (only works if single network, single infra)
- // this block is applied to stateless CNI only if there was a connection failure in previous block
+ // this block is applied to stateless CNI only if there was a connection failure in the previous block; the asynchronous delete by CNS will remove the endpoint from the state file