keps/sig-scheduling/5721-semver-operators/README.md
+43 -2 (43 additions & 2 deletions)
@@ -208,11 +208,51 @@ spec:
#### Invalid SemVer Node Label or Taint
-**Risk**: Node labels or taints are currently free-form strings and are not validated for SemVer compliance at registration time (e.g. a node may carry a taint like this `node.kubernetes.io/containerRuntimeVersion=containerd://2.1.4` instead of `node.kubernetes.io/containerRuntimeVersion=2.1.4`). Since taint values are not validated at node registration time, these misconfigurations are only detected during scheduling when a pod with `SemverLt`/`SemverGt`/`SemverEq` tolerations attempts to match. This can lead to pods remaining in `Pending` state without clear indication of the root cause.
+**Risk**: Node labels and taints are currently free-form strings and are not validated for SemVer compliance at registration time. This creates two problematic scenarios:
+
+1. **Invalid node-side values**: A node may carry a taint like `node.kubernetes.io/containerRuntimeVersion=containerd://2.1.4` instead of `node.kubernetes.io/containerRuntimeVersion=2.1.4`, or a label like `kernel.version=5.15-generic` instead of `kernel.version=5.15.0`.
+
+2. **Delayed detection**: Because node labels and taints are not validated at node registration time, misconfigurations are only detected during scheduling, when a pod using the `SemverLt`, `SemverGt`, or `SemverEq` operators attempts to match against them.
+
+This can lead to:
+- Pods stuck in the `Pending` state indefinitely
+- Unclear error messages for cluster operators
+- Silent scheduling failures for `preferredDuringSchedulingIgnoredDuringExecution` affinity (the pod is scheduled, but the preference is ignored)
**Mitigation**:
-- Pod validation: Current validation strictly enforces that only `Equal` and `Exists` operators are allowed. Users with version taint values today must explicitly change the operator to `SemverLt` or `SemverGt`, at which point pod-side validation will catch non-version toleration values and reject the pod spec before scheduling.
+**1. Pod-Side Validation (Admission Time)**
+
+- **Tolerations**: API server validation requires that toleration values used with the `SemverLt`, `SemverGt`, or `SemverEq` operators be valid SemVer strings. Invalid values are rejected during pod admission:
+  ```
+  Invalid character(s) found in major number "containerd:"
+  ```
+
+- **Node Affinity**: API server validation requires that node affinity requirement values used with the `SemverLt`, `SemverGt`, or `SemverEq` operators be valid SemVer strings. Invalid values are rejected during pod admission.
+
+This ensures that users cannot create pods with invalid SemVer values on the pod side.
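To make the pod-side check concrete, here is a minimal sketch in Go of a strict SemVer test. It uses the regular expression suggested by the SemVer 2.0.0 specification (with named groups removed). The helper name `isSemVer` is hypothetical, and the API server's actual validation may use a different parser and differ in edge cases.

```go
package main

import (
	"fmt"
	"regexp"
)

// semverPattern follows the regular expression suggested by the SemVer 2.0.0
// specification. Assumption: this is only an illustration; the real
// admission-time validation may be implemented with a different parser.
var semverPattern = regexp.MustCompile(
	`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)` +
		`(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?` +
		`(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$`)

// isSemVer (hypothetical helper) reports whether value is a strict
// MAJOR.MINOR.PATCH version with optional pre-release and build metadata.
func isSemVer(value string) bool {
	return semverPattern.MatchString(value)
}

func main() {
	// Candidate toleration or node affinity values as a user might write them.
	for _, v := range []string{"2.1.4", "containerd://2.1.4", "v2.1.4", "2.1"} {
		fmt.Printf("%q valid=%t\n", v, isSemVer(v))
	}
}
```

Under this check only `2.1.4` is accepted; a runtime prefix, a leading `v`, and a missing patch component are all rejected.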
+
+**2. Node-Side Handling (Scheduling Time)**
+
+When a pod with valid SemVer operator values is evaluated against a node whose taint or label values are not valid SemVer:
+
+- **Tolerations**: If a node taint value cannot be parsed as SemVer, the `compareSemVerValues` function returns `false`, meaning the toleration does not match:
+  - For `NoSchedule`/`NoExecute` taints: the pod cannot be scheduled on that node
+  - For `PreferNoSchedule` taints: the node receives a lower score
+  - Scheduler event: `0/N nodes are available: X node(s) had untolerated taint {key: invalid-value}`
+  - Scheduler logs (Error level): `"failed to parse taint value as semantic version" taint="invalid-value"`
+
+- **Node Affinity**: If a node label value cannot be parsed as SemVer, the affinity match evaluates to `false`:
+  - For `requiredDuringSchedulingIgnoredDuringExecution`: the pod cannot be scheduled on that node
+  - For `preferredDuringSchedulingIgnoredDuringExecution`: that term contributes 0 to the node's score (the pod can still be scheduled)
+  - Scheduler logs (V(10) level): `"Parse semver failed for value X in label Y"`
+
+The behavior is **fail-safe**: invalid values cause matching to fail, which prevents pods from being scheduled on potentially incompatible nodes.
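The fail-safe rule can be sketched in Go as follows. This is an illustration under stated assumptions, not the scheduler's implementation: `parseCore` and `matchSemver` are hypothetical names, only bare `MAJOR.MINOR.PATCH` values are handled (pre-release and build metadata are rejected for brevity), and the comparison direction (node value compared against pod value) is assumed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCore (hypothetical) parses a bare MAJOR.MINOR.PATCH string.
// Simplification: pre-release and build metadata are not supported here.
func parseCore(v string) ([3]uint64, error) {
	var out [3]uint64
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		// SemVer forbids leading zeros in numeric components.
		if len(p) > 1 && p[0] == '0' {
			return out, fmt.Errorf("leading zero in component %q of %q", p, v)
		}
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return out, fmt.Errorf("invalid numeric component %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// compare returns -1, 0, or 1 by comparing major, minor, then patch.
func compare(a, b [3]uint64) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// matchSemver (hypothetical) reports whether nodeValue satisfies op relative
// to podValue. Assumption: the node-side value is the left operand.
// Any parse failure yields false, which is the fail-safe behavior.
func matchSemver(op, nodeValue, podValue string) bool {
	pod, err := parseCore(podValue)
	if err != nil {
		return false
	}
	node, err := parseCore(nodeValue)
	if err != nil {
		return false // invalid node-side value: never match
	}
	c := compare(node, pod)
	switch op {
	case "SemverLt":
		return c < 0
	case "SemverGt":
		return c > 0
	case "SemverEq":
		return c == 0
	}
	return false // unknown operator: never match
}

func main() {
	fmt.Println(matchSemver("SemverGt", "2.1.4", "2.0.0"))              // valid node value
	fmt.Println(matchSemver("SemverGt", "containerd://2.1.4", "2.0.0")) // invalid node value
}
```

Returning `false` on every error path means a malformed node value can only make a node less eligible, never more.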
#### Controller Hot-Loop When Feature Gate is Disabled