
Commit 7008144

Fix risk scenario to include node affinity
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
1 parent f453968 commit 7008144


1 file changed, +43 -2 lines changed


keps/sig-scheduling/5721-semver-operators/README.md

Lines changed: 43 additions & 2 deletions
@@ -208,11 +208,51 @@ spec:
 
 #### Invalid SemVer Node Label or Taint
 
-**Risk**: Node labels or taints are currently free-form strings and are not validated for SemVer compliance at registration time (e.g. a node may carry a taint like this `node.kubernetes.io/containerRuntimeVersion=containerd://2.1.4` instead of `node.kubernetes.io/containerRuntimeVersion=2.1.4`). Since taint values are not validated at node registration time, these misconfigurations are only detected during scheduling when a pod with `SemverLt`/`SemverGt`/`SemverEq` tolerations attempts to match. This can lead to pods remaining in `Pending` state without clear indication of the root cause.
+**Risk**: Node labels and taints are currently free-form strings and are not validated for SemVer compliance at registration time. This creates two problematic scenarios:
+
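+1. **Invalid node-side values**: A node may carry a taint like `node.kubernetes.io/containerRuntimeVersion=containerd://2.1.4` instead of `node.kubernetes.io/containerRuntimeVersion=2.1.4`, or a label like `kernel.version=5.15.0-generic` instead of `kernel.version=5.15.0`. Note that `5.15.0-generic` is syntactically valid SemVer (pre-release identifier `generic`), so it parses, but it sorts below `5.15.0`, which is unlikely to be what the cluster operator intended.
+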
+2. **Delayed detection**: Because node labels and taints are not validated at node registration time, misconfigurations are detected only during scheduling, when a pod using the `SemverLt`/`SemverGt`/`SemverEq` operators is matched against them.
+
+This can lead to:
+- Pods stuck in `Pending` state indefinitely
+- Unclear error messages for cluster operators
+- Silent scheduling failures for `preferredDuringSchedulingIgnoredDuringExecution` affinity (the pod schedules, but the preference has no effect)
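The two example values in item 1 fail in different ways. The stdlib-only Go sketch below illustrates this with two hypothetical helpers; neither is the scheduler's code, and the error wording is illustrative:

- `parseTolerant` only loosely approximates `semver.ParseTolerant`. It trims whitespace and a leading `v`, then expects `MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]`. Unlike the real function, it does not pad short versions such as `1.2`.
- `compare` follows SemVer 2.0.0 precedence.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// identChars are the characters SemVer permits in pre-release identifiers.
const identChars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-"

// version is a minimal, hypothetical SemVer value; build metadata is dropped
// because it does not affect precedence.
type version struct {
	core [3]uint64 // MAJOR, MINOR, PATCH
	pre  []string  // dot-separated pre-release identifiers (nil if none)
}

// parseTolerant loosely approximates semver.ParseTolerant (assumption): trim
// whitespace and a leading "v", then require MAJOR.MINOR.PATCH[-PRE][+BUILD].
func parseTolerant(s string) (version, error) {
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexByte(s, '+'); i >= 0 {
		s = s[:i]
	}
	var v version
	if i := strings.IndexByte(s, '-'); i >= 0 {
		v.pre = strings.Split(s[i+1:], ".")
		for _, id := range v.pre {
			// Trim removes allowed characters; anything left over is invalid.
			if id == "" || strings.Trim(id, identChars) != "" {
				return version{}, fmt.Errorf("invalid pre-release identifier %q", id)
			}
		}
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("%q is not MAJOR.MINOR.PATCH", s)
	}
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return version{}, fmt.Errorf("invalid numeric part %q", p)
		}
		v.core[i] = n
	}
	return v, nil
}

func cmpUint(a, b uint64) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

// compare returns -1, 0 or 1 following SemVer 2.0.0 precedence rules.
func compare(a, b version) int {
	for i := range a.core {
		if c := cmpUint(a.core[i], b.core[i]); c != 0 {
			return c
		}
	}
	switch {
	case len(a.pre) == 0 && len(b.pre) == 0:
		return 0
	case len(a.pre) == 0:
		return 1 // a release outranks any pre-release of the same core
	case len(b.pre) == 0:
		return -1
	}
	for i := 0; i < len(a.pre) && i < len(b.pre); i++ {
		an, aErr := strconv.ParseUint(a.pre[i], 10, 64)
		bn, bErr := strconv.ParseUint(b.pre[i], 10, 64)
		c := 0
		switch {
		case aErr == nil && bErr == nil:
			c = cmpUint(an, bn)
		case aErr == nil:
			c = -1 // numeric identifiers sort before alphanumeric ones
		case bErr == nil:
			c = 1
		default:
			c = strings.Compare(a.pre[i], b.pre[i])
		}
		if c != 0 {
			return c
		}
	}
	return cmpUint(uint64(len(a.pre)), uint64(len(b.pre)))
}

func main() {
	for _, s := range []string{"containerd://2.1.4", "2.1.4", "5.15.0-generic", "5.15.0"} {
		if _, err := parseTolerant(s); err != nil {
			fmt.Printf("%-20s invalid: %v\n", s, err)
		} else {
			fmt.Printf("%-20s valid\n", s)
		}
	}
	generic, _ := parseTolerant("5.15.0-generic")
	release, _ := parseTolerant("5.15.0")
	fmt.Println("5.15.0-generic < 5.15.0:", compare(generic, release) < 0)
}
```

Under these assumptions, `containerd://2.1.4` is rejected outright. By contrast, `5.15.0-generic` parses as a pre-release that sorts below `5.15.0`, so an equality check against `5.15.0` would not match that node.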
 
 **Mitigation**:
 
-- Pod validation: Current validation strictly enforces that only `Equal` and `Exists` operators are allowed. Users with version taint values today must explicitly change the operator to `SemverLt` or `SemverGt`, at which point pod-side validation will catch non-version toleration values and reject the pod spec before scheduling.
+**1. Pod-Side Validation (Admission Time)**
+
+- **Tolerations**: API server validation requires toleration values used with the `SemverLt`, `SemverGt`, or `SemverEq` operators to be valid SemVer strings. Invalid values are rejected at pod admission:
+  ```
+  spec.tolerations[0].value: Invalid value: "containerd://2.1.4":
+  Invalid character(s) found in major number "containerd:"
+  ```
+
+- **Node Affinity**: API server validation requires node affinity requirement values used with the `SemverLt`, `SemverGt`, or `SemverEq` operators to be valid SemVer strings. Invalid values are rejected at pod admission:
+  ```
+  spec.affinity.nodeAffinity...matchExpressions[0].values[0]: Invalid value: "v1.2.x":
+  Invalid character(s) found in patch number "x"
+  ```
+
+This ensures that pods with invalid SemVer values in their tolerations or node affinity are rejected before they reach the scheduler.
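The admission-time check described above amounts to validating each value only when a SemVer operator is used. In this stdlib-only Go sketch, several pieces are hypothetical: `checkSemver`, `validateSemverValue`, the abbreviated field paths, and the error wording. The real API server validation and its messages may differ. The sketch is also stricter than `semver.ParseTolerant` in one respect: it rejects short versions such as `1.2`, which `ParseTolerant` accepts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// identChars are the characters SemVer permits in pre-release identifiers.
const identChars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-"

// semverOps are the operators whose values must be valid SemVer.
var semverOps = map[string]bool{"SemverLt": true, "SemverGt": true, "SemverEq": true}

// checkSemver is a simplified, hypothetical syntax check: optional "v",
// MAJOR.MINOR.PATCH, optional -PRERELEASE, optional +BUILD (not inspected).
func checkSemver(s string) error {
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexByte(s, '+'); i >= 0 {
		s = s[:i]
	}
	if i := strings.IndexByte(s, '-'); i >= 0 {
		for _, id := range strings.Split(s[i+1:], ".") {
			// Trim removes allowed characters; anything left over is invalid.
			if id == "" || strings.Trim(id, identChars) != "" {
				return fmt.Errorf("invalid pre-release identifier %q", id)
			}
		}
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return fmt.Errorf("%q is not MAJOR.MINOR.PATCH", s)
	}
	names := [3]string{"major", "minor", "patch"}
	for i, p := range parts {
		if _, err := strconv.ParseUint(p, 10, 64); err != nil {
			return fmt.Errorf("invalid %s number %q", names[i], p)
		}
	}
	return nil
}

// validateSemverValue is a hypothetical admission-time check for one value.
func validateSemverValue(path, op, value string) error {
	if !semverOps[op] {
		return nil // other operators are out of scope for this sketch
	}
	if err := checkSemver(value); err != nil {
		return fmt.Errorf("%s: Invalid value: %q: %v", path, value, err)
	}
	return nil
}

func main() {
	// One toleration value and two node-affinity values (paths abbreviated).
	fmt.Println(validateSemverValue("spec.tolerations[0].value", "SemverLt", "containerd://2.1.4"))
	for i, v := range []string{"v1.2.x", "1.2.3"} {
		path := fmt.Sprintf("matchExpressions[0].values[%d]", i)
		fmt.Println(validateSemverValue(path, "SemverGt", v))
	}
}
```

Both malformed example values are rejected, and `1.2.3` passes. Other operators such as `Equal` are passed through, since this sketch models only the SemVer-specific rule.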
+
+**2. Node-Side Handling (Scheduling Time)**
+
+When a pod with valid SemVer values is evaluated against a node whose taint or label value is not valid SemVer:
+
+- **Tolerations**: If a node taint value cannot be parsed as SemVer, the `compareSemVerValues` function returns `false`, meaning the toleration does not match:
+  - For `NoSchedule`/`NoExecute` taints: the pod cannot be scheduled on that node
+  - For `PreferNoSchedule` taints: the node receives a lower score
+  - Scheduler event: `0/N nodes are available: X node(s) had untolerated taint {key: invalid-value}`
+  - Scheduler logs (Error level): `"failed to parse taint value as semantic version" taint="invalid-value"`
+
+- **Node Affinity**: If a node label value cannot be parsed as SemVer, affinity matching returns `false`:
+  - For `requiredDuringSchedulingIgnoredDuringExecution`: the pod cannot be scheduled on that node
+  - For `preferredDuringSchedulingIgnoredDuringExecution`: the term contributes 0 to the node's score (the pod can still be scheduled)
+  - Scheduler logs (V(10) level): `"Parse semver failed for value X in label Y"`
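The node-affinity half of the scheduling-time behavior can be sketched as a fail-closed match plus a weighted score. Everything here is hypothetical:

- `requirement`, `weightedTerm`, `matches`, and `preferredScore` are illustrative names.
- `parseCore` is deliberately strict: an optional `v` and plain `MAJOR.MINOR.PATCH`, with no pre-release support.
- The sketch assumes the SemVer operators compare the node's label value against the requirement value, the way the existing `Gt`/`Lt` node-affinity operators do.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCore is a hypothetical, deliberately strict parser: an optional "v"
// prefix plus MAJOR.MINOR.PATCH only (no pre-release or build metadata).
func parseCore(s string) ([3]uint64, error) {
	var v [3]uint64
	parts := strings.Split(strings.TrimPrefix(strings.TrimSpace(s), "v"), ".")
	if len(parts) != 3 {
		return [3]uint64{}, fmt.Errorf("%q is not MAJOR.MINOR.PATCH", s)
	}
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return [3]uint64{}, fmt.Errorf("%q: invalid number %q", s, p)
		}
		v[i] = n
	}
	return v, nil
}

// cmpCore returns -1, 0 or 1 by comparing MAJOR, then MINOR, then PATCH.
func cmpCore(a, b [3]uint64) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// requirement is a hypothetical node-affinity match expression with one value.
type requirement struct {
	key, op, value string
}

// matches reports whether the node's labels satisfy r, comparing
// "label op value" like the existing Gt/Lt operators (assumption).
// A missing or unparseable label never matches: the check fails closed.
func matches(labels map[string]string, r requirement) bool {
	raw, ok := labels[r.key]
	if !ok {
		return false
	}
	node, err := parseCore(raw)
	if err != nil {
		return false // invalid node-side value: no match
	}
	want, err := parseCore(r.value)
	if err != nil {
		return false // admission validation should already prevent this
	}
	c := cmpCore(node, want)
	switch r.op {
	case "SemverLt":
		return c < 0
	case "SemverGt":
		return c > 0
	case "SemverEq":
		return c == 0
	}
	return false
}

// weightedTerm is a hypothetical preferred-affinity term.
type weightedTerm struct {
	weight int64
	req    requirement
}

// preferredScore sums the weights of matching terms. A term whose label is
// invalid contributes 0, so the node loses the preference but stays feasible.
func preferredScore(labels map[string]string, terms []weightedTerm) int64 {
	var score int64
	for _, t := range terms {
		if matches(labels, t.req) {
			score += t.weight
		}
	}
	return score
}

func main() {
	req := requirement{key: "kernel.version", op: "SemverGt", value: "5.10.0"}
	valid := map[string]string{"kernel.version": "5.15.0"}
	invalid := map[string]string{"kernel.version": "unknown"}
	fmt.Println("required term, valid label:  ", matches(valid, req))
	fmt.Println("required term, invalid label:", matches(invalid, req))
	fmt.Println("preferred score, invalid label:", preferredScore(invalid, []weightedTerm{{weight: 10, req: req}}))
}
```

Running it prints `true`, `false`, and `0`: the invalid label makes the required term fail, and the preferred term contributes nothing while the node remains a candidate.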
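+
+For hard constraints (`NoSchedule`/`NoExecute` taints and required node affinity), the behavior is **fail-safe**: an invalid node value makes matching fail, which keeps pods off potentially incompatible nodes. For soft constraints (`PreferNoSchedule` taints and preferred node affinity), the node only loses the preference and may still be selected.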
 
 #### Controller Hot-Loop When Feature Gate is Disabled
 
@@ -447,6 +487,7 @@ func compareSemVerValues(logger klog.Logger, tolerationVal, taintVal string, op
 	taintVersion, err := semver.ParseTolerant(taintVal)
 	if err != nil {
 		logger.Error(err, "failed to parse taint value as semantic version", "taint", taintVal)
+		return false
 	}
 
 	switch op {
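The added `return false` matters because, without it, execution continues into the `switch` with whatever value the parser returned on error. The stdlib-only Go sketch below contrasts the two control flows with hypothetical `compareBefore`/`compareAfter` functions. It logs with the standard `log` package instead of `klog`. It rests on three assumptions, none of which are shown in the hunk:

- The parser returns the zero version `0.0.0` together with its error, as the sketch's `parseCore` does.
- The toleration value is parsed first.
- `op` reads as "taint `op` toleration".

```go
package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

// parseCore is a hypothetical strict parser (optional "v", MAJOR.MINOR.PATCH).
// By assumption it returns the zero version 0.0.0 alongside any error.
func parseCore(s string) ([3]uint64, error) {
	var v [3]uint64
	parts := strings.Split(strings.TrimPrefix(strings.TrimSpace(s), "v"), ".")
	if len(parts) != 3 {
		return [3]uint64{}, fmt.Errorf("%q is not MAJOR.MINOR.PATCH", s)
	}
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return [3]uint64{}, fmt.Errorf("%q: invalid number %q", s, p)
		}
		v[i] = n
	}
	return v, nil
}

// cmpCore returns -1, 0 or 1 by comparing MAJOR, then MINOR, then PATCH.
func cmpCore(a, b [3]uint64) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// apply evaluates "taint op toleration"; this direction is an assumption.
func apply(op string, c int) bool {
	switch op {
	case "SemverLt":
		return c < 0
	case "SemverGt":
		return c > 0
	case "SemverEq":
		return c == 0
	}
	return false
}

// compareBefore follows the flow visible in the hunk before this change:
// the parse error is logged, then execution falls through to the comparison.
func compareBefore(tolerationVal, taintVal, op string) bool {
	tolerationVersion, err := parseCore(tolerationVal)
	if err != nil {
		return false
	}
	taintVersion, err := parseCore(taintVal)
	if err != nil {
		log.Printf("failed to parse taint value as semantic version: taint=%q: %v", taintVal, err)
	}
	return apply(op, cmpCore(taintVersion, tolerationVersion))
}

// compareAfter adds the early return introduced by this change.
func compareAfter(tolerationVal, taintVal, op string) bool {
	tolerationVersion, err := parseCore(tolerationVal)
	if err != nil {
		return false
	}
	taintVersion, err := parseCore(taintVal)
	if err != nil {
		log.Printf("failed to parse taint value as semantic version: taint=%q: %v", taintVal, err)
		return false // an unparseable taint value never matches
	}
	return apply(op, cmpCore(taintVersion, tolerationVersion))
}

func main() {
	// The malformed runtime-version value from the risk section.
	const taint = "containerd://2.1.4"
	fmt.Println("before:", compareBefore("2.0.0", taint, "SemverLt"))
	fmt.Println("after: ", compareAfter("2.0.0", taint, "SemverLt"))
}
```

With the earlier flow, `before` prints `true`: the malformed taint is compared as `0.0.0`, which is lower than `2.0.0`, so `SemverLt` matches. With the early return, `after` prints `false`.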
