Description
Summary
Preemption pipeline failures can cause victim task evictions to be applied even when preemption does not successfully complete, breaking the intended atomicity of scheduler Statement operations.
Impact
- Victim tasks may be evicted without the preemptor being scheduled
- Gang scheduling guarantees (e.g. MinAvailable) can be violated
- Failed preemption attempts can cause unintended workload disruption
Background
In the current preemption flow:
- Evictions are recorded on the scheduler Statement
- The preemptor is then pipelined
- If the pipeline fails, the preemptor state is rolled back, but recorded eviction operations may still be committed
This allows eviction side effects to escape failed preemption attempts.
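The flow above can be illustrated with a toy model. This is only a sketch: the `statement` type, its methods, and `buggyPreempt` are hypothetical stand-ins written for this issue, not the actual Volcano Statement API.

```go
package main

import (
	"errors"
	"fmt"
)

// op is one recorded scheduler operation (hypothetical, simplified).
type op struct {
	kind string // "evict" or "pipeline"
	task string
}

// statement is a toy stand-in for the scheduler Statement: it records
// operations and applies them on commit.
type statement struct {
	ops []op
}

func (s *statement) evict(task string) { s.ops = append(s.ops, op{"evict", task}) }

// pipeline records the preemptor placement, or fails without touching
// operations that were recorded earlier.
func (s *statement) pipeline(task string, fail bool) error {
	if fail {
		return errors.New("pipeline failed")
	}
	s.ops = append(s.ops, op{"pipeline", task})
	return nil
}

// commit returns the tasks whose eviction is actually applied.
func (s *statement) commit() []string {
	var evicted []string
	for _, o := range s.ops {
		if o.kind == "evict" {
			evicted = append(evicted, o.task)
		}
	}
	return evicted
}

// buggyPreempt mirrors the flow described above: evictions are recorded,
// the pipeline may fail, and the statement is committed either way.
func buggyPreempt(victims []string, preemptor string, failPipeline bool) []string {
	s := &statement{}
	for _, v := range victims {
		s.evict(v)
	}
	if err := s.pipeline(preemptor, failPipeline); err != nil {
		// Only the preemptor is affected; the evict ops stay recorded.
		fmt.Println("preemptor not pipelined:", err)
	}
	return s.commit()
}

func main() {
	// Both victims are evicted although the pipeline failed.
	fmt.Println(buggyPreempt([]string{"victim-1", "victim-2"}, "preemptor", true))
}
```

In this model the committed evictions are identical whether the pipeline succeeds or fails, which is the escape described above.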
Affected Code
- pkg/scheduler/actions/preempt/preempt.go - Preemption pipeline execution and failure handling
- pkg/scheduler/framework/statement.go - Operation recording and commit semantics
- pkg/scheduler/framework/statement_test.go - Boundary behavior around rollback / discard scenarios
Design Question
What is the correct architectural approach to ensure eviction side effects are only committed when preemption succeeds?
Option A: Full Statement Save / Discard (Preferred Pattern)
- Save the statement before eviction
- Discard or recover it on pipeline failure
- Only merge the statement when preemption completes successfully
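A rough sketch of what Option A could look like, using a scratch statement per preemption attempt. The `statement` type and the `discard`, `merge`, and `preemptAtomic` names are assumptions made for illustration; they are not taken from the Volcano codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// statement is a hypothetical, simplified stand-in for the scheduler
// Statement; it only tracks task names.
type statement struct {
	evicted   []string
	pipelined []string
}

func (s *statement) evict(task string) { s.evicted = append(s.evicted, task) }

func (s *statement) pipeline(task string, fail bool) error {
	if fail {
		return errors.New("pipeline failed")
	}
	s.pipelined = append(s.pipelined, task)
	return nil
}

// discard drops every operation recorded on the statement.
func (s *statement) discard() { s.evicted, s.pipelined = nil, nil }

// merge appends the operations of a successful attempt to s.
func (s *statement) merge(attempt *statement) {
	s.evicted = append(s.evicted, attempt.evicted...)
	s.pipelined = append(s.pipelined, attempt.pipelined...)
}

// preemptAtomic records evictions and the pipeline on a scratch statement
// and merges it into parent only if the whole attempt succeeded.
func preemptAtomic(parent *statement, victims []string, preemptor string, fail bool) error {
	attempt := &statement{}
	for _, v := range victims {
		attempt.evict(v)
	}
	if err := attempt.pipeline(preemptor, fail); err != nil {
		attempt.discard() // nothing from the failed attempt reaches parent
		return err
	}
	parent.merge(attempt)
	return nil
}

func main() {
	parent := &statement{}
	err := preemptAtomic(parent, []string{"victim-1"}, "preemptor", true)
	fmt.Println(err, len(parent.evicted))
}
```

The point of this shape is that the parent statement never sees a partial attempt, so no undo logic is needed on the statement itself.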
Option B: Operation-Level Rollback
- Track and explicitly roll back eviction operations on failure
This fixes the issue locally but may introduce undo semantics on Statement that were not originally intended.
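For comparison, Option B might look roughly like the following. Again this is a toy model under stated assumptions: `unevict` and `preemptWithRollback` are hypothetical names, and the real Statement may not expose anything equivalent.

```go
package main

import (
	"errors"
	"fmt"
)

// op is one recorded scheduler operation (hypothetical, simplified).
type op struct {
	kind string // "evict" or "pipeline"
	task string
}

type statement struct{ ops []op }

func (s *statement) evict(task string) { s.ops = append(s.ops, op{"evict", task}) }

// unevict is the explicit undo for a recorded eviction. In this sketch it
// removes every recorded eviction of task; a real undo would have to target
// the exact operation and also restore session state.
func (s *statement) unevict(task string) {
	kept := s.ops[:0]
	for _, o := range s.ops {
		if o.kind == "evict" && o.task == task {
			continue
		}
		kept = append(kept, o)
	}
	s.ops = kept
}

func (s *statement) pipeline(task string, fail bool) error {
	if fail {
		return errors.New("pipeline failed")
	}
	s.ops = append(s.ops, op{"pipeline", task})
	return nil
}

// preemptWithRollback tracks the evictions it recorded and undoes them,
// newest first, when the pipeline fails.
func preemptWithRollback(s *statement, victims []string, preemptor string, fail bool) error {
	var recorded []string
	for _, v := range victims {
		s.evict(v)
		recorded = append(recorded, v)
	}
	if err := s.pipeline(preemptor, fail); err != nil {
		for i := len(recorded) - 1; i >= 0; i-- {
			s.unevict(recorded[i])
		}
		return err
	}
	return nil
}

func main() {
	s := &statement{}
	err := preemptWithRollback(s, []string{"victim-1", "victim-2"}, "preemptor", true)
	fmt.Println(err, len(s.ops))
}
```

Note that the caller has to track what it recorded and the statement gains an undo method, which is the design concern raised for this option.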
Goal
Ensure failed preemption attempts never apply eviction side effects, while staying consistent with existing Statement design patterns.
Context
Identified while implementing a fix for eviction rollback on preemption pipeline failure. The initial implementation raised concerns about exposing rollback behavior on Statement, prompting this issue to align on the intended design before proceeding.
Steps to reproduce the issue
- Configure a workload where preemption is required to schedule a pending task (e.g. a gang-scheduled job that cannot be placed without evicting victim tasks).
- Trigger the preemption workflow so that victim task evictions are recorded on the scheduler Statement.
- Force the preemption pipeline to fail after eviction operations are recorded but before the preemptor is successfully scheduled (e.g. pipeline error, scheduling failure, or preemptor pod creation failure).
- Allow the scheduler Statement to be committed.
- Observe that victim tasks are evicted even though the preemptor was never scheduled.
Describe the results you received and expected
Expected Behavior
- If the preemption pipeline fails, no eviction side effects should be committed
- Victim tasks should remain running
- Scheduler state should remain unchanged after a failed preemption attempt
Actual Behavior
- Eviction operations recorded prior to pipeline failure are committed
- Victim tasks are evicted despite preemption not succeeding
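The difference between expected and actual behavior can be stated as a single invariant that a unit test could check. The `outcome` type and `atomicityHolds` below are hypothetical helpers for illustration, not existing test utilities.

```go
package main

import "fmt"

// outcome is a hypothetical summary of what a committed Statement applied.
type outcome struct {
	preemptorPipelined bool
	evicted            []string
}

// atomicityHolds reports whether evictions were applied only together with
// a successfully pipelined preemptor.
func atomicityHolds(o outcome) bool {
	return o.preemptorPipelined || len(o.evicted) == 0
}

func main() {
	expected := outcome{preemptorPipelined: false}
	actual := outcome{preemptorPipelined: false, evicted: []string{"victim-1"}}
	fmt.Println(atomicityHolds(expected), atomicityHolds(actual))
}
```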
What version of Volcano are you using?
main branch @ current HEAD (pre-merge)
Any other relevant information
This issue is independent of Kubernetes version, OS, or kernel configuration.
The behavior is triggered by the scheduler preemption control flow and statement commit semantics, and can be reproduced in unit tests without a running cluster.
No additional logs or manifests are attached, as the issue is reproducible via scheduler logic and unit tests.