Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: fix a bug where force GC wasn't respected #24456

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .changelog/24456.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
```release-note:bug
scheduler: Fix bug where forced garbage collection does not ignore GC thresholds
```
108 changes: 69 additions & 39 deletions nomad/core_sched.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,29 +28,42 @@ type CoreScheduler struct {
snap *state.StateSnapshot
logger log.Logger

// custom GC Threshold values can be used by unit tests to simulate time
// manipulation
customJobGCThreshold time.Duration
customEvalGCThreshold time.Duration
customBatchEvalGCThreshold time.Duration
customNodeGCThreshold time.Duration
customDeploymentGCThreshold time.Duration
customCSIVolumeClaimGCThreshold time.Duration
customCSIPluginGCThreshold time.Duration
customACLTokenExpirationGCThreshold time.Duration
customRootKeyGCThreshold time.Duration
// custom GC Threshold is a map of object names to threshold values which can be
// used by unit tests to simulate time manipulation, and is used by forceGC to
// set minimal threshold so that virtually all eligible objects will get GCd
customGCThreshold map[string]time.Duration
}

// NewCoreScheduler is used to return a new system scheduler instance
func NewCoreScheduler(srv *Server, snap *state.StateSnapshot) scheduler.Scheduler {
s := &CoreScheduler{
srv: srv,
snap: snap,
logger: srv.logger.ResetNamed("core.sched"),
srv: srv,
snap: snap,
logger: srv.logger.ResetNamed("core.sched"),
customGCThreshold: make(map[string]time.Duration),
}
return s
}

func (c *CoreScheduler) setCustomThresholdForObject(objectName string, threshold time.Duration) {
c.customGCThreshold[objectName] = threshold
}

func (c *CoreScheduler) setCustomThresholdForAllObjects(threshold time.Duration) {
for _, objectName := range []string{
"job",
"eval",
"batchEval",
"deployment",
"csiPlugin",
"csiVolume",
"token",
"node",
} {
c.setCustomThresholdForObject(objectName, threshold)
}
}
Comment on lines +52 to +65
Copy link
Member

@schmichael schmichael Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern with this kind of approach is that we're relying on matching strings to tables/objects to funcs. If we add a new table, and remember to add a new GC func, we also have to remember adding this string. Since we don't add new gc-able objects often, it seems even more likely we'd forget the force gc path.

What if instead we passed the threshold into the funcs directly from Process? Process already does string parsing of the Eval (which is a different string than this map keys off of!), so I don't think it would make Process too much more complicated to pass the appropriate threshold from config into funcs. It might make testing easier to? You can pass thresholds in directly to gc funcs or you can customize the config. I would hope that would cover every case.

Copy link
Contributor Author

@pkazmierczak pkazmierczak Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't add new gc-able objects often, it seems even more likely we'd forget the force gc path.

That is, sadly, very likely.

What if instead we passed the threshold into the funcs directly from Process?

I looked at that but it's a major change. Not only would we have to change every call to Process (of which there are many), but we'd also need to change the scheduler interface. Not saying we shouldn't do this, just saying it's a big change.

Copy link
Member

@schmichael schmichael Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only would we have to change every call to Process

Do we have to change the Process call, or just change the code inside CoreScheduler.Process?

For example could the call to c.jobGC(eval) be changed to c.jobGC(eval, c.srv.config.JobGCThreshold) to avoid the config lookup inside jobGC itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a solution, but it would break unit tests that rely on manipulating threshold values. These tests don't call {object}GC(eval), they call higher-level methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The passed in threshold could still be overridden by custom values, so I don't think it would break tests. Although it does feel weird to pass in a struct field thats already available to the receiver.
Which makes me wonder if (I think your original idea) of just passing in a forceGC boolean might be the best way to go here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Passing a force boolean to every GC method will still be very messy. Because I can't think of a better way of forcing than setting a very very small threshold value. So then for every GC method we'd have to check 3 places where a threshold might've been set: config, custom field in the scheduler, or force bool induced minimal value. That, to me, is even less elegant.


// Process is used to implement the scheduler.Scheduler interface
func (c *CoreScheduler) Process(eval *structs.Evaluation) error {
job := strings.Split(eval.JobID, ":") // extra data can be smuggled in w/ JobID
Expand Down Expand Up @@ -86,6 +99,11 @@ func (c *CoreScheduler) Process(eval *structs.Evaluation) error {

// forceGC is used to garbage collect all eligible objects.
func (c *CoreScheduler) forceGC(eval *structs.Evaluation) error {
// set a minimal threshold for all objects to make force GC possible, and
// remember to reset it when we're done
c.setCustomThresholdForAllObjects(time.Millisecond)
defer c.setCustomThresholdForAllObjects(0)

if err := c.jobGC(eval); err != nil {
return err
}
Expand Down Expand Up @@ -113,6 +131,7 @@ func (c *CoreScheduler) forceGC(eval *structs.Evaluation) error {
if err := c.rootKeyGC(eval, time.Now()); err != nil {
return err
}

// Node GC must occur after the others to ensure the allocations are
// cleared.
return c.nodeGC(eval)
Expand All @@ -131,8 +150,10 @@ func (c *CoreScheduler) jobGC(eval *structs.Evaluation) error {
threshold = c.srv.config.JobGCThreshold

// custom threshold override
if c.customJobGCThreshold != 0 {
threshold = c.customJobGCThreshold
if val, ok := c.customGCThreshold["job"]; ok {
if val != 0 {
threshold = val
}
}

cutoffTime := c.getCutoffTime(threshold)
Expand Down Expand Up @@ -276,11 +297,15 @@ func (c *CoreScheduler) evalGC() error {
batchThreshold = c.srv.config.BatchEvalGCThreshold

// custom threshold override
if c.customEvalGCThreshold != 0 {
threshold = c.customEvalGCThreshold
if val, ok := c.customGCThreshold["eval"]; ok {
if val != 0 {
threshold = val
}
}
if c.customBatchEvalGCThreshold != 0 {
batchThreshold = c.customBatchEvalGCThreshold
if val, ok := c.customGCThreshold["batchEval"]; ok {
if val != 0 {
batchThreshold = val
}
}

cutoffTime := c.getCutoffTime(threshold)
Expand Down Expand Up @@ -376,8 +401,7 @@ func (c *CoreScheduler) gcEval(eval *structs.Evaluation, cutoffTime time.Time, a
var gcAllocIDs []string
for _, alloc := range allocs {
if !allocGCEligible(alloc, job, time.Now().UTC(), cutoffTime) {
// Can't GC the evaluation since not all of the allocations are
// terminal
// Can't GC the evaluation since not all the allocations are terminal
gcEval = false
} else {
// The allocation is eligible to be GC'd
Expand Down Expand Up @@ -474,8 +498,10 @@ func (c *CoreScheduler) nodeGC(eval *structs.Evaluation) error {
threshold = c.srv.config.NodeGCThreshold

// custom threshold override
if c.customNodeGCThreshold != 0 {
threshold = c.customNodeGCThreshold
if val, ok := c.customGCThreshold["node"]; ok {
if val != 0 {
threshold = val
}
}
cutoffTime := c.getCutoffTime(threshold)

Expand Down Expand Up @@ -578,8 +604,10 @@ func (c *CoreScheduler) deploymentGC() error {
threshold = c.srv.config.DeploymentGCThreshold

// custom threshold override
if c.customDeploymentGCThreshold != 0 {
threshold = c.customDeploymentGCThreshold
if val, ok := c.customGCThreshold["deployment"]; ok {
if val != 0 {
threshold = val
}
}
cutoffTime := c.getCutoffTime(threshold)

Expand Down Expand Up @@ -778,8 +806,10 @@ func (c *CoreScheduler) csiVolumeClaimGC(eval *structs.Evaluation) error {
threshold = c.srv.config.CSIVolumeClaimGCThreshold

// custom threshold override
if c.customCSIVolumeClaimGCThreshold != 0 {
threshold = c.customCSIVolumeClaimGCThreshold
if val, ok := c.customGCThreshold["csiVolume"]; ok {
if val != 0 {
threshold = val
}
}
cutoffTime := c.getCutoffTime(threshold)

Expand Down Expand Up @@ -825,8 +855,10 @@ func (c *CoreScheduler) csiPluginGC(eval *structs.Evaluation) error {
threshold = c.srv.config.CSIPluginGCThreshold

// custom threshold override
if c.customCSIPluginGCThreshold != 0 {
threshold = c.customCSIPluginGCThreshold
if val, ok := c.customGCThreshold["csiPlugin"]; ok {
if val != 0 {
threshold = val
}
}
cutoffTime := c.getCutoffTime(threshold)

Expand Down Expand Up @@ -893,8 +925,10 @@ func (c *CoreScheduler) expiredACLTokenGC(eval *structs.Evaluation, global bool)
threshold = c.srv.config.ACLTokenExpirationGCThreshold

// custom threshold override
if c.customACLTokenExpirationGCThreshold != 0 {
threshold = c.customACLTokenExpirationGCThreshold
if val, ok := c.customGCThreshold["token"]; ok {
if val != 0 {
threshold = val
}
}
cutoffTime := c.getCutoffTime(threshold)

Expand Down Expand Up @@ -1003,13 +1037,9 @@ func (c *CoreScheduler) rootKeyGC(eval *structs.Evaluation, now time.Time) error
return err
}

var threshold time.Duration
threshold = c.srv.config.RootKeyGCThreshold

// custom threshold override
if c.customRootKeyGCThreshold != 0 {
threshold = c.customRootKeyGCThreshold
}
// we don't do custom overrides for root keys because they are never subject to
// force GC
threshold := c.srv.config.RootKeyGCThreshold

// the threshold is longer than we can support with the time table, and we
// never want to force-GC keys because that will orphan signed Workload
Expand Down
4 changes: 1 addition & 3 deletions nomad/core_sched_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -527,9 +527,7 @@ func TestCoreScheduler_EvalGC_Batch(t *testing.T) {

// set a shorter GC threshold this time
gc = s1.coreJobEval(structs.CoreJobEvalGC, jobModifyIdx*2)
core.(*CoreScheduler).customBatchEvalGCThreshold = time.Minute
//core.(*CoreScheduler).customEvalGCThreshold = time.Minute
//core.(*CoreScheduler).customJobGCThreshold = time.Minute
core.(*CoreScheduler).setCustomThresholdForObject("batchEval", time.Minute)
must.NoError(t, core.Process(gc))

// We expect the following:
Expand Down
8 changes: 4 additions & 4 deletions nomad/system_endpoint_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,15 +34,15 @@ func TestSystemEndpoint_GarbageCollect(t *testing.T) {
job := mock.Job()
job.Type = structs.JobTypeBatch
job.Stop = true
// submit time must be older than default job GC
job.SubmitTime = time.Now().Add(-6 * time.Hour).UnixNano()
// set submit time older than now but still newer than default GC threshold
job.SubmitTime = time.Now().Add(-10 * time.Millisecond).UnixNano()
must.NoError(t, state.UpsertJob(structs.MsgTypeTestSetup, 1000, nil, job))

eval := mock.Eval()
eval.Status = structs.EvalStatusComplete
eval.JobID = job.ID
// modify time must be older than default eval GC
eval.ModifyTime = time.Now().Add(-5 * time.Hour).UnixNano()
// set modify time older than now but still newer than default GC threshold
eval.ModifyTime = time.Now().Add(-10 * time.Millisecond).UnixNano()
must.NoError(t, state.UpsertEvals(structs.MsgTypeTestSetup, 1001, []*structs.Evaluation{eval}))

// Make the GC request
Expand Down