scheduler: fix a bug where force GC wasn't respected #24456

Open
wants to merge 5 commits into
base: main

Conversation

pkazmierczak
Contributor

@pkazmierczak pkazmierczak commented Nov 13, 2024

This PR fixes a bug where the System.GarbageCollect endpoint didn't collect objects that weren't older than their respective GC thresholds. System.GarbageCollect is used to force garbage collection (it is also used by the system gc command) and should ignore any GC threshold settings.

Fixes #24455
Internal ref: https://hashicorp.atlassian.net/browse/NET-11747
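
To illustrate the intended behavior, here is a minimal sketch of how a forced pass differs from a regular one. The helpers `isCollectable` and `gcThreshold` are hypothetical names made up for this example and are not Nomad's actual code; the sketch also assumes eligibility is a plain age-versus-threshold comparison.

```go
package main

import "time"

// isCollectable reports whether an object of the given age has reached the
// GC threshold. Hypothetical helper for illustration, not Nomad's code.
func isCollectable(age, threshold time.Duration) bool {
	return age >= threshold
}

// gcThreshold returns the threshold a GC pass should apply. A forced pass
// ignores the configured value and uses zero, so every object qualifies.
func gcThreshold(configured time.Duration, force bool) time.Duration {
	if force {
		return 0
	}
	return configured
}
```

Under these assumptions, with a configured threshold of 4h, a 10-minute-old object is skipped by a regular pass (`isCollectable(10*time.Minute, gcThreshold(4*time.Hour, false))` is false) but picked up by a forced one.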

@pkazmierczak pkazmierczak self-assigned this Nov 13, 2024
@pkazmierczak pkazmierczak added the backport/1.9.x backport to 1.9.x release line label Nov 13, 2024
@pkazmierczak pkazmierczak added this to the 1.9.x milestone Nov 13, 2024
Comment on lines +52 to +65
func (c *CoreScheduler) setCustomThresholdForAllObjects(threshold time.Duration) {
	for _, objectName := range []string{
		"job",
		"eval",
		"batchEval",
		"deployment",
		"csiPlugin",
		"csiVolume",
		"token",
		"node",
	} {
		c.setCustomThresholdForObject(objectName, threshold)
	}
}
Member

@schmichael schmichael Nov 13, 2024

My main concern with this kind of approach is that we're relying on matching strings to tables/objects to funcs. If we add a new table, and remember to add a new GC func, we also have to remember to add this string. Since we don't add new gc-able objects often, it seems even more likely we'd forget the force gc path.

What if instead we passed the threshold into the funcs directly from Process? Process already does string parsing of the Eval (which is a different string than the one this map keys off of!), so I don't think it would make Process too much more complicated to pass the appropriate threshold from config into the funcs. It might make testing easier too? You can pass thresholds directly to the gc funcs or you can customize the config. I would hope that would cover every case.
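
A rough sketch of that suggestion, for illustration only. Everything here is a toy stand-in: `coreScheduler`, `gcConfig`, `gcCall`, and `process` are hypothetical names, the GC funcs only record the threshold they were handed, and the job ID strings are assumed values rather than a claim about the real constants.

```go
package main

import (
	"fmt"
	"time"
)

// gcConfig holds per-object GC thresholds. Field names are illustrative.
type gcConfig struct {
	JobGCThreshold  time.Duration
	NodeGCThreshold time.Duration
}

// gcCall records one GC invocation so the dispatch can be inspected.
type gcCall struct {
	object    string
	threshold time.Duration
}

// coreScheduler is a toy stand-in for the real scheduler.
type coreScheduler struct {
	config gcConfig
	calls  []gcCall
}

// jobGC receives its threshold as an argument instead of looking it up.
func (c *coreScheduler) jobGC(threshold time.Duration) error {
	c.calls = append(c.calls, gcCall{"job", threshold})
	return nil
}

// nodeGC receives its threshold as an argument instead of looking it up.
func (c *coreScheduler) nodeGC(threshold time.Duration) error {
	c.calls = append(c.calls, gcCall{"node", threshold})
	return nil
}

// process picks the GC func from the job ID and hands it the matching
// threshold from config. The forced path passes zero to every GC func.
func (c *coreScheduler) process(jobID string) error {
	switch jobID {
	case "job-gc":
		return c.jobGC(c.config.JobGCThreshold)
	case "node-gc":
		return c.nodeGC(c.config.NodeGCThreshold)
	case "force-gc":
		if err := c.jobGC(0); err != nil {
			return err
		}
		return c.nodeGC(0)
	default:
		return fmt.Errorf("unknown core job %q", jobID)
	}
}
```

In this shape the object-to-threshold pairing lives in a single switch, and a test can call a GC func with any threshold it likes without touching config.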

Contributor Author

@pkazmierczak pkazmierczak Nov 13, 2024

> Since we don't add new gc-able objects often, it seems even more likely we'd forget the force gc path.

That is, sadly, very likely.

> What if instead we passed the threshold into the funcs directly from Process?

I looked at that but it's a major change. Not only would we have to change every call to Process (of which there are many), but we'd also need to change the scheduler interface. Not saying we shouldn't do this, just saying it's a big change.

Member

@schmichael schmichael Nov 14, 2024

> Not only would we have to change every call to Process

Do we have to change the Process call, or just change the code inside CoreScheduler.Process?

For example, could the call to c.jobGC(eval) be changed to c.jobGC(eval, c.srv.config.JobGCThreshold) to avoid the config lookup inside jobGC itself?

Contributor Author

That would be a solution, but it would break unit tests that rely on manipulating threshold values. These tests don't call {object}GC(eval), they call higher-level methods.

Member

The passed-in threshold could still be overridden by custom values, so I don't think it would break tests. Although it does feel weird to pass in a struct field that's already available to the receiver.
Which makes me wonder if just passing in a forceGC boolean (I think that was your original idea) might be the best way to go here.

Contributor Author

Right. Passing a force boolean to every GC method would still be very messy, because I can't think of a better way of forcing than setting a very, very small threshold value. So for every GC method we'd have to check 3 places where a threshold might've been set: config, the custom field in the scheduler, or the minimal value induced by the force bool. That, to me, is even less elegant.
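
To make the three-source lookup concrete, here is a minimal sketch of what each GC method would have to resolve. The type `thresholdSources`, the constant `minimalThreshold`, and the precedence order (force, then custom, then config) are all assumptions made for this example, not code from the PR.

```go
package main

import "time"

// minimalThreshold is the "very, very small" value a forced GC would use.
// The exact value is an assumption for this sketch.
const minimalThreshold = time.Nanosecond

// thresholdSources gathers the three places a threshold might come from.
// All names are hypothetical.
type thresholdSources struct {
	configured time.Duration  // value from agent config
	custom     *time.Duration // scheduler override, nil when unset
	force      bool           // true when GC was forced
}

// effective resolves the threshold with an assumed precedence:
// force wins over custom, and custom wins over config.
func (s thresholdSources) effective() time.Duration {
	switch {
	case s.force:
		return minimalThreshold
	case s.custom != nil:
		return *s.custom
	default:
		return s.configured
	}
}
```

Every GC method would need to go through a lookup like this, which is the extra bookkeeping the comment above objects to.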

Labels
backport/1.9.x backport to 1.9.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nomad system gc not working as expected
3 participants