-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throttler: fix race conditions in Operate() termination and in tests #14971
Throttler: fix race conditions in Operate() termination and in tests #14971
Conversation
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
@@ -382,3 +383,51 @@ func TestProbesWhileOperating(t *testing.T) { | |||
}) | |||
}) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following is mostly copy+paste from the now deleted throttler_exclude_race_test.go
file, and with a few minor changes. Notably atomic.LoadInt64(...)
and some cosmetics.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #14971 +/- ##
==========================================
+ Coverage 47.26% 47.28% +0.02%
==========================================
Files 1138 1138
Lines 238842 238846 +4
==========================================
+ Hits 112880 112937 +57
+ Misses 117368 117318 -50
+ Partials 8594 8591 -3 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I only had a few minor questions/comments. Thank you for working on this! ❤️
throttler.aggregatedMetrics.Flush() | ||
throttler.recentApps.Flush() | ||
throttler.nonLowPriorityAppRequestsThrottled.Flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are done in the reverse order now because they are in defers. I'm assuming that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's fine. Ordering is irrelevant here.
defer wg.Done() | ||
defer throttler.aggregatedMetrics.Flush() | ||
defer throttler.recentApps.Flush() | ||
defer throttler.nonLowPriorityAppRequestsThrottled.Flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use a single deferred function, at least for the flushes? It's fine as-is, just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason at all. I can consolidate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidated.
// While the throttler is disabled, it is technically safe to iterate those structures. However, `go test -race` disagrees, | ||
// which is why this test is in this *exclude_race* file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is now off, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Description
This PR fixes the following race conditions in the tablet throttler:
Disable()
, caches were flushed (cleared) before context as cancelled, and soOperate()
's goroutine could still re-populate them before actually terminating.Operate()
's goroutine to terminate. At the time we excluded those tests via//go:build !race
, but:We introduce a
WaitCondition
with which the unit tests canWait()
for the complete termination ofOperate()
's goroutine. We ensure to flush the caches at that very goroutine. We fix and improve the unit test, and we now include it in race tests.Related Issue(s)
#14967
Checklist
Deployment Notes