ActionQueue retry on failure #1944
ee/control/control_test.go
Outdated
// Verify error consumer was called
assert.Equal(t, 1, errConsumer.updates)

// Verify hash was still recorded despite error
Hm, maybe I'm missing something -- is this the desired behavior? I thought we didn't want to store/record the hash, so that we don't have to wait for a change in configuration (i.e. a new hash) to perform the retry. (I think that'd look like changing the behavior around here: https://github.com/kolide/launcher/blob/main/ee/control/control.go#L338-L346.)
I wasn't fully understanding that part, but I think I get it now: not saving the hash is what retriggers the fetch and update. Let me adjust that.
Co-authored-by: James Pickett <James-Pickett@users.noreply.github.com>
🔥
Resolves #1708
We implement a retry pattern by returning an error from func (aq *ActionQueue) Update(data io.Reader) error if any action fails to process. When Update returns an error, the updated data for that subsystem is not stored, so the control system calls Update again in 1 minute, effectively handling the retry for us.
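For reviewers who want to see the control-side behavior in isolation, here is a minimal sketch of the idea, not the actual launcher code. The names controlSketch, flakyConsumer, and fetchAndUpdate are hypothetical; only the Update(data io.Reader) error signature comes from this PR. It assumes the control service skips a subsystem whose hash matches the last stored one, and stores the hash only after a successful Update.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// consumer mirrors the Update signature quoted in the PR description.
type consumer interface {
	Update(data io.Reader) error
}

// flakyConsumer is a hypothetical consumer that fails a fixed number of times.
type flakyConsumer struct {
	failuresLeft int
	updates      int
}

func (f *flakyConsumer) Update(data io.Reader) error {
	f.updates++
	if f.failuresLeft > 0 {
		f.failuresLeft--
		return errors.New("action could not be processed")
	}
	return nil
}

// controlSketch is a hypothetical stand-in for the control service. It
// remembers the last hash that was applied successfully per subsystem.
type controlSketch struct {
	lastFetched map[string]string
	consumers   map[string]consumer
}

// fetchAndUpdate simulates one polling tick for a single subsystem.
func (c *controlSketch) fetchAndUpdate(subsystem, hash, payload string) error {
	if c.lastFetched[subsystem] == hash {
		return nil // hash unchanged, nothing to do
	}
	if err := c.consumers[subsystem].Update(strings.NewReader(payload)); err != nil {
		// Assumption: the hash is NOT stored on failure, so the next tick
		// sees it as new and calls Update again.
		return fmt.Errorf("updating %s: %w", subsystem, err)
	}
	c.lastFetched[subsystem] = hash
	return nil
}

func main() {
	fc := &flakyConsumer{failuresLeft: 1}
	c := &controlSketch{
		lastFetched: map[string]string{},
		consumers:   map[string]consumer{"actions": fc},
	}
	for tick := 1; tick <= 3; tick++ {
		err := c.fetchAndUpdate("actions", "hash-1", "[]")
		fmt.Printf("tick %d: err=%v updates=%d stored=%q\n",
			tick, err, fc.updates, c.lastFetched["actions"])
	}
}
```

In this sketch the first tick fails and leaves the hash unset, the second tick retries and stores it, and the third tick is skipped because the hash is unchanged.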
This pull request introduces error handling improvements and additional test cases for the ActionQueue and ControlService components. The main changes include adding error handling for action processing failures, updating test cases to handle these errors, and adding new test cases for retry logic in the ControlService.

Error handling improvements:
ee/control/actionqueue/actionqueue.go: Introduced processError to capture errors during action processing and return it at the end of the Update method. [1] [2] [3]

Updates to existing tests:
ee/control/actionqueue/actionqueue_test.go: Modified the TestActionQueue_HandlesDuplicatesWhenFirstActionCouldNotBeSent test to check for errors on the first attempt and ensure success on the second attempt. [1] [2]

New test cases:
ee/control/control_test.go: Added TestControlServiceUpdateErr to verify that the ControlService handles update errors correctly and records the hash despite the error.
ee/control/control_test.go: Added TestControlServiceRetryAfterUpdateErr to test the retry logic after an update error, ensuring that subsequent updates succeed and the final hash is recorded.

Mock updates:
ee/control/control_test.go: Modified the mockConsumer to support returning an error in the Update method. [1] [2]