Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ActionQueue retry on failure #1944

Merged
merged 6 commits into from
Nov 8, 2024
Merged

ActionQueue retry on failure #1944

merged 6 commits into from
Nov 8, 2024

Conversation

cesarfda
Copy link
Contributor

@cesarfda cesarfda commented Nov 7, 2024

Resolves #1708

We are implementing a retry pattern by returning an error from func (aq *ActionQueue) Update(data io.Reader) error if any action fails to process. Then the updated data for that subsystem shouldn't be stored, so Update would be called again in 1 minute by the control system, effectively handling the retry for us.

This pull request introduces error handling improvements and additional test cases for the ActionQueue and ControlService components. The main changes include adding error handling for action processing failures, updating test cases to handle these errors, and adding new test cases for retry logic in the ControlService.

Error handling improvements:

Updates to existing tests:

New test cases:

  • ee/control/control_test.go: Added TestControlServiceUpdateErr to verify that the ControlService handles update errors correctly and records the hash despite the error.
  • ee/control/control_test.go: Added TestControlServiceRetryAfterUpdateErr to test the retry logic after an update error, ensuring that subsequent updates succeed and the final hash is recorded.

Mock updates:

// Verify error consumer was called
assert.Equal(t, 1, errConsumer.updates)

// Verify hash was still recorded despite error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe I'm missing something -- is this the desired behavior? I thought we didn't want to store/record the hash, so that we don't have to wait for a change in configuration (i.e. a new hash) to perform the retry. (I think that'd look like changing the behavior around here: https://github.com/kolide/launcher/blob/main/ee/control/control.go#L338-L346.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't fully understanding that part but I think I get it now, the hash not being saved would be what will retrigger the fetch and update. Let me adjust that

Copy link
Contributor

@RebeccaMahany RebeccaMahany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@cesarfda cesarfda added this pull request to the merge queue Nov 8, 2024
Merged via the queue into kolide:main with commit 8f090c9 Nov 8, 2024
29 checks passed
@cesarfda cesarfda deleted the actionqueue branch November 8, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

actionqueue should perform retries for failed actions without being prompted by control data changes
3 participants