Retry launching osquery instance on failure #1937

Closed
Tracked by #1827
RebeccaMahany opened this issue Nov 5, 2024 · 0 comments · Fixed by #1952

RebeccaMahany (Contributor) commented Nov 5, 2024

When the osquery runner cannot launch an osquery instance, we currently return an error, which will shut down launcher entirely.

Looking over the logs and past issues we've investigated, I see two primary errors: 1) a timeout waiting for osqueryd to create the socket, indicating that the osquery process did not start up; and 2) a failure to create an extension client, with either the socket file missing or the connection refused.

In both of these cases, restarting launcher is overkill and can even work against solving the issue. In some cases, these errors occur when the current osquery version is old and incompatible with the current database; restarting launcher here is actively harmful because it resets the autoupdate delay, which prevents a newer osquery version from being downloaded.

So! We want to change the runner behavior to keep retrying osquery instance launches rather than exiting from the runner.

  1. If osquery instance launch fails, retry launching the instance, potentially with backoff
  2. If osquery instance launch fails, also consider triggering an autoupdate check for osquery
  3. Runner should still be responsive to calls to Shutdown
RebeccaMahany changed the title ~~When the osquery runner cannot launch an osquery instance, do not return an error. Instead, retry launching osquery instances (maybe with a bit of backoff?). We should tackle this issue first and earlier, to handle the edge case noted in 1 above. We could also consider terminating the autoupdate delay in this case, to download a new osquery version quicker. Could also potentially tackle the item below ("Make the osquery instance status available in knapsack") at the same time, if easier.~~ Retry launching osquery issues on failure Nov 5, 2024
RebeccaMahany changed the title ~~Retry launching osquery issues on failure~~ Retry launching osquery instance on failure Nov 5, 2024
RebeccaMahany self-assigned this Nov 7, 2024