runtime.update can block all checkins and state processing while waiting for shutdown #3738

Closed
leehinman opened this issue Nov 10, 2023 · 1 comment · Fixed by #3747
Labels
bug Something isn't working Team:Elastic-Agent Label for the Agent team

Comments

@leehinman
Contributor

leehinman commented Nov 10, 2023

runtime.update can block because it waits for waitForStopped to finish, and waitForStopped can take up to 15 seconds. Because update is called by the Coordinator, the entire Coordinator can be blocked for that time.
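To make the stall concrete, here is a minimal sketch in Go. It is not the elastic-agent code: `waitForStopped`, `updateBlocking`, `updateAsync`, and `stopTimeout` are hypothetical stand-ins written for illustration, and the timeout is shortened from 15 seconds so the example runs quickly. It contrasts an update that waits inline with one that moves the wait to a goroutine, which is one possible way to keep the caller responsive and not necessarily what the fix in #3747 does.

```go
package main

import (
	"fmt"
	"time"
)

// stopTimeout stands in for the up-to-15-second wait described in the issue,
// shortened so the sketch finishes quickly. (Hypothetical value.)
const stopTimeout = 150 * time.Millisecond

// waitForStopped is a hypothetical stand-in: it returns true if stopped is
// closed before the timeout expires, and false otherwise.
func waitForStopped(stopped <-chan struct{}, timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-stopped:
		return true
	case <-timer.C:
		return false
	}
}

// updateBlocking waits inline, so its caller is stalled for the whole wait.
// It returns how long the caller was held up.
func updateBlocking(stopped <-chan struct{}) time.Duration {
	start := time.Now()
	waitForStopped(stopped, stopTimeout)
	return time.Since(start)
}

// updateAsync moves the wait into a goroutine and returns immediately.
// The returned channel is closed once the background wait has finished.
func updateAsync(stopped <-chan struct{}) (time.Duration, <-chan struct{}) {
	start := time.Now()
	done := make(chan struct{})
	go func() {
		defer close(done)
		waitForStopped(stopped, stopTimeout)
	}()
	return time.Since(start), done
}

func main() {
	never := make(chan struct{}) // a component that never reports "stopped"

	held := updateBlocking(never)
	fmt.Println("blocking update held the caller for the full timeout:", held >= stopTimeout)

	elapsed, done := updateAsync(never)
	fmt.Println("async update returned before the timeout:", elapsed < stopTimeout)
	<-done // let the background wait finish before exiting
}
```

In the blocking variant the caller cannot do anything else until the wait ends, which mirrors how the Coordinator is held up while update waits on waitForStopped.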

This blocking causes problems because the runtime Manager's update channel needs to write to coordinator.watchRuntimeComponents. If it cannot do that, it blocks. This in turn blocks runtime.stateChange, which prevents runtime.runLoop from processing checkins and state changes, leading to stalled communication and "missed checkin" errors for components that are still sending checkins.

This can lead to the bugs seen in #3617 and #3654.

leehinman added the bug (Something isn't working) label Nov 10, 2023
leehinman added the Team:Elastic-Agent (Label for the Agent team) label Nov 10, 2023
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)
