-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add node timeout #109
Conversation
Signed-off-by: Jack Yu <jack.yu@suse.com>
@@ -308,7 +332,7 @@ func (m *SupportBundleManager) completeNode(node string) { | |||
if len(m.expectedNodes) == 0 { | |||
if !m.done { | |||
logrus.Debugf("All nodes are completed") | |||
m.ch <- struct{}{} | |||
close(m.ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could just close it because there is no other one to write it.
Signed-off-by: Jack Yu <jack.yu@suse.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. thanks.
Signed-off-by: Jack Yu <jack.yu@suse.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you.
Problem
Sometime nodes spend too much time collecting logs, even forever. It makes users can't download support bundle kit.
Solution
Add collecting node timeout. When collecting node reach timeout, it will skip it instead of stuck. Then users are able download support bundle without waiting. But, in the situation, there is no node's logs in support bundle file.
For example, if A node is finished before timeout, but B node isn't finished. There is only A node's logs in support bundle file. So, we're still able to check something.
Related Issue
harvester/harvester#1646
Test
This is test case when collecting node reach out the timeout.
It's hard to simulate the node stuck for 30 minutes, so I suggest following steps to test:
TODO
If we're okay with this default timeout, these following features can be postponed.