Fix closed chan read #3423
This libbpfgo issue, aquasecurity/libbpfgo#122, is actually a Tracee problem. It was caused by attempts to read from channels after libbpfgo had already closed them.
The e2e test (1042) is green.
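As a minimal illustration of the failure mode (standard Go channel semantics, not Tracee's actual code): once the sender closes a channel, every receive returns immediately with the element type's zero value, which is how a line such as "Lost 0 ebpf logs events" can be logged. The comma-ok form of the receive tells a real value apart from a closed channel. The helper name `readLost` and the `uint64` element type are assumptions for illustration.

```go
package main

import "fmt"

// readLost is a hypothetical helper performing a comma-ok receive. After the
// sender closes the channel, the receive returns at once with the zero value
// and ok == false instead of blocking.
func readLost(ch <-chan uint64) (uint64, bool) {
	lost, ok := <-ch
	return lost, ok
}

func main() {
	lostCh := make(chan uint64) // hypothetical stand-in for a lost-events channel
	close(lostCh)               // simulates the library closing its channel on shutdown

	lost, ok := readLost(lostCh)
	if !ok {
		// Without this check, the zero value would be logged as "Lost 0 ...".
		fmt.Println("channel closed, nothing to report")
		return
	}
	fmt.Printf("Lost %d ebpf logs events\n", lost)
}
```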
LGTM.
If this fixes the mentioned libbpfgo issue, don't forget to close it there as well.
```go
logger.Warnw(fmt.Sprintf("Lost %d ebpf logs events", lost))
case lost, ok := <-t.lostBPFLogChannel:
	if !ok { // channel closed
		continue
```
Just for my understanding: if we `continue` here because the channel is closed, will we ever reach the `ctx.Done()` case below?
@yanivagman Yes, since `<-ctx.Done()` is another case clause of the same select. But your question made me wonder whether we should `continue` (logging a warning) or `return`, because the channel is controlled by another party, and once it is closed it has reached the end of its useful life. @josedonizetti when you pick this up for review, please take that into consideration.
@geyslan Although it will get to the `ctx.Done()` case, if the ctx is not done and the channel is closed, it will create a busy loop that never blocks: the select will always match on the closed channel and `continue`, consuming CPU. When would this channel be closed? Why do we care about closing for `lostBPFLogChannel` but not for `bpfLogsChannel`?
> Although it will get to the ctx done, if the ctx is not done and the channel is closed it will trigger a never blocking loop, because the select will always match on the closed channel, and continue, consuming CPU.

Indeed.

> When would this chan be closed?

They're ultimately closed by libbpfgo's `PerfBuffer.Stop()`, which is called by `PerfBuffer.Close()`, which in turn is called after Tracee's `ctx.Done()`.

> Why do we care about the closing for lostBPFLogChannel but not for bpfLogsChannel?

`lostBPFLogChannel` is not buffered.
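For reference, the relevant Go semantics, as a sketch with a hypothetical `drain` helper: a buffered channel still delivers values queued before `close`, and only afterwards does a receive report `ok == false`; a closed unbuffered channel reports it on the first receive. This illustrates language behavior only and does not describe how Tracee actually sizes its channels.

```go
package main

import "fmt"

// drain (hypothetical) receives with comma-ok until the channel reports
// closed, returning the values it got. Values queued in a buffered channel
// before close are still delivered first.
func drain(ch <-chan uint64) []uint64 {
	var got []uint64
	for {
		v, ok := <-ch
		if !ok {
			return got
		}
		got = append(got, v)
	}
}

func main() {
	buffered := make(chan uint64, 2)
	buffered <- 7
	buffered <- 9
	close(buffered)
	fmt.Println(drain(buffered)) // [7 9]

	unbuffered := make(chan uint64)
	close(unbuffered)
	fmt.Println(len(drain(unbuffered))) // 0
}
```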
@josedonizetti your review opened my eyes to the real cause of the "lost 0 events" output. Although the problem shows up in different goroutines, the cause lies only in `processLostEvents`. Please take a look at this new PR: #3438
@geyslan before this is merged I want to review it; sorry, I didn't have the time yet.
Updated from 68f0948 to b95f499.
Closed due to #3438.
Close: aquasecurity/libbpfgo#122
1. Explain what the PR does
68f0948 fix(ebpf): close perfbuf instead of just stop
32a028c fix(ebpf): closed channel read
2. Explain how to test it
Run tracee and kill it via `^C`. You may want to put a print call after the `!ok` check.
3. Other comments