Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix closed chan read #3423

Closed
wants to merge 2 commits into from
Closed

Conversation

geyslan
Copy link
Member

@geyslan geyslan commented Aug 29, 2023

Close: aquasecurity/libbpfgo#122

1. Explain what the PR does

68f0948 fix(ebpf): close perfbuf instead of just stop
32a028c fix(ebpf): closed channel read

68f0948 fix(ebpf): close perfbuf instead of just stop

This was an old TODO comment removed by https://github.com/aquasecurity/tracee/pull/296/files#diff-82aa2eb77c3ce782b8ebfb70eaba8a1a5a8a2dce9f3349352a3a3c0f909478f1L604

32a028c fix(ebpf): closed channel read

This libbpfgo issue https://github.com/aquasecurity/libbpfgo/issues/122
it's actually a Tracee problem. It was caused by read attempts on closed
channels, when libbpgo had already closed them.

2. Explain how to test it

Run tracee and kill it via ^C.

One must want to put a print call after the !ok check:

func (t *Tracee) processLostEvents() {
	logger.Debugw("Starting processLostEvents goroutine")
	defer logger.Debugw("Stopped processLostEvents goroutine")

	for {
		select {
		case lost, ok := <-t.lostEvChannel:
			if !ok { // channel closed
				return
			}

			// Won't be printed as before.
			fmt.Printf("Lost %d events\n", lost)

			if err := t.stats.LostEvCount.Increment(lost); err != nil {
				logger.Errorw("Incrementing lost event count", "error", err)
			}
			logger.Warnw(fmt.Sprintf("Lost %d events", lost))

		// Since this is an end-state goroutine, it should be terminated only
		// when Tracee done channel is closed.
		case <-t.done:
			return
		}
	}
}

3. Other comments

This libbpfgo issue aquasecurity/libbpfgo#122
it's actually a Tracee problem. It was caused by read attempts on closed
channels, when libbpgo had already closed them.
@geyslan
Copy link
Member Author

geyslan commented Sep 1, 2023

e2e test (1042) is green.

Copy link
Collaborator

@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,
If this fixes the mentioned libbpfgo issue, don't forget to close it there as well

logger.Warnw(fmt.Sprintf("Lost %d ebpf logs events", lost))
case lost, ok := <-t.lostBPFLogChannel:
if !ok { // channel closed
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding - if we continue here since the channel is closed, will we ever reach the ctx.Done() case below?

Copy link
Member Author

@geyslan geyslan Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanivagman Yes, since <-ctx.Done() is another select case clause. But your question made me ponder whether we should continue (putting a warning log) or return; I said this because the channel is controlled by another party and when closed it is the end of its useful life. @josedonizetti when you pick this for review, please take that in consideration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geyslan Although it will get to the ctx done, if the ctx is not done and the channel is closed it will trigger a never blocking loop, because the select will always match on the closed channel, and continue, consuming CPU. When would this chan be closed? Why do we care about the closing for lostBPFLogChannel but not for bpfLogsChannel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it will get to the ctx done, if the ctx is not done and the channel is closed it will trigger a never blocking loop, because the select will always match on the closed channel, and continue, consuming CPU.

Indeed.

When would this chan be closed?

They're finally closed by libbpfgo PerfBuffer.Stop() which is called by PerfBuffer.Close(), which in its turn is called after tracee's ctx.Done():

Screenshot from 2023-09-05 08-06-00

Why do we care about the closing for lostBPFLogChannel but not for bpfLogsChannel?

lostBPFLogChannel is not buffered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josedonizetti your review opened my eyes to the correct cause of the "lost 0 events" output. Although the problem is mentioned in different goroutines, the cause is only in processLostEvents. Please take a look at this new PR: #3438

pkg/ebpf/tracee.go Outdated Show resolved Hide resolved
@josedonizetti
Copy link
Contributor

@geyslan before merging I want to review this one, sorry I dind't have the time yet

@geyslan
Copy link
Member Author

geyslan commented Sep 11, 2023

Closed due to #3438

@geyslan geyslan closed this Sep 11, 2023
@geyslan geyslan deleted the fix-closed-chan-read branch September 18, 2023 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zero lost events counter on callback
3 participants