
Conversation

@liuchangyan

We have a use case that requires retrieving all available data from multiple kernel perf ring buffers in a single operation, so we would like to extend the perf module to support this.
We propose adding the following functions:

// ReadAllRings iterates through all ready rings and reads events,
// similar to reader_event_read.
ReadAllRings()
// EpollWait wraps the epoll waiting logic and allows specifying the timeout as needed.
EpollWait(d time.Duration)
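
A hypothetical consumption loop built on this API could look like the sketch below. EpollWait and ReadAllRings are the proposed additions, not existing library functions, and the error and []perf.Record return values are assumed here purely for illustration:

// Hypothetical sketch: EpollWait and ReadAllRings are the proposed
// additions; their return values are assumed for illustration.
// rd is an extended *perf.Reader, process an application callback.
for {
	// Block until at least one ring is ready or the timeout expires.
	if err := rd.EpollWait(250 * time.Millisecond); err != nil {
		return err
	}
	// Drain all ready rings in one pass and hand the records off as a batch.
	records, err := rd.ReadAllRings()
	if err != nil {
		return err
	}
	process(records)
}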

@florianl
Contributor

Thanks for your contribution. Some questions:

// ReadAllRings iterates through all ready rings and reads events,
// similar to reader_event_read.
ReadAllRings()

Can you elaborate on why using (*Reader) Read() in a loop does not work for you?

// EpollWait wraps the epoll waiting logic and allows specifying the timeout as needed.
EpollWait(d time.Duration)

Can you also elaborate on this suggestion? Can you share how you intend to use the perf buffer, and why the current API limits you?

@liuchangyan
Author


I need an externally controlled batch-processing model: call EpollWait(d) with my own timeout, then read all ready rings at once and process them in batches. This reduces system calls, minimizes wakeups, and lets me fully control the scheduling logic.

ReadInto, however, is an internally driven, record-by-record model. It automatically performs Wait, manages its own state machine (pendingErr, epollRings), and prevents me from controlling the waiting strategy or batch-processing flow. Therefore, it doesn’t fit my high-throughput continuous sampling scenario.
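
For contrast, this is roughly the record-by-record loop the existing API encourages (a minimal sketch, assuming rd is a *perf.Reader opened with perf.NewReader and handle is an application-defined callback):

// Minimal sketch of the existing record-by-record model; rd is a
// *perf.Reader and handle is an application-defined callback.
var rec perf.Record
for {
	if err := rd.ReadInto(&rec); err != nil {
		if errors.Is(err, perf.ErrClosed) {
			return nil // reader was closed, clean shutdown
		}
		return err
	}
	handle(rec.CPU, rec.RawSample)
}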

If there's anything wrong with the way I implemented this, please feel free to let me know at any time :)

@ti-mo
Collaborator

ti-mo commented Dec 1, 2025

@liuchangyan Have you considered using a bpf ringbuf instead of a perf event map? It's much less complicated to consume from user space since there's only one ring, which also makes it scale better on nodes with large numbers of CPUs.

It has a built-in wakeup scheduler/coalescing algorithm that should fit 99% of use cases, and should address your batch processing concern.

Also, if you use bpf_ringbuf_reserve + bpf_ringbuf_commit, there's less copying needed on the bpf side and less stack pressure, which should all improve the performance and reduce CPU usage of your program.
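
For illustration, consuming a BPF ring buffer from user space with this library's ringbuf package is a single loop over one ring (a minimal sketch, assuming events is a BPF_MAP_TYPE_RINGBUF map obtained from a loaded collection and handle is an application callback):

// Minimal sketch: consume a BPF_MAP_TYPE_RINGBUF map from user space.
// "events" is assumed to come from a loaded ebpf.Collection.
rd, err := ringbuf.NewReader(events)
if err != nil {
	return err
}
defer rd.Close()

for {
	record, err := rd.Read()
	if errors.Is(err, ringbuf.ErrClosed) {
		return nil // reader was closed, clean shutdown
	}
	if err != nil {
		return err
	}
	handle(record.RawSample)
}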

@liuchangyan
Author

bpf_ringbuf_reserve

The BPF ring buffer is only available on Linux 5.8 and newer, while most of our current use cases are on Linux 4.x kernels.
cc @florianl

@lmb
Collaborator

lmb commented Dec 5, 2025

If I understand correctly, the library already does the things you want; a sketch combining the two follows the excerpt below.

  • SetDeadline allows you to control the wait duration.
  • ReadInto already opportunistically polls all rings:

    ebpf/perf/reader.go

    Lines 380 to 385 in f150ced

    // Waking up userspace is expensive, make the most of it by checking
    // all rings.
    for _, ring := range pr.rings {
        ring.loadHead()
        pr.epollRings = append(pr.epollRings, ring)
    }
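
Combining the two, a deadline-bounded batch read with the existing API could look like this (a minimal sketch, assuming rd is a *perf.Reader and process is an application callback; once the deadline expires, ReadInto is assumed to drain what is already buffered and then surface os.ErrDeadlineExceeded):

// Minimal sketch of a deadline-bounded batch read with the existing API.
// rd is assumed to be a *perf.Reader; process is an application callback.
rd.SetDeadline(time.Now().Add(250 * time.Millisecond))

var batch []perf.Record
for {
	var rec perf.Record
	err := rd.ReadInto(&rec)
	if errors.Is(err, os.ErrDeadlineExceeded) {
		break // deadline hit, the batch is complete
	}
	if err != nil {
		return err
	}
	batch = append(batch, rec)
}
process(batch)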

Please note that 4.x series kernels are not supported by this library anymore.
