[PROF-10201] Reduce allocation profiling overhead by replacing tracepoint with lower-level API #3805

Merged (1 commit) on Jul 24, 2024

@ivoanjo ivoanjo commented Jul 24, 2024

What does this PR do?

This PR reduces the allocation profiling overhead by replacing the Ruby tracepoint API with the lower-level rb_add_event_hook2 API.

The key insight here is that while benchmarking allocation profiling and looking at what the VM was doing, I discovered that tracepoints are just a thin, more user-friendly wrapper around the lower-level API.

The lower-level API is publicly available (in "debug.h") but it's listed as "undocumented advanced tracing APIs".

Motivation:

As we're trying to squeeze every bit of performance from the allocation profiling hot-path, it makes sense to make use of the lower-level API.

Additional Notes:

I'm considering experimenting with moving the tracepoint we use for GC profiling to this lower-level API as well, since that's another performance-sensitive code path.

How to test the change?

Functionality-wise, nothing changes, so existing test coverage is enough (and shows this alternative is working correctly).

Here are some benchmark numbers from `benchmarks/profiler_allocation.rb`:

```
ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [x86_64-linux]
Warming up --------------------------------------
Allocations (baseline)   1.565M i/100ms
Calculating -------------------------------------
Allocations (baseline)   15.263M (± 1.4%) i/s -    153.400M in  10.052624s

Warming up --------------------------------------
Allocations (event_hook) 1.240M i/100ms
Calculating -------------------------------------
Allocations (event_hook) 12.571M (± 2.1%) i/s -    126.456M in  10.064297s

Warming up --------------------------------------
Allocations (tracepoint) 1.183M i/100ms
Calculating -------------------------------------
Allocations (tracepoint) 12.225M (± 0.5%) i/s -    123.072M in  10.067487s

Comparison:
Allocations (baseline): 15262756.4 i/s
Allocations (event_hook): 12570772.3 i/s - 1.21x  slower
Allocations (tracepoint): 12225052.0 i/s - 1.25x  slower
```

Here, event_hook is with the optimization, whereas tracepoint is without it.

I am aware these numbers are close to the margin of error. I re-ran my benchmarks a number of times and consistently observed the event_hook version coming out ahead of the tracepoint version, even if only by a little.
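
To put the comparison in perspective, here is a small sketch that recomputes the slowdown factors from the i/s figures reported above and derives the relative gain of `event_hook` over `tracepoint`. The constant and method names are illustrative (they are not part of the benchmark script), and the code only does arithmetic on the published numbers; it does not rerun the benchmark.

```ruby
# Iterations per second, copied from the benchmark output (Ruby 2.7.7).
RESULTS = {
  baseline: 15_262_756.4,
  event_hook: 12_570_772.3,
  tracepoint: 12_225_052.0
}.freeze

# How many times slower a variant is than the baseline.
def slowdown(variant)
  RESULTS.fetch(:baseline) / RESULTS.fetch(variant)
end

# Relative throughput gain of event_hook over tracepoint, in percent.
def event_hook_gain_percent
  (RESULTS.fetch(:event_hook) / RESULTS.fetch(:tracepoint) - 1) * 100
end

puts format('event_hook: %.2fx slower than baseline', slowdown(:event_hook))
puts format('tracepoint: %.2fx slower than baseline', slowdown(:tracepoint))
puts format('event_hook vs tracepoint: +%.1f%% i/s', event_hook_gain_percent)
```

By this calculation the gain is roughly 2.8% in iterations per second, which is of the same order as the ± 2.1% spread reported for the `event_hook` run, hence the repeated runs.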

@ivoanjo ivoanjo requested a review from a team as a code owner July 24, 2024 10:05
@github-actions github-actions bot added the profiling Involves Datadog profiling label Jul 24, 2024

@anmarchenko anmarchenko left a comment

Will make a note to implement the same optimisation in datadog-ci.

pr-commenter bot commented Jul 24, 2024

Benchmarks

Benchmark execution time: 2024-07-24 10:15:09

Comparing candidate commit 9431929 in PR branch ivoanjo/prof-10201-reduce-alloc-tracepoint-overhead with baseline commit 9a7002c in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics.

ivoanjo commented Jul 24, 2024

I saw a new flaky profiler spec (https://app.circleci.com/pipelines/github/DataDog/dd-trace-rb/15605/workflows/5b32e4bb-155e-4a94-bf8b-980599eaafae/jobs/568892). I've looked into it: it's related to #3792, not this PR, so I'll open a separate PR to tackle it.

@ivoanjo ivoanjo merged commit 2f591a1 into master Jul 24, 2024
171 checks passed
@ivoanjo ivoanjo deleted the ivoanjo/prof-10201-reduce-alloc-tracepoint-overhead branch July 24, 2024 11:06
@github-actions github-actions bot added this to the 2.3.0 milestone Jul 24, 2024
@TonyCTHsu TonyCTHsu mentioned this pull request Aug 22, 2024