[prometheusremotewriteexporter] reduce allocations in createAttributes #35184

Conversation

edma2
Contributor

@edma2 edma2 commented Sep 13, 2024

Description:

While profiling the collector, we found that the createAttributes function was responsible for a significant chunk of allocations (30-40%), which led to high CPU usage spent in GC.

createAttributes is responsible for converting the attributes of a given data point to Prometheus labels. For simplicity, it allocates a new labels slice for every data point. We found that reducing allocations here significantly reduced GC time in our environment (in some deployments by as much as ~50%).

The strategy in this PR is to reuse the slice's backing array as much as possible. The backing array will automatically grow as needed (batching with a batch processor will effectively set an upper bound). Note: we don't need to synchronize access to this (e.g. sync.Pool) since the exporter is configured with 1 consumer.
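For illustration, here is a minimal sketch of that reuse pattern under simplified assumptions (a plain map of string attributes instead of pdata attributes, and an illustrative converter type; this is not the exporter's actual code):

```go
// Sketch of the reuse strategy: truncate the shared slice to length zero for
// each data point, so append reuses the existing backing array and only
// allocates when the label count outgrows its current capacity.
package prometheusremotewrite

import "github.com/prometheus/prometheus/prompb"

type prometheusConverter struct {
	labels []prompb.Label // reused across createAttributes calls
}

func (c *prometheusConverter) createAttributes(attrs map[string]string) []prompb.Label {
	c.labels = c.labels[:0] // keep the backing array, drop previous contents
	for name, value := range attrs {
		c.labels = append(c.labels, prompb.Label{Name: name, Value: value})
	}
	return c.labels
}
```

Because the returned slice aliases the shared backing array, callers must not retain it across data points, which is why the converter is documented as single-goroutine only.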

Link to tracking Issue:

Testing:

Modified unit tests and ran benchmarks locally.
Works in our production environment.

benchstat output

cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
                    │ /tmp/old.txt  │            /tmp/new.txt             │
                    │    sec/op     │   sec/op     vs base                │
CreateAttributes-16   1010.0n ± 25%   804.0n ± 7%  -20.40% (p=0.000 n=10)

                    │ /tmp/old.txt │            /tmp/new.txt            │
                    │     B/op     │    B/op     vs base                │
CreateAttributes-16    371.00 ± 0%   91.00 ± 0%  -75.47% (p=0.000 n=10)

                    │ /tmp/old.txt │            /tmp/new.txt            │
                    │  allocs/op   │ allocs/op   vs base                │
CreateAttributes-16     7.000 ± 0%   5.000 ± 0%  -28.57% (p=0.000 n=10)

Documentation:

@edma2 edma2 requested a review from dashpole as a code owner September 13, 2024 23:37
@edma2 edma2 requested a review from a team September 13, 2024 23:37

linux-foundation-easycla bot commented Sep 13, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@dashpole
Contributor

cc @jmichalek132 @ArthurSens

// best to keep it around for the lifetime of the Go process. Due to this shared
// state, PrometheusConverter is NOT thread-safe and is only intended to be used by
// a single go-routine at a time.
// Each FromMetrics call should be followed by a Reset when the metrics can be safely
Contributor

Should we emit a warning log or something if someone calls FromMetrics without Resetting?

Contributor Author

Moved the reset to always be called inside FromMetrics so this is no longer a user concern.

Member

Now that the user doesn't need to call reset, should we remove this part of the comment?
// Each FromMetrics call should be followed by a Reset.....

@ArthurSens
Member

Note: we don't need to synchronize access to this (e.g. sync.Pool) since the exporter is configured with 1 consumer.

We don't plan to keep this forever, right? Ideally we'll be able to shard this to improve throughput; we're just hardcoding this to 1 because OTel's exporter helper doesn't ensure ordering.

On the other hand, I agree that we shouldn't block optimizations based on something we want to do in the future 😬. @edma2, knowing that we'll eventually shard the output, any suggestions on how to do this without sacrificing your optimization?

@jmichalek132
Contributor

Note: we don't need to synchronize access to this (e.g. sync.Pool) since the exporter is configured with 1 consumer.

We don't plan to keep this forever, right? Ideally we'll be able to shard this to improve throughput; we're just hardcoding this to 1 because OTel's exporter helper doesn't ensure ordering.

On the other hand, I agree that we shouldn't block optimizations based on something we want to do in the future 😬. @edma2, knowing that we'll eventually shard the output, any suggestions on how to do this without sacrificing your optimization?

I also wonder whether having multiple pipelines with multiple remote write exporters (e.g. sending data from a dev cluster to 2 destinations, dev and prod) would break this too.

@edma2 edma2 requested a review from a team as a code owner October 1, 2024 22:28
@edma2 edma2 requested a review from andrzej-stencel October 1, 2024 22:28
@edma2
Contributor Author

edma2 commented Oct 2, 2024

On the other hand, I agree that we shouldn't block optimizations based on something we want to do in the future 😬. @edma2, knowing that we'll eventually shard the output, any suggestions on how to do this without sacrificing your optimization?

@ArthurSens my initial thought here is to maybe wrap things in a sync.Pool, but I'm not sure how that might affect performance. I'm leaning toward making that a future change, but I can think about it more.

I also wonder whether having multiple pipelines with multiple remote write exporters (e.g. sending data from a dev cluster to 2 destinations, dev and prod) would break this too.

@jmichalek132 Each exporter would have its own instance of PrometheusConverter, so I think it would be ok.

github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 16, 2024
@andrzej-stencel andrzej-stencel requested review from andrzej-stencel and removed request for andrzej-stencel October 16, 2024 09:49
@dashpole dashpole removed the Stale label Oct 16, 2024
github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 31, 2024
@edma2
Contributor Author

edma2 commented Nov 1, 2024

@ArthurSens @dashpole @jmichalek132 I addressed the comments and also changed the implementation so that it now uses a sync.Pool. This supports concurrent access from the exporter in case it ever runs more than 1 worker at a time. Please take a look!
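For reference, a rough sketch of what a sync.Pool-based variant can look like, under the same simplified assumptions as before (illustrative names and types, not the PR's exact code):

```go
// Sketch of pooling the reusable buffers: each export request borrows its own
// converter, so concurrent workers never share a label slice, while a single
// worker still reuses the backing array across data points within a request.
package prometheusremotewrite

import (
	"sync"

	"github.com/prometheus/prometheus/prompb"
)

type prometheusConverter struct {
	labels []prompb.Label // reused across the data points of one request
}

var converterPool = sync.Pool{
	New: func() any { return &prometheusConverter{} },
}

// createAttributes reuses the converter's backing array for one data point.
func (c *prometheusConverter) createAttributes(attrs map[string]string) []prompb.Label {
	c.labels = c.labels[:0]
	for name, value := range attrs {
		c.labels = append(c.labels, prompb.Label{Name: name, Value: value})
	}
	return c.labels
}

// convertRequest shows the borrow/return lifecycle around one export request.
// emit must serialize or copy the labels; it must not retain the slice.
func convertRequest(dataPoints []map[string]string, emit func([]prompb.Label)) {
	c := converterPool.Get().(*prometheusConverter)
	defer converterPool.Put(c)

	for _, attrs := range dataPoints {
		emit(c.createAttributes(attrs))
	}
}
```

The design point is that the pool hands a buffer to only one goroutine at a time, so the single-consumer restriction discussed earlier is no longer needed for correctness.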

@edma2 edma2 force-pushed the prometheusremotewrite-optimize-createattributes branch 2 times, most recently from b337721 to 6f941e3 November 1, 2024 05:10
@github-actions github-actions bot removed the Stale label Nov 1, 2024
@ArthurSens
Member

Awesome edma! I'm struggling a bit to find time to review this one, just wanted to let you know that this is on my list :)

(open-telemetry#57)

createAttributes was allocating a new label slice for every series, which generates a lot of garbage (~30-40% of all allocations). Keep a reusable underlying array of labels around to reduce allocations on the hot path.
@atoulme atoulme marked this pull request as draft March 17, 2025 22:52

github-actions bot commented Apr 1, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 1, 2025
github-actions bot

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Apr 16, 2025
@edma2
Contributor Author

edma2 commented Apr 23, 2025

@atoulme I've updated my branch and merged it with main, but I'm not seeing my changes reflected here. Can you please re-open this PR?

@edma2 edma2 marked this pull request as ready for review April 24, 2025 16:16
@github-actions github-actions bot removed the Stale label Apr 25, 2025
@ArthurSens
Member

Hi @edma2, thanks for continuously coming back to this work! I've been struggling to find time to review this PR again; I just wanted to let you know that I'm aware of its existence and that I'm trying to allocate time for the review!

@atoulme
Contributor

atoulme commented May 9, 2025

I see one more conflict in the code. Please resolve it and mark the PR ready for review again.

@atoulme atoulme marked this pull request as draft May 9, 2025 18:16
@github-actions github-actions bot requested a review from ywwg May 10, 2025 00:00
@edma2 edma2 marked this pull request as ready for review May 10, 2025 00:18
@ArthurSens
Member

@edma2, to be fully transparent and respectful of your time, let's put the work here on hold for a while. The optimization you're doing is excellent, but it's also not easy to understand without paying a decent amount of attention.

The team's priority right now is adhering to the OTel->Prometheus specification and implementing version 2 of the Remote Write Protocol, which is creating many conflicts in your PR.

I'm a bit concerned that what I'm saying will put you in a bad mood, and I apologize for that, but I think it would be worse if I didn't say anything and let you keep resolving merge conflicts every single week 😕

@edma2
Contributor Author

edma2 commented May 12, 2025

@ArthurSens no problem, I totally understand if there are higher priorities right now. Thanks for giving me a heads up. Do you know when would be a good time to revisit the PR? Also, if splitting the PR into smaller pieces would make reviewing it easier, I can do that.

@ArthurSens
Member

I'd say that after we solve #33661, it should be a good time to get back to this. We shouldn't see repeated merge conflicts after that.

github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 27, 2025
github-actions bot

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jun 11, 2025