
deltatocumulativeprocessor: enforce max bucket count for exphistograms #34157

Closed
wants to merge 14 commits

Conversation

edma2
Contributor

@edma2 edma2 commented Jul 18, 2024

Description:

When merging exponential histograms, enforce a maximum bucket count by downscaling as necessary. Without a maximum bucket count, the number of buckets is unbounded and can cause an OOM.
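For illustration only (a hedged sketch, not code from this PR), the core idea is to count how many downscale steps are needed before the merged bucket index range fits within the limit; each step halves every bucket index:

```go
package main

import "fmt"

// scaleReduction is a hypothetical helper: it returns how many scale steps
// are needed so that the bucket index range [low, high] spans at most
// maxBuckets buckets. Every downscale step halves the bucket indices.
func scaleReduction(low, high int32, maxBuckets int) int32 {
	var change int32
	for int(high-low)+1 > maxBuckets {
		low >>= 1
		high >>= 1
		change++
	}
	return change
}

func main() {
	// Merged buckets spanning indices 0..500 need a downscale of 2 to fit
	// a 160-bucket limit: 501 -> 251 -> 126 buckets.
	fmt.Println(scaleReduction(0, 500, 160)) // 2
}
```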

Link to tracking Issue: #33277

Testing:

Added unit tests and validated the fix in real-world environments.

Documentation:

Added flag to README.md.

@edma2 edma2 requested a review from jpkrohling as a code owner July 18, 2024 16:57
@edma2 edma2 requested a review from a team July 18, 2024 16:57
@github-actions github-actions bot requested review from RichieSams and sh0rez July 18, 2024 16:57
@axw
Contributor

axw commented Jul 19, 2024

I'm probably missing some nuance, but I'm a bit surprised to see this responsibility added to deltatocumulativeprocessor. Is the problem specific to this processor? Could it also be a problem for intervalprocessor, e.g. merging multiple sparsely populated cumulative histograms?

I suppose the idea is that we should avoid downscaling unless max buckets is reached, in which case it wouldn't help to have a prior processor in the pipeline that downscales statelessly. Maybe in the long term there should be a single stateful aggregation processor that does both delta-to-cumulative and aggregation of cumulative metrics?

@jpkrohling
Member

While the problem might not be specific to this processor alone, I'd prefer to get a complete working solution here first, and then refactor and identify which parts can be reused. I understand some parts of this processor were already built with this in mind, and deltafill is an example of that too.

@edma2
Contributor Author

edma2 commented Aug 1, 2024

@RichieSams @jpkrohling do we need additional reviews for this PR?

@edma2
Contributor Author

edma2 commented Aug 1, 2024

cc @sh0rez

Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Aug 30, 2024
@jpkrohling jpkrohling reopened this Aug 30, 2024
Member

@sh0rez sh0rez left a comment

hey! thank you for contributing and so sorry for the late review.

I generally like your approach, but found some minor items :)

@@ -30,6 +30,9 @@ processors:
# will be dropped
[ max_streams: <int> | default = 0 (off) ]

# desired maximum number of buckets to represent in exponential histograms.
# histograms will downscale as necessary to accommodate this limit
[ max_exponential_histogram_buckets: <int> | default = 160 ]

This comment was marked as resolved.

Contributor Author

no longer relevant. removed configurability as suggested in another comment

Comment on lines 59 to 98
return highLow{
	low:  0,
	high: -1,
}
Member

wouldn't the zero value be just as meaningful here, or is there a reason for specifically returning -1?

Contributor Author

hmm, I can't remember if there was a reason, but using the zero value is more idiomatic in Go, so I'll use that

Comment on lines 32 to 71
type highLow struct {
	low  int32
	high int32
}
Member

This appears to be always used for upper and lower bounds, so how about:

type Bounds struct {
	Upper int32
	Lower int32
}

Contributor Author

👍

}

// with is an accessory for Merge() to calculate ideal combined scale.
func (h *highLow) with(o highLow) highLow {
Member

not modifying h, no need for a pointer ;)
also the width of 2x int32 is equal to a uintptr, so literally no difference

Contributor Author

thanks for the suggestion, fixed

}

// empty indicates whether there are any values in a highLow.
func (h *highLow) empty() bool {
Member

again no need for a pointer receiver

Contributor Author

fixed


// changeScale computes how much downscaling is needed by shifting the
// high and low values until they are separated by no more than size.
func changeScale(hl highLow, size int) int32 {
Member

it's not changing any scale, but returning a computed scaleChange, so let's reflect that in naming :)

Suggested change
func changeScale(hl highLow, size int) int32 {
func scaleChange(hl highLow, size int) int32 {

Contributor Author

renamed to downscaleNeeded for clarity

Comment on lines 60 to 63
type ExpHistogram Metric
type ExpHistogram struct {
	Metric
	MaxSize int
}
Member

Making this a struct instead of a type alias has certain consequences (it's no longer castable) that I wish to avoid for a future refactor I've planned for this code.

I know it's hard to include configurability with the current codebase, so I suggest leaving this out for now and just always using the recommended default. Would that cause a major headache?

Contributor Author

I think that should be okay if it aligns better long-term. I haven't tried anything other than the default tbh.

@github-actions github-actions bot removed the Stale label Aug 31, 2024
@edma2 edma2 force-pushed the deltatocumulative-cap-exphisto branch from 907742c to 6f19845 Compare September 7, 2024 23:09
@edma2
Contributor Author

edma2 commented Sep 11, 2024

@sh0rez thanks for the review! apologies for my delayed response as well

@edma2 edma2 requested a review from a team as a code owner October 1, 2024 22:26
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 16, 2024
@github-actions github-actions bot removed the Stale label Oct 19, 2024
@edma2
Contributor Author

edma2 commented Oct 21, 2024

@sh0rez do you mind taking another look at this PR when you have a chance? Thanks!

@sh0rez
Member

sh0rez commented Nov 22, 2024

@edma2 I'm reviewing again now. Sorry you had to wait so long and thanks for sticking around

Comment on lines -84 to -85
expo.Merge(dp.Positive(), in.Positive())
expo.Merge(dp.Negative(), in.Negative())
Member

What's the reason to do merging before widening the zero bucket?

Afaict widening must happen first, or we might merge buckets with a different zero threshold, which is almost certainly wrong

Contributor Author

Hm, I don't remember any specific reason. This was probably an oversight and you are correct: we need to widen the zero bucket first because it affects the bucket counts.

return b == bounds{}
}

// boundsAtScale is an accessory for Add() to calculate ideal combined scale.
Member

can we be more descriptive here? as a reader, I'm mostly interested in how this works, which is not explained.

i'd like to read something along the lines of "computes the bucket boundaries at given scale. it does so by dividing the bounds by two (>> operation) as many times as the scales differ."
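A hedged sketch of that description in isolation (the names below are illustrative stand-ins, not the PR's exact types): moving a pair of bucket index bounds to a lower scale halves them once per scale step.

```go
package main

import "fmt"

// boundsAt is a hypothetical stand-in for the boundsAtScale discussed above:
// it shifts the upper and lower bucket indices right once for every step
// between the current scale and a lower target scale.
func boundsAt(upper, lower, current, target int32) (int32, int32) {
	for s := current; s > target; s-- {
		upper >>= 1
		lower >>= 1
	}
	return upper, lower
}

func main() {
	// Going from scale 0 to scale -2 halves the indices twice: 13..27 becomes 3..6.
	fmt.Println(boundsAt(27, 13, 0, -2)) // 6 3
}
```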

Comment on lines +111 to +112
b.upper >>= 1
b.lower >>= 1
Member

Suggested change
b.upper >>= 1
b.lower >>= 1
b.upper /= 2
b.lower /= 2

it's the same and reads easier if you're not super familiar with bitshifts

Comment on lines +121 to +137
minScale := min(dp.Scale(), in.Scale())

// logic is adapted from lightstep's algorithm for enforcing max buckets:
// https://github.com/lightstep/go-expohisto/blob/4375bf4ef2858552204edb8b4572330c94a4a755/structure/exponential.go#L542
// first, calculate the highest and lowest indices for each bucket, given the candidate min scale.
// then, calculate how much downscaling is needed to fit the merged range within max bucket count.
// finally, perform the actual downscaling.
posBounds := dp.boundsAtScale(dp.Positive(), minScale)
posBounds = posBounds.with(in.boundsAtScale(in.Positive(), minScale))

negBounds := dp.boundsAtScale(dp.Negative(), minScale)
negBounds = negBounds.with(in.boundsAtScale(in.Negative(), minScale))

minScale = min(
	minScale-downscaleNeeded(posBounds, dp.MaxSize),
	minScale-downscaleNeeded(negBounds, dp.MaxSize),
)
Member

can we move all of this logic into the expo package? This is generally useful and fits that package's purpose better.

maybe:

package expo

// Limit returns the Scale a and b need to be downscaled to so that merging does
// not exceed the given max bucket length
func Limit(a, b DataPoint, max int) Scale {}

Contributor Author

Sounds good.

@@ -80,10 +80,11 @@ func (dp Histogram) CopyTo(dst Histogram) {

type ExpHistogram struct {
	expo.DataPoint
	MaxSize int
Member

This doubles the size of each ExpHistogram value from 8 to 16 bytes.

Given there is likely no need to configure this at runtime anyway, can we just have a constant in the expo package?

expo.Limit(dp, in, expo.DefaultLimit)
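A minimal sketch of what such a constant might look like, assuming the DefaultLimit name from the call above and the 160 default from the README change (the actual refactor may differ):

```go
package expo

// DefaultLimit is the maximum number of exponential histogram buckets to
// keep per sign; 160 matches the default documented in the README above.
const DefaultLimit = 160
```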

for i := size; i < counts.Len(); i++ {
	counts.SetAt(i, 0)
}
counts.FromRaw(counts.AsRaw()[:size])
Member

This is a tricky one.

This change makes us reallocate all bucket counts every time we downscale, regardless of whether we are close to the limit or not.
Worse, it allocs twice because AsRaw copies, and FromRaw copies again.

Generally, we should keep allocated memory even if we have no counts in there, because those might arrive in the future. This is fine because we enforce our limit by downscaling before, so we will never exceed it.

Contributor Author

@edma2 edma2 Nov 23, 2024

IIRC I had to do this because we don't explicitly store the upper bound of the bucket, so we rely on the length of the slice itself. We must know the upper bound of the bucket to calculate the new scale. We could explicitly track the "real" length somewhere but it adds complexity.

FWIW I didn't see any noticeable hit to performance when this optimization was removed, so I went with the simpler approach.

name: "maxsize/1",
dp: expdp{PosNeg: bins{0, 0, 0, ø}.Into(), Count: 0},
in: expdp{PosNeg: bins{ø, ø, ø, ø, 1, 2, 3, 4}.Into(), Count: 2 * (1 + 2 + 3 + 4)},
want: expdp{PosNeg: bins{ø, ø, 0, 10, ø}.Into(), Scale: -3, Count: 2 * (0 + (1 + 2 + 3 + 4))},
Member

I do not understand this test case.

The max length is 1, but want specifies 2 buckets (0, 10). why?

Contributor Author

Good question. When downscaling, there is an edge case where we may need to extend the bucket range by 1 in order to fit all the counts, so the actual bucket count might be 1 larger than the max size.
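A hedged illustration of that edge case (illustrative only, not code from this PR): downscaling by d maps an old bucket index i to i >> d, and a span that does not start on a 2^d boundary can straddle two new buckets, leaving one more bucket than the target size.

```go
package main

import "fmt"

func main() {
	// Downscaling by 2 maps each old bucket index i to i >> 2.
	// The four old buckets 3..6 do not start on a multiple of 4, so they
	// land in two new buckets (0 and 1) instead of one.
	for i := int32(3); i <= 6; i++ {
		fmt.Print(i>>2, " ") // prints: 0 1 1 1
	}
	fmt.Println()
}
```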

Contributor

github-actions bot commented Dec 7, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Dec 22, 2024