-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use sorted slices to represent HDR histogram #52
Conversation
proto/aggregation.proto
Outdated
@@ -152,3 +139,16 @@ message Overflow { | |||
bytes overflow_service_transactions_estimator = 5; | |||
bytes overflow_spans_estimator = 6; | |||
} | |||
|
|||
message Bar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HDRHistogramBar
?
IIANM adding this message will increase the amount of pointer indirection, and memory pressure, compared to #49.
OTOH I like the approach of keeping the histogram sorted at all times. Did you already consider and investigate doing this with the existing pair of arrays representation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you already consider and investigate doing this with the existing pair of arrays representation?
Good point, I didn't do it. I will use counts and buckets instead of Bar
for the proto representation.
OTOH I like the approach of keeping the histogram sorted at all times
Another advantage of this is that it makes the Buckets()
calculation more efficient (which is why we see the improvement in CombinedMetricsToBatch
w.r.t. time/op
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code to remove Bar
message from proto definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, just wondering if we can further simplify the HistogramRepresentation code.
for idx := 0; iter.next(); idx++ { | ||
if bucketsSeen == h.CountsRep.Len() { | ||
break | ||
iter.nextCountAtIdx() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the iterator
type is still needed, and whether we could simplify things. Would it be possible to store the bucket value, rather than index, in CountsRep? Then Buckets
is simply iterating through and copying to slices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to store the bucket value, rather than index, in CountsRep?
Do you mean save the highestEquivalentValue
for each bucket instead of index? That value is currently calculated based on the bucket and sub bucket index and I think it would require some re-implementation of the core logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be able to use the bucketIdx
to derive the highestEquivalentValue
directly without requiring the subBucket
(which is the whole reason I kept the iterator). Giving this a shot now.
This doesn't work because we require the subBucketIdx
even in this case. I think we might be able to do some refactoring to achieve a simpler state but maybe in a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 let's take it to a followup
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
to.Buckets = slices.Grow(to.Buckets, requiredLen) | ||
to.Counts = slices.Grow(to.Counts, requiredLen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there's a slight issue with slices.Grow here, since slices.Grow(s, n)
ensures there's extra space for n more elements. The right call here should be to.Buckets = slices.Grow(to.Buckets, requiredLen-len(to.Buckets))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #56
An alternative to #49. Uses sorted slices to store HDRHistograms. The sorted nature is used to efficiently perform merges.
Benchmark (thanks to #50):
More benchmarks: