Use sorted slices to represent HDR histogram #52
Changes from all commits: 53351f2, e470d54, 2614012, bc8cb2f, 2eae439, 11531ab, 0a9b4b3, 4c48335, cc712f7, 4c9ae85, 084daaa
@@ -6,9 +6,9 @@ package aggregators
 import (
 	"io"
-	"sync"
 
 	"github.com/axiomhq/hyperloglog"
+	"golang.org/x/exp/slices"
 
 	"github.com/elastic/apm-aggregation/aggregationpb"
 )
@@ -407,39 +407,68 @@ func mergeSpanMetrics(to, from *aggregationpb.SpanMetrics) {
 	to.Sum += from.Sum
 }
 
-// mapPool is a pool of maps to facilitate histogram merges.
-var mapPool = sync.Pool{New: func() interface{} {
-	return make(map[int32]int64)
-}}
-
+// mergeHistogram merges two proto representation of HDRHistogram. The
+// merge assumes both histograms are created with same arguments and
+// their representations are sorted by bucket.
 func mergeHistogram(to, from *aggregationpb.HDRHistogram) {
-	// Assume both histograms are created with same arguments
-	m := mapPool.Get().(map[int32]int64)
-	defer mapPool.Put(m)
-	for k := range m {
-		delete(m, k)
+	if len(from.Buckets) == 0 {
+		return
 	}
 
-	for i := 0; i < len(to.Buckets); i++ {
-		m[to.Buckets[i]] = to.Counts[i]
-	}
-	for i := 0; i < len(from.Buckets); i++ {
-		m[from.Buckets[i]] += from.Counts[i]
+	if len(to.Buckets) == 0 {
+		to.Buckets = append(to.Buckets, from.Buckets...)
+		to.Counts = append(to.Counts, from.Counts...)
+		return
 	}
 
-	if cap(to.Buckets) < len(m) {
-		to.Buckets = make([]int32, len(m))
-	}
-	if cap(to.Counts) < len(m) {
-		to.Counts = make([]int64, len(m))
+	requiredLen := len(to.Buckets) + len(from.Buckets)
+	for toIdx, fromIdx := 0, 0; toIdx < len(to.Buckets) && fromIdx < len(from.Buckets); {
+		v := to.Buckets[toIdx] - from.Buckets[fromIdx]
+		switch {
+		case v == 0:
+			// For every bucket that is common, we need one less bucket in final slice
+			requiredLen--
+			toIdx++
+			fromIdx++
+		case v < 0:
+			toIdx++
+		case v > 0:
+			fromIdx++
+		}
 	}
 
-	to.Buckets = to.Buckets[:0]
-	to.Counts = to.Counts[:0]
-
-	for b, c := range m {
-		to.Buckets = append(to.Buckets, b)
-		to.Counts = append(to.Counts, c)
+	toIdx, fromIdx := len(to.Buckets)-1, len(from.Buckets)-1
+	to.Buckets = slices.Grow(to.Buckets, requiredLen)
+	to.Counts = slices.Grow(to.Counts, requiredLen)
+	to.Buckets = to.Buckets[:requiredLen]
+	to.Counts = to.Counts[:requiredLen]
+	for idx := len(to.Buckets) - 1; idx >= 0; idx-- {
+		if fromIdx < 0 {
+			break
+		}
+		if toIdx < 0 {
+			to.Counts[idx] = from.Counts[fromIdx]
+			to.Buckets[idx] = from.Buckets[fromIdx]
+			fromIdx--
+			continue
+		}
+		v := to.Buckets[toIdx] - from.Buckets[fromIdx]
+		switch {
+		case v == 0:
+			to.Counts[toIdx] += from.Counts[fromIdx]
+			to.Counts[idx] = to.Counts[toIdx]
+			to.Buckets[idx] = to.Buckets[toIdx]
+			toIdx--
+			fromIdx--
+		case v > 0:
+			to.Counts[idx] = to.Counts[toIdx]
+			to.Buckets[idx] = to.Buckets[toIdx]
+			toIdx--
+		case v < 0:
+			to.Counts[idx] = from.Counts[fromIdx]
+			to.Buckets[idx] = from.Buckets[fromIdx]
+			fromIdx--
+		}
 	}
 }

Review comment on lines +441 to +442 (the slices.Grow calls): "I believe there's a slight issue with slices.Grow here, since …"

Reply: "Added #56"
Review discussion:

Reviewer: I'm wondering if the `iterator` type is still needed, and whether we could simplify things. Would it be possible to store the bucket value, rather than the index, in `CountsRep`? Then `Buckets` would simply iterate through and copy to slices.

Author: Do you mean saving the `highestEquivalentValue` for each bucket instead of the index? That value is currently calculated from the bucket and sub-bucket indices, and I think it would require some re-implementation of the core logic.

Author: I might be able to use the `bucketIdx` to derive the `highestEquivalentValue` directly without requiring the `subBucket` (which is the whole reason I kept the iterator). Giving this a shot now. Update: this doesn't work, because we require the `subBucketIdx` even in this case. I think we might be able to do some refactoring to achieve a simpler state, but maybe in a future PR?

Reviewer: 👍 let's take it to a followup