perf: Add mergeHistogram fast path #66

Closed · wants to merge 9 commits · Changes from 4 commits
aggregators/merger.go: 60 changes (42 additions, 18 deletions)

@@ -6,6 +6,7 @@ package aggregators

 import (
 	"io"
+	"sort"

 	"github.com/axiomhq/hyperloglog"
 	"github.com/cespare/xxhash/v2"
@@ -384,6 +385,7 @@ func mergeSpanMetrics(to, from *aggregationpb.SpanMetrics) {
 // mergeHistogram merges two proto representations of HDRHistogram. The
 // merge assumes both histograms are created with the same arguments and
 // that their representations are sorted by bucket.
+// Caution: this function may mutate from.Counts.
 func mergeHistogram(to, from *aggregationpb.HDRHistogram) {
 	if len(from.Buckets) == 0 {
 		return
@@ -395,27 +397,49 @@ func mergeHistogram(to, from *aggregationpb.HDRHistogram) {
 		return
 	}

-	requiredLen := len(to.Buckets) + len(from.Buckets)
-	for toIdx, fromIdx := 0, 0; toIdx < len(to.Buckets) && fromIdx < len(from.Buckets); {
-		v := to.Buckets[toIdx] - from.Buckets[fromIdx]
-		switch {
-		case v == 0:
-			// For every bucket that is common, we need one less bucket in the final slice.
-			requiredLen--
-			toIdx++
-			fromIdx++
-		case v < 0:
-			toIdx++
-		case v > 0:
-			fromIdx++
+	var extra int
+	toLen, fromLen := len(to.Buckets), len(from.Buckets)
+	if fromLen < toLen { // Heuristic to trade between O(m lg n) and O(n + m).
carsonip (Member, Author) commented:

[For reviewer] We can change this to `if fromLen < toLen / 2` or any other heuristic, but I assume this is good enough to rule out the edge case where it is 100% impossible to merge without additional slots.

carsonip (Member, Author) commented:

Also, `if fromLen < toLen / 2` isn't faster in benchmarks.
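For illustration, a micro-benchmark along these lines could compare such thresholds; BenchmarkMergeHistogram, makeHistogram, and the sizes are hypothetical, not from this PR:

// Hypothetical benchmark sketch (in a _test.go file, importing "fmt" and
// "testing"); makeHistogram and the sizes are assumed, not part of this PR.
// It measures mergeHistogram as from shrinks relative to to, which is the
// regime the heuristic targets.
func BenchmarkMergeHistogram(b *testing.B) {
	for _, frac := range []int{1, 2, 8, 64} { // fromLen = toLen / frac
		b.Run(fmt.Sprintf("fromLen=toLen/%d", frac), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				b.StopTimer()
				to := makeHistogram(4096)          // assumed test helper
				from := makeHistogram(4096 / frac) // assumed test helper
				b.StartTimer()
				mergeHistogram(to, from)
			}
		})
	}
}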

+		// Fast path to optimize for cases where len(from.Buckets) << len(to.Buckets):
+		// binary-search for each of from.Buckets in to.Buckets for fewer comparisons;
+		// mergeHistogram will be O(m lg n), where m = fromLen and n = toLen.
+		for fromIdx := 0; fromIdx < fromLen; fromIdx++ {
+			toIdx, found := sort.Find(toLen, func(toIdx int) int {
+				return int(from.Buckets[fromIdx] - to.Buckets[toIdx])
+			})
+			if found {
+				to.Counts[toIdx] += from.Counts[fromIdx]
+				from.Counts[fromIdx] = 0
+			} else {
+				extra++
+			}
+		}
Contributor commented:

Suggested change:

-	for fromIdx := 0; fromIdx < fromLen; fromIdx++ {
-		toIdx, found := sort.Find(toLen, func(toIdx int) int {
-			return int(from.Buckets[fromIdx] - to.Buckets[toIdx])
-		})
-		if found {
-			to.Counts[toIdx] += from.Counts[fromIdx]
-			from.Counts[fromIdx] = 0
-		} else {
-			extra++
-		}
-	}
+	findIn := to
+	for fromIdx := 0; fromIdx < fromLen; fromIdx++ {
+		target := from.Buckets[fromIdx]
+		toIdx, found := sort.Find(len(findIn), func(toIdx int) int {
+			return int(target - findIn.Buckets[toIdx])
+		})
+		if toIdx == len(findIn) {
+			// every remaining bucket in `findIn` is less than target, so no
+			// later (larger) target can match anything in `findIn` either
+			break
+		}
+		if found {
+			findIn.Counts[toIdx] += from.Counts[fromIdx]
+			from.Counts[fromIdx] = 0
+		} else {
+			extra++
+		}
+		findIn = findIn[toIdx:] // the next target will definitely be greater than the current one
+	}

We can optimize the search a bit more by considering that from is also sorted. Added sample code above, but please double-check, as I wrote it in a hurry to suggest the idea.

carsonip (Member, Author) commented:

oh this is clever

carsonip (Member, Author) commented on Aug 2, 2023:

Actually, I have an idea that is applicable to more cases. Instead of searching in 0..toLen each time, we utilize the toIdx from the previous pass and search in 0..toIdx+1. This will require the `for fromIdx := 0; fromIdx < fromLen; fromIdx++` loop to be reversed.
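A sketch of that idea, assuming the surrounding variables from the fast path above; this is an illustration, not necessarily the code the PR ended up with:

// Sketch (assumption, not the PR's final code): iterate from.Buckets from
// the largest bucket down and reuse the previous match position to shrink
// the binary-search window on each pass.
searchLen := toLen
for fromIdx := fromLen - 1; fromIdx >= 0; fromIdx-- {
	toIdx, found := sort.Find(searchLen, func(i int) int {
		return int(from.Buckets[fromIdx] - to.Buckets[i])
	})
	if found {
		to.Counts[toIdx] += from.Counts[fromIdx]
		from.Counts[fromIdx] = 0
	} else {
		extra++
	}
	// The next target is strictly smaller, so it can only land at or
	// before toIdx; search 0..toIdx+1 on the next pass.
	searchLen = toIdx + 1
}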

carsonip (Member, Author) commented:

Moreover, in your code we must ensure extra > 0 when we break, and we'll also have to determine extra correctly; otherwise the extra == 0 early return would skip unmerged buckets, and the later grow would under-allocate. These short circuits are dangerous 😢

			if toIdx == len(findIn) {
				break
			}

carsonip (Member, Author) commented:

Updated the PR to limit the search space using the result of the previous iteration.

Contributor commented:

Looking at this again, I think it might be better to just optimise the merge for exactly the same buckets in to and from; if any bucket differs, we fall back to the previous logic. I think it will be simpler and give us the same benefits with slightly better performance. WDYT?
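For illustration, a minimal sketch of that suggestion (a hypothetical helper, not code from this PR; the equality check runs before any mutation, so falling back stays safe):

// Hypothetical helper sketching the reviewer's suggestion: merge in one
// pass when the bucket keys are identical, otherwise report false so the
// caller can fall back to the general merge.
func mergeHistogramSameBuckets(to, from *aggregationpb.HDRHistogram) bool {
	if len(to.Buckets) != len(from.Buckets) {
		return false
	}
	for i := range to.Buckets {
		if to.Buckets[i] != from.Buckets[i] {
			return false // buckets differ; use the general merge instead
		}
	}
	for i := range from.Counts {
		to.Counts[i] += from.Counts[i]
	}
	return true
}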

carsonip (Member, Author) commented on Aug 2, 2023:

> if any bucket differs, we fall back to the previous logic. I think it will be simpler and give us the same benefits with slightly better performance.

Yes, we can fall back to the previous logic, but we'll still have to pick between O(n + m) and O(m lg n) for counting the extra buckets, so that we can grow the slice in one go. Since in a lot of cases m << n, having O(m lg n) just for the counting may still be faster than O(n + m); e.g. with n = 4096 and m = 16, that is roughly 16 × 12 = 192 comparisons versus 4112 steps. Imo the performance isn't that clear cut. Am I missing anything?

carsonip (Member, Author) commented on Aug 2, 2023:

Ah right, since the merge is O(n + m), the runtime of the whole function, when we do the binary search without falling back immediately, becomes max(O(m lg n), O(n + m)). Yes, in theory, falling back early gives slightly better performance. I'll see how I can incorporate this.

+		if extra == 0 {
+			return
+		}
+	} else {
+		// Slow path with runtime O(n + m).
+		requiredLen := toLen + fromLen
+		for toIdx, fromIdx := 0, 0; toIdx < toLen && fromIdx < fromLen; {
+			v := to.Buckets[toIdx] - from.Buckets[fromIdx]
+			switch {
+			case v == 0:
+				// For every bucket that is common, we need one less bucket in the final slice.
+				requiredLen--
+				toIdx++
+				fromIdx++
+			case v < 0:
+				toIdx++
+			case v > 0:
+				fromIdx++
+			}
+		}
+		extra = requiredLen - toLen
+	}

-	toIdx, fromIdx := len(to.Buckets)-1, len(from.Buckets)-1
-	to.Buckets = slices.Grow(to.Buckets, requiredLen-len(to.Buckets))
-	to.Counts = slices.Grow(to.Counts, requiredLen-len(to.Counts))
-	to.Buckets = to.Buckets[:requiredLen]
-	to.Counts = to.Counts[:requiredLen]
+	toIdx, fromIdx := toLen-1, fromLen-1
+	to.Buckets = slices.Grow(to.Buckets, extra)[:toLen+extra]
+	to.Counts = slices.Grow(to.Counts, extra)[:toLen+extra]
 	for idx := len(to.Buckets) - 1; idx >= 0; idx-- {
 		if fromIdx < 0 {
 			break
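The captured diff is truncated here. For context, a backward in-place merge of this shape typically finishes along the following lines; this is a sketch under that assumption, not the PR's actual code:

// Sketch (assumption, not the PR's actual code): walk both bucket lists
// from the end, writing the larger bucket into the grown tail of to.
// toIdx scans the original to entries, fromIdx the remaining from entries,
// and idx is the write position.
for idx := len(to.Buckets) - 1; idx >= 0; idx-- {
	if fromIdx < 0 {
		break // remaining to entries are already in place
	}
	if toIdx >= 0 && to.Buckets[toIdx] >= from.Buckets[fromIdx] {
		if to.Buckets[toIdx] == from.Buckets[fromIdx] {
			// Equal buckets collapse into one slot. In the fast path the
			// matched from.Counts were already added and zeroed, so this
			// addition is a no-op there and only matters for the slow path.
			to.Counts[toIdx] += from.Counts[fromIdx]
			fromIdx--
		}
		to.Buckets[idx] = to.Buckets[toIdx]
		to.Counts[idx] = to.Counts[toIdx]
		toIdx--
	} else {
		to.Buckets[idx] = from.Buckets[fromIdx]
		to.Counts[idx] = from.Counts[fromIdx]
		fromIdx--
	}
}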