perf: Add mergeHistogram fast path #66
Conversation
aggregators/merger.go (outdated)

```go
	fromIdx++
	var extra int
	toLen, fromLen := len(to.Buckets), len(from.Buckets)
	if fromLen < toLen { // Heuristics to trade between O(m lg n) and O(n + m).
```
[For reviewer] We can change this to `if fromLen < toLen / 2` or any other heuristic, but I assume this is good enough to rule out the edge case where it is 100% impossible to merge without additional slots.
Also, `if fromLen < toLen / 2` isn't faster in benchmarks.
aggregators/merger.go (outdated)

```go
	for fromIdx := 0; fromIdx < fromLen; fromIdx++ {
		toIdx, found := sort.Find(toLen, func(toIdx int) int {
			return int(from.Buckets[fromIdx] - to.Buckets[toIdx])
		})
		if found {
			to.Counts[toIdx] += from.Counts[fromIdx]
			from.Counts[fromIdx] = 0
		} else {
			extra++
		}
	}
```
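For context, `sort.Find` (added in Go 1.19) returns the smallest index at which the comparison function is non-positive, plus whether an exact match was found. Below is a self-contained sketch of this counting pass, with plain sorted slices standing in for the histogram's `Buckets`/`Counts` fields (the names are mine, not the PR's):

```go
package main

import (
	"fmt"
	"sort"
)

// mergeCounts adds matching bucket counts from `from` into `to` and
// returns how many `from` buckets are missing from `to` (the extra
// slots a caller would need to allocate). Bucket slices must be sorted.
func mergeCounts(toB, toC, fromB, fromC []int64) (extra int) {
	for i := range fromB {
		j, found := sort.Find(len(toB), func(k int) int {
			return int(fromB[i] - toB[k])
		})
		if found {
			toC[j] += fromC[i] // merge the matching bucket in place
			fromC[i] = 0
		} else {
			extra++ // `to` has no bucket with this key
		}
	}
	return extra
}

func main() {
	toC := []int64{10, 20, 30, 40}
	fromC := []int64{1, 1, 1}
	extra := mergeCounts([]int64{1, 3, 5, 9}, toC, []int64{3, 4, 9}, fromC)
	fmt.Println(toC, extra) // [10 21 30 41] 1
}
```

Each `from` bucket costs one O(lg n) search, giving the O(m lg n) total the heuristic trades against.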
Suggested change (replacing the loop above; `offset` tracks where the narrowed window starts within `to`):

```go
findIn := to.Buckets
offset := 0 // index of findIn[0] within to.Buckets
for fromIdx := 0; fromIdx < fromLen; fromIdx++ {
	target := from.Buckets[fromIdx]
	toIdx, found := sort.Find(len(findIn), func(toIdx int) int {
		return int(target - findIn[toIdx])
	})
	if toIdx == len(findIn) {
		// No remaining bucket in `findIn` is >= target, and since
		// `from` is sorted, no later target can match either.
		break
	}
	if found {
		to.Counts[offset+toIdx] += from.Counts[fromIdx]
		from.Counts[fromIdx] = 0
	} else {
		extra++
	}
	findIn = findIn[toIdx:] // the next target is strictly greater than this one
	offset += toIdx
}
```
We can optimize the search a bit more by taking advantage of the fact that `from` is also sorted. I added sample code above, but please double check, as I wrote it in a hurry to suggest the idea.
oh this is clever
Actually, I have an idea that is applicable to more cases. Instead of searching in `0..toLen` each time, we use the `toIdx` from the previous pass and search in `0..toIdx+1`. This will require reversing the loop `for fromIdx := 0; fromIdx < fromLen; fromIdx++`.
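If I read this right, the idea is: walk `from` from its largest bucket down, and shrink the binary-search upper bound to `toIdx+1` after each pass, since every later (smaller) target must land at or before the previous hit. A hedged sketch with plain slices (names and layout are mine, not the PR's):

```go
package main

import (
	"fmt"
	"sort"
)

// mergeShrinking walks `from` in reverse so that each binary search can
// reuse the previous result as an upper bound. Bucket slices must be
// sorted ascending. Returns the number of unmatched `from` buckets.
func mergeShrinking(toB, toC, fromB, fromC []int64) (extra int) {
	hi := len(toB) // current search window is toB[0:hi]
	for i := len(fromB) - 1; i >= 0; i-- {
		target := fromB[i]
		j, found := sort.Find(hi, func(k int) int {
			return int(target - toB[k])
		})
		if found {
			toC[j] += fromC[i]
			fromC[i] = 0
		} else {
			extra++
		}
		hi = j + 1 // the next, smaller target can only land at index <= j
	}
	return extra
}

func main() {
	toC := []int64{10, 20, 30, 40}
	fromC := []int64{1, 1, 1}
	extra := mergeShrinking([]int64{1, 3, 5, 9}, toC, []int64{3, 4, 9}, fromC)
	fmt.Println(toC, extra) // [10 21 30 41] 1
}
```

Unlike the `findIn` suggestion, this never breaks out of the loop, so every unmatched bucket is still counted in `extra`.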
Moreover, in your code we must ensure `extra > 0` when we break, and we'll also have to correctly determine `extra`. These short circuits are dangerous 😢

```go
if toIdx == len(findIn) {
	break
}
```
Updated the PR to limit the search space with the help of the result of the previous iteration.
Looking at this again, I think it might be better to just optimise the merger for exactly the same buckets in `to` and `from`; if any bucket differs, we fall back to the previous logic. I think it will be simpler and give us the same benefits with slightly better performance. WDYT?
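A sketch of that fallback shape as I understand it (hypothetical helper, not the PR's code; assumes `Buckets`/`Counts` are parallel sorted slices): compare the bucket keys first, and only merge in place when they are identical, reporting false otherwise so the caller runs the general merge.

```go
package main

import "fmt"

// tryMergeSameBuckets adds from's counts into to's counts when both
// histograms have exactly the same bucket keys. It reports whether the
// merge happened; on false, nothing was modified and the caller should
// fall back to the general merge path.
func tryMergeSameBuckets(toB, toC, fromB, fromC []int64) bool {
	if len(toB) != len(fromB) {
		return false
	}
	for i := range toB {
		if toB[i] != fromB[i] {
			return false // any differing bucket key: use the slow path
		}
	}
	for i := range toC {
		toC[i] += fromC[i]
	}
	return true
}

func main() {
	toC := []int64{5, 5}
	fmt.Println(tryMergeSameBuckets([]int64{1, 2}, toC, []int64{1, 2}, []int64{1, 3}), toC) // true [6 8]
	fmt.Println(tryMergeSameBuckets([]int64{1, 2}, toC, []int64{1, 4}, []int64{1, 3}))     // false
}
```

The key-equality scan is O(min(n, m)) and bails on the first mismatch, so the fast path costs almost nothing when it doesn't apply.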
> if any of the bucket differs we fallback to previous logic. I think it will be simpler and give us the same benefits with slightly better performance.

Yes, we can fall back to the previous logic, but we'll still have to pick between O(n+m) and O(m lg n) for counting the extra buckets, so we can grow the slice in one go. Since in a lot of cases m << n, having O(m lg n) just for counting may still be faster than O(n+m). IMO the performance isn't that clear cut. Am I missing anything?
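For reference, the point of counting the extra buckets up front is that the destination can then be grown with a single allocation before the linear merge. A sketch of that grow-then-merge shape (hypothetical names; returns new slices rather than growing in place):

```go
package main

import "fmt"

// mergeWithGrow merges two sorted (bucket, count) histograms, sizing the
// output with a pre-computed `extra` (number of from-buckets missing
// from to) so each output slice is allocated exactly once.
func mergeWithGrow(toB, toC, fromB, fromC []int64, extra int) ([]int64, []int64) {
	outB := make([]int64, 0, len(toB)+extra)
	outC := make([]int64, 0, len(toB)+extra)
	i, j := 0, 0
	for i < len(toB) && j < len(fromB) {
		switch {
		case toB[i] == fromB[j]: // same bucket: sum the counts
			outB = append(outB, toB[i])
			outC = append(outC, toC[i]+fromC[j])
			i++
			j++
		case toB[i] < fromB[j]: // bucket only in to
			outB = append(outB, toB[i])
			outC = append(outC, toC[i])
			i++
		default: // bucket only in from
			outB = append(outB, fromB[j])
			outC = append(outC, fromC[j])
			j++
		}
	}
	// At most one of these tails is non-empty.
	outB = append(outB, toB[i:]...)
	outC = append(outC, toC[i:]...)
	outB = append(outB, fromB[j:]...)
	outC = append(outC, fromC[j:]...)
	return outB, outC
}

func main() {
	b, c := mergeWithGrow([]int64{1, 3}, []int64{1, 1}, []int64{2, 3}, []int64{5, 5}, 1)
	fmt.Println(b, c) // [1 2 3] [1 5 6]
}
```

The two-pointer walk is the O(n+m) part; whether the O(m lg n) counting pass before it pays off is exactly the trade-off being debated here.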
Ah right, since the merge is O(n+m), the runtime of the whole function where we do the binary search without falling back immediately becomes max(O(m lg n), O(n+m)). Yes, in theory, falling back early gives slightly better performance. I'll see how I can incorporate this.
Closing this in favor of #69
Add a fast path for mergeHistogram, since it uses the most time in CPU profiles. This optimizes for cases where len(from.Buckets) << len(to.Buckets), so that we get O(m lg n) as opposed to the regular O(n + m), where m = len(from.Buckets) and n = len(to.Buckets). This idea is from #49 but was not implemented in the merged #52.

benchstat: