Use proto models directly for AggregateBatch #28
Conversation
Force-pushed from 8e09451 to 6c6bfd7
setCombinedMetrics := func(k CombinedMetricsKey, sim ServiceInstanceMetrics) {
    cm, ok := kvs[k]
    // m collects service instance metrics for each partition
Have you looked at writing every metric as a separate merge operation, and deferring all merging to pebble? I don't like that we have merging in two places. I think this should be possible by first calculating how many metrics we may write, to figure out how to distribute the event count, and then assigning it to each record as we go. But I don't know about the performance difference.
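The suggestion above involves first counting how many metric records will be written for an event, then splitting the event count across them so each record can be written as its own merge operation. A minimal sketch of that distribution step (the `distributeCount` helper and its names are assumptions for illustration, not code from this PR):

```go
package main

import "fmt"

// distributeCount splits a total event count across n records so that
// the parts sum exactly to total, spreading any remainder over the
// first few records.
func distributeCount(total, n int) []int {
	parts := make([]int, n)
	base, rem := total/n, total%n
	for i := range parts {
		parts[i] = base
		if i < rem {
			parts[i]++
		}
	}
	return parts
}

func main() {
	// 10 events spread over 3 records.
	fmt.Println(distributeCount(10, 3)) // [4 3 3]
}
```

With the count assigned per record up front, each record could then be handed to pebble as an independent merge operation, leaving all actual merging to the store's merge operator.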
Yup, indexing one by one gives the following benchmark diff against the current implementation (new refers to writing every metric as a separate merge operation):
name old time/op new time/op delta
AggregateCombinedMetrics-10 3.60µs ± 4% 3.56µs ± 2% ~ (p=0.099 n=10+9)
AggregateBatchSerial-10 9.13µs ± 2% 12.48µs ± 2% +36.68% (p=0.000 n=9+10)
AggregateBatchParallel-10 9.30µs ± 2% 12.71µs ± 3% +36.71% (p=0.000 n=9+10)
CombinedMetricsEncoding-10 3.71µs ± 1% 3.67µs ± 1% -1.01% (p=0.000 n=10+10)
CombinedMetricsDecoding-10 4.60µs ± 1% 4.65µs ± 1% +1.27% (p=0.007 n=10+10)
CombinedMetricsToBatch-10 372µs ± 1% 364µs ± 1% -2.32% (p=0.000 n=10+9)
EventToCombinedMetrics-10 775ns ± 2% 641ns ± 1% -17.34% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
AggregateCombinedMetrics-10 5.11kB ± 3% 5.04kB ± 2% -1.36% (p=0.007 n=9+9)
AggregateBatchSerial-10 11.8kB ± 0% 18.7kB ± 0% +58.00% (p=0.000 n=10+10)
AggregateBatchParallel-10 11.8kB ± 1% 18.7kB ± 0% +58.42% (p=0.000 n=10+10)
CombinedMetricsEncoding-10 0.00B 0.00B ~ (all equal)
CombinedMetricsDecoding-10 10.3kB ± 0% 10.3kB ± 0% ~ (p=0.137 n=8+10)
CombinedMetricsToBatch-10 38.6kB ± 0% 38.6kB ± 0% ~ (all equal)
EventToCombinedMetrics-10 80.0B ± 0% 64.0B ± 0% -20.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
AggregateCombinedMetrics-10 39.5 ± 4% 39.4 ± 4% ~ (p=0.561 n=10+10)
AggregateBatchSerial-10 79.0 ± 0% 100.0 ± 0% +26.58% (p=0.000 n=10+10)
AggregateBatchParallel-10 79.0 ± 0% 100.0 ± 0% +26.58% (p=0.000 n=10+10)
CombinedMetricsEncoding-10 0.00 0.00 ~ (all equal)
CombinedMetricsDecoding-10 40.0 ± 0% 40.0 ± 0% ~ (all equal)
CombinedMetricsToBatch-10 308 ± 0% 308 ± 0% ~ (all equal)
EventToCombinedMetrics-10 6.00 ± 0% 4.00 ± 0% -33.33% (p=0.000 n=10+10)
The difference comes from the increase in the number of merge operations, which leads to more allocations.
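The benchmark regression above can be seen in miniature by counting merge operations under the two strategies: pre-merging per key in memory issues one store operation per key, while deferring all merging to the store issues one per metric. The toy `kv` type below stands in for pebble and is purely illustrative (its names are assumptions, not the PR's code):

```go
package main

import "fmt"

// kv is a toy key-value store that counts merge operations,
// standing in for pebble's merge-operator path.
type kv struct {
	merges int
	data   map[string]int
}

func (s *kv) merge(key string, v int) {
	s.merges++
	s.data[key] += v
}

func main() {
	metrics := map[string][]int{"svc-a": {1, 2, 3}, "svc-b": {4, 5}}

	// Strategy 1: pre-merge in memory, then one merge op per key.
	pre := &kv{data: map[string]int{}}
	for k, vs := range metrics {
		sum := 0
		for _, v := range vs {
			sum += v
		}
		pre.merge(k, sum)
	}

	// Strategy 2: one merge op per metric, deferring merging to the store.
	per := &kv{data: map[string]int{}}
	for k, vs := range metrics {
		for _, v := range vs {
			per.merge(k, v)
		}
	}

	fmt.Println(pre.merges, per.merges) // 2 5
}
```

Each extra merge operation in the real store carries its own allocation and bookkeeping cost, which matches the higher alloc/op numbers for the per-metric variant in the benchmark.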
👌 Maybe we can come back to it once you have removed translation from merging.
Removes the translation layer for calls to AggregateBatch.

Before, for every received APMEvent we would create an intermediary model which would eventually get translated to the proto model. The proto model would then get marshaled to its binary representation and indexed in pebble. After this PR, we directly create the proto model for every received APMEvent.

Benchmarking results:

The changes in AggregateCombinedMetrics are a result of changes in how we create the input. Using the same logic on main, the differences are negligible: