Use proto models directly for AggregateBatch #28

Merged: 2 commits merged into elastic:main from remove-proto-translation on Jul 20, 2023

Conversation

@lahsivjar (Contributor) commented Jul 20, 2023

Removes the translation layer for calls to AggregateBatch.

Previously, for every received APMEvent we would create an intermediary model, which would eventually be translated to the proto model. The proto model would then be marshaled to its binary representation and indexed in pebble.

After this PR, we create the proto model directly for every received APMEvent.
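The change can be sketched roughly as below. The type and field names here (apmEvent, protoServiceMetrics, intermediateMetrics) are hypothetical stand-ins for illustration only; the real APMEvent and protobuf-generated models live in apm-data and this repository.

```go
package main

import "fmt"

// apmEvent is a hypothetical stand-in for the real APMEvent.
type apmEvent struct {
	ServiceName string
	Count       float64
}

// protoServiceMetrics mimics a protobuf-generated model.
type protoServiceMetrics struct {
	ServiceName string
	EventsTotal float64
}

// intermediateMetrics represents the translation layer removed by this PR.
type intermediateMetrics struct {
	name  string
	count float64
}

// Before: APMEvent -> intermediate model -> proto model (two copies).
func eventToIntermediate(e apmEvent) intermediateMetrics {
	return intermediateMetrics{name: e.ServiceName, count: e.Count}
}

func intermediateToProto(m intermediateMetrics) protoServiceMetrics {
	// Second pass: every field is copied again, costing extra allocations.
	return protoServiceMetrics{ServiceName: m.name, EventsTotal: m.count}
}

// After: APMEvent -> proto model directly, skipping the translation layer.
func eventToProto(e apmEvent) protoServiceMetrics {
	return protoServiceMetrics{ServiceName: e.ServiceName, EventsTotal: e.Count}
}

func main() {
	e := apmEvent{ServiceName: "opbeans", Count: 3}
	// Both paths produce the same proto model; the direct path skips one copy.
	fmt.Println(intermediateToProto(eventToIntermediate(e)) == eventToProto(e)) // true
}
```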

Benchmarking results:

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    2.79µs ± 2%    3.60µs ± 4%  +29.06%  (p=0.000 n=10+10)
AggregateBatchSerial-10        12.7µs ± 3%     9.1µs ± 2%  -28.38%  (p=0.000 n=10+9)
AggregateBatchParallel-10      14.2µs ± 3%     9.3µs ± 2%  -34.37%  (p=0.000 n=10+9)
CombinedMetricsEncoding-10     3.89µs ± 1%    3.71µs ± 1%   -4.69%  (p=0.000 n=10+10)
CombinedMetricsDecoding-10     4.59µs ± 2%    4.60µs ± 1%     ~     (p=0.739 n=10+10)
CombinedMetricsToBatch-10       361µs ± 1%     372µs ± 1%   +3.17%  (p=0.000 n=10+10)
EventToCombinedMetrics-10      1.85µs ± 5%    0.78µs ± 2%  -57.98%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    3.88kB ± 3%    5.11kB ± 3%  +31.76%  (p=0.000 n=10+9)
AggregateBatchSerial-10        24.6kB ± 0%    11.8kB ± 0%  -51.98%  (p=0.000 n=8+10)
AggregateBatchParallel-10      24.6kB ± 0%    11.8kB ± 1%  -52.03%  (p=0.000 n=9+10)
CombinedMetricsEncoding-10      0.00B          0.00B          ~     (all equal)
CombinedMetricsDecoding-10     10.3kB ± 0%    10.3kB ± 0%     ~     (all equal)
CombinedMetricsToBatch-10      38.6kB ± 0%    38.6kB ± 0%     ~     (all equal)
EventToCombinedMetrics-10      4.29kB ± 0%    0.08kB ± 0%  -98.13%  (p=0.000 n=10+10)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      29.4 ± 2%      39.5 ± 4%  +34.35%  (p=0.000 n=10+10)
AggregateBatchSerial-10           112 ± 0%        79 ± 0%  -29.46%  (p=0.000 n=10+10)
AggregateBatchParallel-10         112 ± 0%        79 ± 0%  -29.46%  (p=0.000 n=9+10)
CombinedMetricsEncoding-10       0.00           0.00          ~     (all equal)
CombinedMetricsDecoding-10       40.0 ± 0%      40.0 ± 0%     ~     (all equal)
CombinedMetricsToBatch-10         308 ± 0%       308 ± 0%     ~     (all equal)
EventToCombinedMetrics-10        17.0 ± 0%       6.0 ± 0%  -64.71%  (p=0.000 n=10+10)

The changes in AggregateCombinedMetrics are a result of changes in how we create the input. Using the same input-creation logic on main, the differences are negligible:

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    3.90µs ± 4%    3.60µs ± 4%  -7.76%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    4.93kB ± 3%    5.11kB ± 3%  +3.72%  (p=0.000 n=10+9)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      38.0 ± 3%      39.5 ± 4%  +3.95%  (p=0.001 n=10+10)

@lahsivjar lahsivjar requested a review from a team as a code owner July 20, 2023 06:55
@lahsivjar lahsivjar requested a review from axw July 20, 2023 07:04

setCombinedMetrics := func(k CombinedMetricsKey, sim ServiceInstanceMetrics) {
cm, ok := kvs[k]
// m collects service instance metrics for each partition
Member

Have you looked at writing every metric as a separate merge operation, and deferring all merging to pebble? I don't like that we have merging in two places. I think this should be possible by first calculating how many metrics we may write, to figure out how to distribute the event count, and then assigning it to each record as we go. But I don't know about the performance difference.

Contributor Author

Yup, indexing one by one gives the following benchmark diff against the current implementation (new refers to writing every metric as a separate merge operation):

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    3.60µs ± 4%    3.56µs ± 2%     ~     (p=0.099 n=10+9)
AggregateBatchSerial-10        9.13µs ± 2%   12.48µs ± 2%  +36.68%  (p=0.000 n=9+10)
AggregateBatchParallel-10      9.30µs ± 2%   12.71µs ± 3%  +36.71%  (p=0.000 n=9+10)
CombinedMetricsEncoding-10     3.71µs ± 1%    3.67µs ± 1%   -1.01%  (p=0.000 n=10+10)
CombinedMetricsDecoding-10     4.60µs ± 1%    4.65µs ± 1%   +1.27%  (p=0.007 n=10+10)
CombinedMetricsToBatch-10       372µs ± 1%     364µs ± 1%   -2.32%  (p=0.000 n=10+9)
EventToCombinedMetrics-10       775ns ± 2%     641ns ± 1%  -17.34%  (p=0.000 n=10+9)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    5.11kB ± 3%    5.04kB ± 2%   -1.36%  (p=0.007 n=9+9)
AggregateBatchSerial-10        11.8kB ± 0%    18.7kB ± 0%  +58.00%  (p=0.000 n=10+10)
AggregateBatchParallel-10      11.8kB ± 1%    18.7kB ± 0%  +58.42%  (p=0.000 n=10+10)
CombinedMetricsEncoding-10      0.00B          0.00B          ~     (all equal)
CombinedMetricsDecoding-10     10.3kB ± 0%    10.3kB ± 0%     ~     (p=0.137 n=8+10)
CombinedMetricsToBatch-10      38.6kB ± 0%    38.6kB ± 0%     ~     (all equal)
EventToCombinedMetrics-10       80.0B ± 0%     64.0B ± 0%  -20.00%  (p=0.000 n=10+10)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      39.5 ± 4%      39.4 ± 4%     ~     (p=0.561 n=10+10)
AggregateBatchSerial-10          79.0 ± 0%     100.0 ± 0%  +26.58%  (p=0.000 n=10+10)
AggregateBatchParallel-10        79.0 ± 0%     100.0 ± 0%  +26.58%  (p=0.000 n=10+10)
CombinedMetricsEncoding-10       0.00           0.00          ~     (all equal)
CombinedMetricsDecoding-10       40.0 ± 0%      40.0 ± 0%     ~     (all equal)
CombinedMetricsToBatch-10         308 ± 0%       308 ± 0%     ~     (all equal)
EventToCombinedMetrics-10        6.00 ± 0%      4.00 ± 0%  -33.33%  (p=0.000 n=10+10)

The difference comes from the increase in merge operations, which leads to more allocations.
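The two strategies being compared can be sketched as below. This is a simplified model, not the repository's code: mockStore stands in for pebble, and the names writeCoalesced/writePerMetric are hypothetical. The point is only that coalescing by key in the aggregator reduces the number of merge operations the store sees, each of which carries per-call overhead.

```go
package main

import "fmt"

// mockStore stands in for pebble: each Merge call carries per-call overhead
// (encoding, batch bookkeeping), which is where the extra allocations come from.
type mockStore struct{ mergeOps int }

func (s *mockStore) Merge(key string, value float64) { s.mergeOps++ }

type metric struct {
	key   string
	value float64
}

// writeCoalesced merges metrics by key in the aggregator first (the current
// implementation), so the store sees one merge operation per distinct key.
func writeCoalesced(s *mockStore, metrics []metric) {
	byKey := make(map[string]float64)
	for _, m := range metrics {
		byKey[m.key] += m.value
	}
	for k, v := range byKey {
		s.Merge(k, v)
	}
}

// writePerMetric defers all merging to the store (the reviewer's suggestion):
// one merge operation per metric, regardless of key duplication.
func writePerMetric(s *mockStore, metrics []metric) {
	for _, m := range metrics {
		s.Merge(m.key, m.value)
	}
}

func main() {
	metrics := []metric{{"svc-a", 1}, {"svc-a", 2}, {"svc-b", 1}}
	a, b := &mockStore{}, &mockStore{}
	writeCoalesced(a, metrics)
	writePerMetric(b, metrics)
	fmt.Println(a.mergeOps, b.mergeOps) // 2 3
}
```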

Member

👌 maybe we can come back to it when you have removed translation from merging

@lahsivjar lahsivjar merged commit 4b64bad into elastic:main Jul 20, 2023
2 checks passed
@lahsivjar lahsivjar deleted the remove-proto-translation branch July 20, 2023 09:59