Use proto models directly for AggregateBatch #28

Merged: 2 commits merged into elastic:main from remove-proto-translation on Jul 20, 2023

Conversation

@lahsivjar (Contributor) commented Jul 20, 2023

Removes the translation layer for calls to AggregateBatch.

Previously, for every received APMEvent we would create an intermediary model, which would eventually be translated to the proto model. The proto model would then be marshaled to its binary representation and indexed in pebble.

After this PR, we create the proto model directly for every received APMEvent.
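The change can be sketched roughly as below. The type and field names here (apmEvent, protoServiceMetrics, intermediateMetrics) are hypothetical stand-ins for illustration only; the real APMEvent and protobuf-generated models live in apm-data and this repository.

```go
package main

import "fmt"

// apmEvent is a hypothetical stand-in for the real APMEvent.
type apmEvent struct {
	ServiceName string
	Count       float64
}

// protoServiceMetrics mimics a protobuf-generated model.
type protoServiceMetrics struct {
	ServiceName string
	EventsTotal float64
}

// intermediateMetrics represents the translation layer removed by this PR.
type intermediateMetrics struct {
	name  string
	count float64
}

// Before: APMEvent -> intermediate model -> proto model (two copies).
func eventToIntermediate(e apmEvent) intermediateMetrics {
	return intermediateMetrics{name: e.ServiceName, count: e.Count}
}

func intermediateToProto(m intermediateMetrics) protoServiceMetrics {
	// Second pass: every field is copied again, costing extra allocations.
	return protoServiceMetrics{ServiceName: m.name, EventsTotal: m.count}
}

// After: APMEvent -> proto model directly, skipping the translation layer.
func eventToProto(e apmEvent) protoServiceMetrics {
	return protoServiceMetrics{ServiceName: e.ServiceName, EventsTotal: e.Count}
}

func main() {
	e := apmEvent{ServiceName: "opbeans", Count: 3}
	// Both paths produce the same proto model; the direct path skips one copy.
	fmt.Println(intermediateToProto(eventToIntermediate(e)) == eventToProto(e)) // true
}
```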

Benchmarking results:

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    2.79µs ± 2%    3.60µs ± 4%  +29.06%  (p=0.000 n=10+10)
AggregateBatchSerial-10        12.7µs ± 3%     9.1µs ± 2%  -28.38%  (p=0.000 n=10+9)
AggregateBatchParallel-10      14.2µs ± 3%     9.3µs ± 2%  -34.37%  (p=0.000 n=10+9)
CombinedMetricsEncoding-10     3.89µs ± 1%    3.71µs ± 1%   -4.69%  (p=0.000 n=10+10)
CombinedMetricsDecoding-10     4.59µs ± 2%    4.60µs ± 1%     ~     (p=0.739 n=10+10)
CombinedMetricsToBatch-10       361µs ± 1%     372µs ± 1%   +3.17%  (p=0.000 n=10+10)
EventToCombinedMetrics-10      1.85µs ± 5%    0.78µs ± 2%  -57.98%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    3.88kB ± 3%    5.11kB ± 3%  +31.76%  (p=0.000 n=10+9)
AggregateBatchSerial-10        24.6kB ± 0%    11.8kB ± 0%  -51.98%  (p=0.000 n=8+10)
AggregateBatchParallel-10      24.6kB ± 0%    11.8kB ± 1%  -52.03%  (p=0.000 n=9+10)
CombinedMetricsEncoding-10      0.00B          0.00B          ~     (all equal)
CombinedMetricsDecoding-10     10.3kB ± 0%    10.3kB ± 0%     ~     (all equal)
CombinedMetricsToBatch-10      38.6kB ± 0%    38.6kB ± 0%     ~     (all equal)
EventToCombinedMetrics-10      4.29kB ± 0%    0.08kB ± 0%  -98.13%  (p=0.000 n=10+10)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      29.4 ± 2%      39.5 ± 4%  +34.35%  (p=0.000 n=10+10)
AggregateBatchSerial-10           112 ± 0%        79 ± 0%  -29.46%  (p=0.000 n=10+10)
AggregateBatchParallel-10         112 ± 0%        79 ± 0%  -29.46%  (p=0.000 n=9+10)
CombinedMetricsEncoding-10       0.00           0.00          ~     (all equal)
CombinedMetricsDecoding-10       40.0 ± 0%      40.0 ± 0%     ~     (all equal)
CombinedMetricsToBatch-10         308 ± 0%       308 ± 0%     ~     (all equal)
EventToCombinedMetrics-10        17.0 ± 0%       6.0 ± 0%  -64.71%  (p=0.000 n=10+10)

The changes in AggregateCombinedMetrics are a result of changes in how we create the input. Using the same input-creation logic on main, the differences are negligible:

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    3.90µs ± 4%    3.60µs ± 4%  -7.76%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    4.93kB ± 3%    5.11kB ± 3%  +3.72%  (p=0.000 n=10+9)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      38.0 ± 3%      39.5 ± 4%  +3.95%  (p=0.001 n=10+10)

@lahsivjar lahsivjar requested a review from a team as a code owner July 20, 2023 06:55
@lahsivjar lahsivjar requested a review from axw July 20, 2023 07:04

setCombinedMetrics := func(k CombinedMetricsKey, sim ServiceInstanceMetrics) {
cm, ok := kvs[k]
// m collects service instance metrics for each partition
Member

Have you looked at writing every metric as a separate merge operation, and deferring all merging to pebble? I don't like that we have merging in two places. I think this should be possible by first calculating how many metrics we may write, to figure out how to distribute the event count, and then assigning it to each record as we go. But I don't know about the performance difference.

Contributor Author

Yup, indexing one by one gives the following benchmark diff against the current implementation (new refers to writing every metric as a separate merge operation):

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    3.60µs ± 4%    3.56µs ± 2%     ~     (p=0.099 n=10+9)
AggregateBatchSerial-10        9.13µs ± 2%   12.48µs ± 2%  +36.68%  (p=0.000 n=9+10)
AggregateBatchParallel-10      9.30µs ± 2%   12.71µs ± 3%  +36.71%  (p=0.000 n=9+10)
CombinedMetricsEncoding-10     3.71µs ± 1%    3.67µs ± 1%   -1.01%  (p=0.000 n=10+10)
CombinedMetricsDecoding-10     4.60µs ± 1%    4.65µs ± 1%   +1.27%  (p=0.007 n=10+10)
CombinedMetricsToBatch-10       372µs ± 1%     364µs ± 1%   -2.32%  (p=0.000 n=10+9)
EventToCombinedMetrics-10       775ns ± 2%     641ns ± 1%  -17.34%  (p=0.000 n=10+9)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    5.11kB ± 3%    5.04kB ± 2%   -1.36%  (p=0.007 n=9+9)
AggregateBatchSerial-10        11.8kB ± 0%    18.7kB ± 0%  +58.00%  (p=0.000 n=10+10)
AggregateBatchParallel-10      11.8kB ± 1%    18.7kB ± 0%  +58.42%  (p=0.000 n=10+10)
CombinedMetricsEncoding-10      0.00B          0.00B          ~     (all equal)
CombinedMetricsDecoding-10     10.3kB ± 0%    10.3kB ± 0%     ~     (p=0.137 n=8+10)
CombinedMetricsToBatch-10      38.6kB ± 0%    38.6kB ± 0%     ~     (all equal)
EventToCombinedMetrics-10       80.0B ± 0%     64.0B ± 0%  -20.00%  (p=0.000 n=10+10)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      39.5 ± 4%      39.4 ± 4%     ~     (p=0.561 n=10+10)
AggregateBatchSerial-10          79.0 ± 0%     100.0 ± 0%  +26.58%  (p=0.000 n=10+10)
AggregateBatchParallel-10        79.0 ± 0%     100.0 ± 0%  +26.58%  (p=0.000 n=10+10)
CombinedMetricsEncoding-10       0.00           0.00          ~     (all equal)
CombinedMetricsDecoding-10       40.0 ± 0%      40.0 ± 0%     ~     (all equal)
CombinedMetricsToBatch-10         308 ± 0%       308 ± 0%     ~     (all equal)
EventToCombinedMetrics-10        6.00 ± 0%      4.00 ± 0%  -33.33%  (p=0.000 n=10+10)

The difference comes from the increase in merge operations, which leads to more allocations.
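The two strategies being compared can be sketched as below. This is a simplified model, not the repository's code: mockStore stands in for pebble, and the names writeCoalesced/writePerMetric are hypothetical. The point is only that coalescing by key in the aggregator reduces the number of merge operations the store sees, each of which carries per-call overhead.

```go
package main

import "fmt"

// mockStore stands in for pebble: each Merge call carries per-call overhead
// (encoding, batch bookkeeping), which is where the extra allocations come from.
type mockStore struct{ mergeOps int }

func (s *mockStore) Merge(key string, value float64) { s.mergeOps++ }

type metric struct {
	key   string
	value float64
}

// writeCoalesced merges metrics by key in the aggregator first (the current
// implementation), so the store sees one merge operation per distinct key.
func writeCoalesced(s *mockStore, metrics []metric) {
	byKey := make(map[string]float64)
	for _, m := range metrics {
		byKey[m.key] += m.value
	}
	for k, v := range byKey {
		s.Merge(k, v)
	}
}

// writePerMetric defers all merging to the store (the reviewer's suggestion):
// one merge operation per metric, regardless of key duplication.
func writePerMetric(s *mockStore, metrics []metric) {
	for _, m := range metrics {
		s.Merge(m.key, m.value)
	}
}

func main() {
	metrics := []metric{{"svc-a", 1}, {"svc-a", 2}, {"svc-b", 1}}
	a, b := &mockStore{}, &mockStore{}
	writeCoalesced(a, metrics)
	writePerMetric(b, metrics)
	fmt.Println(a.mergeOps, b.mergeOps) // 2 3
}
```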

Member

👌 maybe we can come back to it when you have removed translation from merging

@lahsivjar lahsivjar merged commit 4b64bad into elastic:main Jul 20, 2023
2 checks passed
@lahsivjar lahsivjar deleted the remove-proto-translation branch July 20, 2023 09:59