Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vtprotobuf: Unmarshal uses pooled objects #37

Closed
wants to merge 2 commits into from

Conversation

carsonip
Copy link
Member

@carsonip carsonip commented Jul 25, 2023

Since our use case does not really fit into using a pooled backing array containing pointers to allocated
objects given the recent partitioning, we continue to rely on pooling individual Keyed* structures. This PR adds support for using the Keyed* pools during Unmarshal, which should lead to lower allocs during merging. The benefit of this diminishes as concurrency increases, understandably as there will be higher lock contention for sync pool.

benchstat (rebased & updated on 1 aug):

goos: linux
goarch: amd64
pkg: github.com/elastic/apm-aggregation/aggregators
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                                       │ bench-out/vt-unmarshal-pooling-before │ bench-out/vt-unmarshal-pooling-after │
                                       │                sec/op                 │    sec/op      vs base               │
AggregateCombinedMetrics-16                                       2.029µ ±  5%     1.965µ ± 7%        ~ (p=0.310 n=6)
AggregateBatchSerial-16                                           5.606µ ±  4%     4.932µ ± 3%  -12.02% (p=0.002 n=6)
AggregateBatchParallel-16                                         5.840µ ± 10%     5.344µ ± 6%   -8.49% (p=0.002 n=6)
CombinedMetricsEncoding-16                                        1.418µ ±  3%     1.450µ ± 2%        ~ (p=0.223 n=6)
CombinedMetricsToBatch-16                                         62.09µ ±  4%     62.58µ ± 4%        ~ (p=0.589 n=6)
EventToCombinedMetrics-16                                         412.2n ±  4%     416.9n ± 4%        ~ (p=0.485 n=6)
NDJSONSerial/go-2.0.0.ndjson-16                                   32.79m ±  7%     30.93m ± 3%   -5.68% (p=0.002 n=6)
NDJSONSerial/nodejs-3.29.0.ndjson-16                              18.82m ±  4%     17.89m ± 3%   -4.97% (p=0.002 n=6)
NDJSONSerial/python-6.7.2.ndjson-16                               54.16m ±  2%     51.49m ± 2%   -4.93% (p=0.002 n=6)
NDJSONSerial/ruby-4.5.0.ndjson-16                                 28.19m ±  1%     27.25m ± 1%   -3.35% (p=0.002 n=6)
NDJSONParallel/go-2.0.0.ndjson-16                                 31.79m ±  2%     30.83m ± 2%   -3.01% (p=0.002 n=6)
NDJSONParallel/nodejs-3.29.0.ndjson-16                            18.76m ±  2%     18.08m ± 2%   -3.62% (p=0.002 n=6)
NDJSONParallel/python-6.7.2.ndjson-16                             54.49m ±  2%     51.59m ± 3%   -5.33% (p=0.002 n=6)
NDJSONParallel/ruby-4.5.0.ndjson-16                               28.54m ±  1%     27.05m ± 2%   -5.21% (p=0.002 n=6)
geomean                                                           643.9µ           617.9µ        -4.04%

                                       │ bench-out/vt-unmarshal-pooling-before │  bench-out/vt-unmarshal-pooling-after  │
                                       │                 B/op                  │     B/op       vs base                 │
AggregateCombinedMetrics-16                                    1.411Ki ±  1%     1.101Ki ±  1%  -21.98% (p=0.002 n=6)
AggregateBatchSerial-16                                        2.466Ki ±  5%     1.363Ki ±  1%  -44.72% (p=0.002 n=6)
AggregateBatchParallel-16                                      2.543Ki ±  2%     1.374Ki ±  2%  -46.00% (p=0.002 n=6)
CombinedMetricsEncoding-16                                       837.5 ± 39%       864.5 ± 10%        ~ (p=0.589 n=6)
CombinedMetricsToBatch-16                                      3.982Ki ±  2%     3.984Ki ±  0%        ~ (p=0.442 n=6)
EventToCombinedMetrics-16                                        0.000 ±  0%       0.000 ±  0%        ~ (p=1.000 n=6) ¹
NDJSONSerial/go-2.0.0.ndjson-16                               11.524Mi ±  1%     8.977Mi ±  1%  -22.10% (p=0.002 n=6)
NDJSONSerial/nodejs-3.29.0.ndjson-16                           6.967Mi ±  0%     5.453Mi ±  0%  -21.74% (p=0.002 n=6)
NDJSONSerial/python-6.7.2.ndjson-16                            19.54Mi ±  0%     14.99Mi ±  1%  -23.30% (p=0.002 n=6)
NDJSONSerial/ruby-4.5.0.ndjson-16                             10.346Mi ±  0%     8.034Mi ±  0%  -22.35% (p=0.002 n=6)
NDJSONParallel/go-2.0.0.ndjson-16                             11.527Mi ±  0%     8.984Mi ±  1%  -22.06% (p=0.002 n=6)
NDJSONParallel/nodejs-3.29.0.ndjson-16                         6.969Mi ±  0%     5.456Mi ±  0%  -21.71% (p=0.002 n=6)
NDJSONParallel/python-6.7.2.ndjson-16                          19.51Mi ±  0%     15.00Mi ±  0%  -23.11% (p=0.002 n=6)
NDJSONParallel/ruby-4.5.0.ndjson-16                           10.351Mi ±  0%     8.027Mi ±  0%  -22.45% (p=0.002 n=6)
geomean                                                                      ²                  -21.84%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │ bench-out/vt-unmarshal-pooling-before │ bench-out/vt-unmarshal-pooling-after │
                                       │               allocs/op               │  allocs/op   vs base                 │
AggregateCombinedMetrics-16                                       21.00 ± 0%      17.00 ± 0%  -19.05% (p=0.002 n=6)
AggregateBatchSerial-16                                           26.00 ± 4%      13.00 ± 0%  -50.00% (p=0.002 n=6)
AggregateBatchParallel-16                                         27.00 ± 4%      13.00 ± 8%  -51.85% (p=0.002 n=6)
CombinedMetricsEncoding-16                                        0.000 ± 0%      0.000 ± 0%        ~ (p=1.000 n=6) ¹
CombinedMetricsToBatch-16                                         95.00 ± 0%      95.00 ± 0%        ~ (p=1.000 n=6) ¹
EventToCombinedMetrics-16                                         0.000 ± 0%      0.000 ± 0%        ~ (p=1.000 n=6) ¹
NDJSONSerial/go-2.0.0.ndjson-16                                  139.4k ± 3%     109.1k ± 1%  -21.78% (p=0.002 n=6)
NDJSONSerial/nodejs-3.29.0.ndjson-16                             78.06k ± 0%     60.12k ± 0%  -22.98% (p=0.002 n=6)
NDJSONSerial/python-6.7.2.ndjson-16                              214.5k ± 2%     157.4k ± 2%  -26.60% (p=0.002 n=6)
NDJSONSerial/ruby-4.5.0.ndjson-16                               122.32k ± 0%     94.77k ± 0%  -22.52% (p=0.002 n=6)
NDJSONParallel/go-2.0.0.ndjson-16                                139.4k ± 1%     109.0k ± 2%  -21.85% (p=0.002 n=6)
NDJSONParallel/nodejs-3.29.0.ndjson-16                           78.07k ± 0%     60.12k ± 0%  -22.99% (p=0.002 n=6)
NDJSONParallel/python-6.7.2.ndjson-16                            214.1k ± 2%     157.1k ± 2%  -26.59% (p=0.002 n=6)
NDJSONParallel/ruby-4.5.0.ndjson-16                             122.31k ± 0%     94.69k ± 0%  -22.58% (p=0.002 n=6)
geomean                                                                      ²                -23.66%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Closes #20

@carsonip carsonip requested a review from a team as a code owner July 25, 2023 09:48
@dmathieu
Copy link
Member

Diff between the two commits in the fork: https://github.com/carsonip/vtprotobuf/compare/22cdafcdd35e..2cf3150fde0e

@dmathieu
Copy link
Member

Won't this change be hard to get accepted in vtprotobuf upstream? Doesn't this mean we'll end up with a fork indefinitely?

@carsonip
Copy link
Member Author

I'm also a little hesitant about this as well, and yes that means we'll end up with a fork.

But I believe upstream will go down the route of pooling the backing arrays which contain pointers to objects in the heap. e.g. for a var s []*KeyedTransactionMetrics, when it is returned to the pool, all the KeyedTransactionMetrics are reset, and the slice is only resliced, i.e. s = s[:0]. Upon reuse, it is resliced, i.e. s = s[:len(s)+1], and then s[0] will contain an empty KeyedTransactionMetrics for reuse without alloc.

Currently for our usage, the upstream implementation may not be very efficient for us, since we try to obtain individual elements Keyed* from VTPool while the pool is never populated and it will result in allocs.

I'm happy to put this back into draft and wait for the code to stabilize a bit before we pick a route. It is micro-optimization anyway.

@carsonip carsonip marked this pull request as draft July 25, 2023 10:35
@carsonip
Copy link
Member Author

carsonip commented Aug 2, 2023

Closing in favor of #63

@carsonip carsonip closed this Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vtproto UnmarshalVT is allocating since pointer is nil
2 participants