aggregators: refactor, introduce Writer #36

Closed
wants to merge 2 commits

Conversation

@axw axw (Member) commented Jul 24, 2023

Introduce a Writer type, which encapsulates a pebble.Batch and has methods for writing metrics to the batch. Splitting this out of the Aggregator type enables clients to create independent Writers which are bound to an execution context, to naturally avoid lock contention.

AggregateBatch is now known as WriteEventMetrics, and AggregateCombinedMetrics is now known as WriteCombinedMetrics. We remove tracing from WriteCombinedMetrics, to bring it more closely into line with WriteEventMetrics; the cost of tracing at this level is too great.

Replace Aggregator.Stop with Aggregator.Close, which should always be called, even if periodic harvesting is never started.

Replace Aggregator.Run with Aggregator.StartHarvesting, which starts the harvest loop in the background with no context; it will be stopped by Aggregator.Close.

Move cachedEventsMap to its own file.
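
For illustration, here is a minimal usage sketch of the API shape described above. The import path is taken from the benchmark output below and NewWriter's signature from the diff later in this conversation; StartHarvesting is assumed here to take no arguments, and the parameters of the Write* methods are elided because they are not shown in this description.

package example

import "github.com/elastic/apm-aggregation/aggregators"

// useAggregator is a hypothetical helper showing the intended lifecycle:
// start harvesting, create a per-context Writer, write metrics, close.
func useAggregator(agg *aggregators.Aggregator) error {
    // Start the background harvest loop; it is stopped by agg.Close.
    agg.StartHarvesting()
    // Close must always be called, even if harvesting was never started.
    defer agg.Close()

    // Each execution context creates its own Writer, which wraps an
    // independent pebble.Batch, naturally avoiding lock contention.
    w, err := agg.NewWriter()
    if err != nil {
        return err
    }
    defer w.Close()

    // Metrics are written through the Writer and accumulate in its batch:
    //   w.WriteEventMetrics(...)    (formerly AggregateBatch)
    //   w.WriteCombinedMetrics(...) (formerly AggregateCombinedMetrics)
    return nil
}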

goos: linux
goarch: amd64
pkg: github.com/elastic/apm-aggregation/aggregators
cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics     
                            │ /tmp/main.txt │             /tmp/pr.txt             │
                            │    sec/op     │    sec/op     vs base               │
AggregateCombinedMetrics       8.631µ ±  4%   7.353µ ±  3%  -14.81% (p=0.002 n=6)
AggregateCombinedMetrics-8     7.252µ ±  8%   8.148µ ±  1%  +12.36% (p=0.002 n=6)
AggregateCombinedMetrics-16    7.312µ ±  9%   8.415µ ±  8%  +15.08% (p=0.004 n=6)
AggregateBatchSerial           22.09µ ±  9%   19.72µ ±  4%  -10.72% (p=0.002 n=6)
AggregateBatchSerial-8         19.45µ ±  7%   16.10µ ±  9%  -17.21% (p=0.002 n=6)
AggregateBatchSerial-16        19.00µ ±  9%   17.04µ ± 14%  -10.33% (p=0.002 n=6)
AggregateBatchParallel         21.59µ ±  3%   22.09µ ± 10%        ~ (p=0.065 n=6)
AggregateBatchParallel-8       23.40µ ± 12%   17.14µ ±  5%  -26.78% (p=0.002 n=6)
AggregateBatchParallel-16      24.74µ ± 18%   17.42µ ±  7%  -29.58% (p=0.002 n=6)
geomean                        15.33µ         13.79µ        -10.06%

                            │ /tmp/main.txt │             /tmp/pr.txt              │
                            │     B/op      │     B/op       vs base               │
AggregateCombinedMetrics       4.772Ki ± 6%   4.449Ki ±  2%   -6.76% (p=0.004 n=6)
AggregateCombinedMetrics-8     4.732Ki ± 9%   4.458Ki ±  1%        ~ (p=0.368 n=6)
AggregateCombinedMetrics-16    4.585Ki ± 7%   4.461Ki ±  2%        ~ (p=0.589 n=6)
AggregateBatchSerial          11.455Ki ± 3%   9.687Ki ± 10%  -15.44% (p=0.002 n=6)
AggregateBatchSerial-8        11.534Ki ± 2%   8.792Ki ±  9%  -23.77% (p=0.002 n=6)
AggregateBatchSerial-16       11.544Ki ± 1%   8.771Ki ± 12%  -24.03% (p=0.002 n=6)
AggregateBatchParallel         11.53Ki ± 2%   11.07Ki ±  4%   -3.96% (p=0.002 n=6)
AggregateBatchParallel-8       11.51Ki ± 1%   10.90Ki ±  8%   -5.32% (p=0.009 n=6)
AggregateBatchParallel-16      11.54Ki ± 2%   10.58Ki ±  5%   -8.31% (p=0.002 n=6)
geomean                        8.541Ki        7.598Ki        -11.04%

                            │ /tmp/main.txt │            /tmp/pr.txt            │
                            │   allocs/op   │ allocs/op   vs base               │
AggregateCombinedMetrics         38.00 ± 5%   29.00 ± 3%  -23.68% (p=0.002 n=6)
AggregateCombinedMetrics-8       38.00 ± 8%   29.00 ± 3%  -23.68% (p=0.002 n=6)
AggregateCombinedMetrics-16      37.00 ± 5%   29.00 ± 3%  -21.62% (p=0.002 n=6)
AggregateBatchSerial             79.00 ± 4%   68.50 ± 8%  -13.29% (p=0.002 n=6)
AggregateBatchSerial-8           79.00 ± 3%   63.00 ± 8%  -20.25% (p=0.002 n=6)
AggregateBatchSerial-16          79.00 ± 1%   63.00 ± 8%  -20.25% (p=0.002 n=6)
AggregateBatchParallel           79.00 ± 3%   76.00 ± 3%   -3.80% (p=0.009 n=6)
AggregateBatchParallel-8         79.00 ± 0%   75.00 ± 5%   -5.06% (p=0.002 n=6)
AggregateBatchParallel-16        79.00 ± 0%   75.00 ± 0%   -5.06% (p=0.002 n=6)
geomean                          61.72        52.11       -15.56%

@axw axw marked this pull request as ready for review July 24, 2023 06:50
@axw axw requested a review from a team as a code owner July 24, 2023 06:50
@lahsivjar lahsivjar (Contributor) left a comment

I gave the code a brief look. Do you think it would be a good idea to move the writers to an internal package? It would make the interaction between aggregator and writer a bit cleaner, I think.

}
batch := w.batch
w.batch = nil
w.aggregator.writers.Delete(w)
Contributor:

I think we can move managing the writers to the aggregator code? Also, maybe it would be beneficial to move the writer to an internal package to make the interface (public and private methods) a bit cleaner?

Member Author:

I think we can move managing the writers to the aggregator code?

Do you mean moving the Writer.Close method into the aggregator?

Also, maybe it would be beneficial to move the writer to an internal package to make the interface (public and private methods) a bit cleaner?

There's not a lot of code in writer.go, so I don't personally see value in making an internal package. I'm not sure I follow though.

Contributor:

There's not a lot of code in writer.go, so I don't personally see value in making an internal package. I'm not sure I follow though.

Sorry, I should have explained in more detail. The interaction between writer and aggregator seems a bit cluttered, since they use private and public methods as required. I was suggesting moving the writer into its own internal package so that the interaction is cleaner (only exposed public methods are called between writer and aggregator).

Do you mean moving the Writer.Close method into the aggregator?

I was referring to removing access to the writers map from writer.go. Maybe we can return the batch from Close and delete it from the map in the aggregator code.

Member Author:

Sorry, I should have explained in more detail. The interaction between writer and aggregator seems a bit cluttered, since they use private and public methods as required. I was suggesting moving the writer into its own internal package so that the interaction is cleaner (only exposed public methods are called between writer and aggregator).

I had a look, but it's not straightforward since the writer needs to refer to the CombinedMetricsKey type.

I was referring to removing access to the writers map from writer.go. Maybe we can return the batch from Close and delete it from the map in the aggregator code.

That won't allow a direct call to Writer.Close to remove the writer from the Aggregator. A couple of options:

  • We could avoid removing from the map eagerly and defer that to harvest or Aggregator.Close. I don't love this since it means resources hang around unnecessarily.
  • We could inject a callback into Writer, to be called on Close (see the sketch below).
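
A rough sketch of that second option, with hypothetical names (the onClose field is illustrative, not part of this PR). The point is that Writer.Close never touches the Aggregator's writers map directly, while a direct call to Writer.Close still lets the Aggregator clean up:

package example

import (
    "sync"

    "github.com/cockroachdb/pebble"
)

// Writer is a simplified stand-in for the real type; onClose is the
// hypothetical callback injected by Aggregator.NewWriter.
type Writer struct {
    mu      sync.Mutex
    batch   *pebble.Batch
    onClose func(*Writer)
}

func (w *Writer) Close() error {
    w.mu.Lock()
    defer w.mu.Unlock()
    if w.batch == nil {
        return nil // already closed
    }
    batch := w.batch
    w.batch = nil
    if w.onClose != nil {
        // The Aggregator removes this writer from its own tracking;
        // the Writer never reaches into the Aggregator's internals.
        w.onClose(w)
    }
    return batch.Close()
}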

Contributor:

We could inject a callback into Writer, to be called on Close

This sounds good, but it might be problematic to pass a callback if we want to close the writer independently of the aggregator.


mu sync.Mutex
mu sync.RWMutex
writers sync.Map
Contributor:

It seems that writers is always accessed while Aggregator#mu is locked. Maybe we don't need the sync.Map? I think we would also be fine with a slice.

Member Author:

That mutex isn't necessarily held when closing a writer.

Member:

a slice requires O(N) writer removal though

@carsonip carsonip (Member) commented Jul 24, 2023

maybe just a map[*Writer]struct{}, given it is protected by the aggregator mutex?
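
For what it's worth, a hypothetical sketch of that alternative: a plain map guarded by the aggregator's mutex rather than the sync.Map in this PR. Removal stays O(1), but it only works if every access, including a writer closing itself, goes through the aggregator and takes a.mu, which is the point under discussion. Field and method names here are illustrative.

package example

import "sync"

// Writer is a stub standing in for the real type.
type Writer struct{}

// Aggregator is a simplified stand-in showing only writer tracking.
type Aggregator struct {
    mu      sync.RWMutex
    writers map[*Writer]struct{}
}

func newAggregator() *Aggregator {
    return &Aggregator{writers: make(map[*Writer]struct{})}
}

func (a *Aggregator) addWriter(w *Writer) {
    a.mu.Lock()
    defer a.mu.Unlock()
    a.writers[w] = struct{}{}
}

func (a *Aggregator) removeWriter(w *Writer) {
    a.mu.Lock()
    defer a.mu.Unlock()
    delete(a.writers, w) // O(1), unlike removing an element from a slice
}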

Member:

(Oh, I'm late to the party, ignore my comments. Thanks for @axw's explanation. I see that Writer.Close only holds w.mu.)

Contributor:

I see that Writer.Close only holds w.mu.

Yes, but the caller of Writer.Close, i.e. Aggregator.Close, takes a lock. I was suggesting keeping access to writers in the aggregator code only, and removing the need for a sync-based structure.

@carsonip carsonip (Member) commented Jul 24, 2023

We remove them all at one time; we can just iterate over the slice from the end and mark each of them nil. No need for O(N), I think.

I meant the case of one writer closing itself, not Aggregator.Close.

Member:

Yes, but the caller of Writer.Close, i.e. Aggregator.Close, takes a lock.

I believe user code will call Writer.Close as well, not only Aggregator.Close.

@axw axw (Member Author) commented Jul 24, 2023

I believe user code will call Writer.Close as well, not only Aggregator.Close.

Right. At least in theory, you could have multiple independent writers, and each can be closed independently. For example, see BenchmarkAggregateBatch.

Contributor:

I believe user code will call Writer.Close as well, not only Aggregator.Close.

Ah, good point, didn't realize this.

@axw axw (Member Author) commented Jul 24, 2023

@lahsivjar could you please elaborate on that? I'm not picturing how that would look.

// is closed.
//
// NewWriter will return an error if the aggregator has already been stopped.
func (a *Aggregator) NewWriter() (*Writer, error) {
a.mu.Lock()
Member:

RLock should be sufficient but it leads to worse lock contention and worse performance in other places.

Member Author:

That's very surprising, given that NewWriter should not be in any hot code paths...

Member:

Under BenchmarkAggregateBatchParallel with -cpu=512 it is. But I guess that's too extreme, creating that many new writers in a second.

@axw axw closed this Aug 4, 2023