-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics interface(s) #355
Comments
Hi @bdittmer. @ulrfa has proposed some changes to the metrics in #350 - as part of that effort maybe we can move to a metrics interface that would minimise the maintenance cost of your fork? It would be good to have your input on that (both on the interface/config and also the metrics that are being collected). That might give you another option: a local patch to implement your metrics implementation, but with a cleaner integration point. |
This is a draft, not ready to be merged. I wait with test cases until getting feedback about the main functionality. Same functionality as in buchgr#350 - Prometheus counter for cache hit ratio of only AC requests. - Support for prometheus labels based on custom HTTP and gRPC headers. but implemented in an alternative way: - disk.io implements two interfaces: cache.CasAcCache, cache.Stats - cache.metricsdecorator is decorator for cache.CasAcCache and provides prometheus metrics. Pros with this alternative implementation: - Should allow non-prometheus metrics as requested in buchgr#355 - Avoid the question about if the counters should be placed in disk.go or http.go/grpc*.go. If placing in disk.go, there are issues about how to avoid incrementing counter twice for the same request (both in Get and in GetValidatedActionResult) and at the same time count found AC but missing CAS, as cache miss. - Makes headers available also for logic in disk.go, which could be useful for other functionality in the future. - Metrics can be separated from logging, and still not require injecting counter increment invocations in tons of places. Incrementing from a few well defined places minimize the risk for bugs in the metrics.
Hi @ulrfa! #358 is very interesting and I like the direction. Interfaces for the AC and CAS caches is very useful and makes layering functionality on top of But for metrics, I was thinking of something like this:
with concrete implementations in different packages (we shouldn't force users to pull in Prometheus dependencies if they don't use Prometheus). Thoughts? |
I’m afraid such interface would have to be more complex. Like handling labels, other metric types than counter and gauge, perhaps controlling when the counters are created, etc. What about |
Yeah, that was a some what contrived and simple example. To expand on it, I think each each client of a I admittedly haven't investigated the implications for the gRPC metrics (we aren't using it) but will look into that. Would a proof of concept PR with a reference prometheus implementation help? |
Personally I prefer using prometheus official golang client library directly. I don’t fully understand your idea, but got the feeling that it would add complexity and overhead. And bazel-remote is using third party libraries |
Hi @ulrfa, I just opened #362 so you can get a better idea of what I'm thinking. I don't think it adds complexity or overhead, and for those of us who aren't fortunate enough to have infrastructure for supporting prometheus metrics I think this provides a nice middle ground. Let me know what you think, thanks! |
#362 implements only a subset of the Prometheus functionality. I’m lacking for example the Histogram type and Labels for all types. Prometheus has many methods for configuring labels in different ways. Others might lack other functionality. I belive it would become complex when considering all that. I think I prefer having knowledge about counters as local as possible, e.g. in the file where they are incremented. Not also listing counters in a central file like: bazel-remote/metric/prometheus/prometheus.go Lines 19 to 29 in 94d39cb
|
Sorry, I have been busy with other work and haven't had an opportunity to respond to @ulrfa regarding his comments on my metrics abstraction proof of concept, #362. My main issue is unresolved in #358: we don't use Prometheus for metrics collection and we don't need the hard dependency on Prometheus. Users shouldn't be forced to patch the upstream or depend on Prometheus if they don't use it. I think parts of #358 are good and other parts are not. It adds a lot of complexity that can be implemented by users in their own fork by implementing a simple interface. #362 is the best of both worlds: if a user wants to collect all their metrics on the cache side (we do not and don't think others should) than they can create their own I'll comment on #358 tomorrow and respond to @ulrfa opinions tomorrow, thanks for keeping this on your radar @mostynb! |
From what I can see the only references to prometheus in #358 are in the metricsdecorator package (if we were to remove the preexisting metrics code), and that package is only referenced from main. Would that make it easy to skip this code and the prometheus dependency in your fork?
I don't have much insight into how bazel-remote is used in the wild, unfortunately. |
If the Prometheus metrics are removed from other places (e.g. https://github.com/buchgr/bazel-remote/pull/358/files#diff-2a4bb11e9a9fb8613e7bb7dd4637b2e750894407c702f8bbf2584775f0127444L23-L24) and isolated to a single package that would help and that's exactly what #362 is trying to accomplish. The difference is #362 leaves the current metrics in place and makes introducing new metrics collection systems (for instance, statsd) or more complicated metrics logic (like applying labels) trivial. A user can create a new package with their metrics implementation and either patch it into |
Is this still blocking patches for AC hit rate metrics? I'm personally not sure allowing drop-in replacements for various instrumentation systems is worth the complexity. Could downstream integrations be based off the prometheus endpoint instead (like a reverse statsd_exporter)? |
Regarding metrics collection, will you consider a PR that replaces the hard dependency on prometheus with a dependency on one or more metrics interfaces? It will keep prometheus as the default/example implementation. Background:
We have a fork of bazel-remote to wire in our internal services (app config, logging, health checks, etc.). We also try (aspire?) to keep our fork as close to the upstream as possible. This is somewhat manageable with a separate
main
package and the available interfaces.Internally we don't officially support prometheus. With the existing metrics implementation I think there are two options to avoid modifying upstream code:
I have a general idea what this looks like but want to gauge interest before committing to a PR. Thanks!
The text was updated successfully, but these errors were encountered: