Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AC hit rate metrics with prometheus labels (alternative implementation with decorator) #358

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ulrfa
Copy link
Contributor

@ulrfa ulrfa commented Sep 26, 2020

This is a draft, not ready to be merged. I wait with test cases until
getting feedback about the main functionality.

Same functionality as in #350

  • Prometheus counter for cache hit ratio of only AC requests.

  • Support for prometheus labels based on custom HTTP and gRPC headers.

but implemented in an alternative way:

  • disk.io implements two interfaces: cache.CasAcCache, cache.Stats

  • cache.metricsdecorator is decorator for cache.CasAcCache and
    provides prometheus metrics.

Pros with this alternative implementation:

  • Should allow non-prometheus metrics as requested in
    Metrics interface(s) #355

  • Avoid the question about if the counters should be placed in
    disk.go or http.go/grpc*.go. If placing in disk.go, there are
    issues about how to avoid incrementing counter twice for the
    same request (both in Get and in GetValidatedActionResult)
    and at the same time count found AC but missing CAS, as cache miss.

  • Makes headers available also for logic in disk.go, which could be
    useful for other functionality in the future.

  • Metrics can be separated from logging, and still not require
    injecting counter increment invocations in tons of places.
    Incrementing from a few well defined places minimize the risk
    for bugs in the metrics.

This is a draft, not ready to be merged. I wait with test cases until
getting feedback about the main functionality.

Same functionality as in buchgr#350

 - Prometheus counter for cache hit ratio of only AC requests.

 - Support for prometheus labels based on custom HTTP and gRPC headers.

but implemented in an alternative way:

 - disk.io implements two interfaces: cache.CasAcCache, cache.Stats

 - cache.metricsdecorator is decorator for cache.CasAcCache and
   provides prometheus metrics.

Pros with this alternative implementation:

 - Should allow non-prometheus metrics as requested in
   buchgr#355

 - Avoid the question about if the counters should be placed in
   disk.go or http.go/grpc*.go. If placing in disk.go, there are
   issues about how to avoid incrementing counter twice for the
   same request (both in Get and in GetValidatedActionResult)
   and at the same time count found AC but missing CAS, as cache miss.

 - Makes headers available also for logic in disk.go, which could be
   useful for other functionality in the future.

 - Metrics can be separated from logging, and still not require
   injecting counter increment invocations in tons of places.
   Incrementing from a few well defined places minimize the risk
   for bugs in the metrics.
@ulrfa ulrfa mentioned this pull request Sep 26, 2020
Copy link
Collaborator

@mostynb mostynb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had time to fully digest this or the original version yet, but this feels a bit cleaner so far. I'll get back to you late this week with some more feedback.

@@ -649,6 +649,9 @@ func cacheFilePath(kind cache.EntryKind, cacheDir string, hash string) string {
// value from the CAS if it and all its dependencies are also available. If
// not, nil values are returned. If something unexpected went wrong, return
// an error.
// TODO Consider separating implementation of cache.AcStore interface, to open up
// possibilities combining that functionality with proxies, and also allow
// bazel-remote configurations with proxy but no local disk storage?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A proxy-only mode has been on my todo-list for a while. I have considered doing this as a short-circuit inside disk.Cache (it might be time to rename or split the package at that point).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mostynb Does it make sense to bring back the cache.Cache interface that the proxies and disk cache used to implement for something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mostynb Is it something more than implementation of GetValidatedActionResult, that you expect will be needed from disk.io, in a proxy-only-mode?

@@ -344,8 +348,7 @@ func main() {
}

validateAC := !c.DisableHTTPACValidation
h := server.NewHTTPCache(diskCache, accessLogger, errorLogger, validateAC, c.EnableACKeyInstanceMangling, gitCommit)

h := server.NewHTTPCache(casAcCache, diskCache, accessLogger, errorLogger, validateAC, c.EnableACKeyInstanceMangling, gitCommit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit strange to pass both casAcCache and diskCache here. Doesn't the former wrap the latter?

Copy link
Contributor Author

@ulrfa ulrfa Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewHTTPCache takes two interfaces: cache.BlobAcStore and cache.Stats.

The file disk.go implements both those interfaces.

But metricsdecorator is a decorator only for cache.BlobAcStore. The main reason is that I imagine using other metricsdecorator instances wrapping also the proxies, replacing the current counters in the proxies. But the proxies does not implement cache.Stats and therfore metricsdecorator also does not. (Using metricsdecorator instances also for proxies would involve setting counter name in call to NewMetricsDecorator, optional non-AC metrics support , etc)

The GRPC server is not using the cache.Stats interface.

// TODO Document interface
type CasAcCache interface {
// TODO change to io.ReadCloser?
Put(kind EntryKind, hash string, size int64, rdr io.Reader, reqCtx RequestContext) error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use a map[string]string instead of a RequestContext interface in these functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that first, but ended up with cache.RequestContext because:

  • gRPC and HTTP/2 is using canonical lowercase header names, but the golang HTTP/1.1 library normalize header names to start with capital letter. I want to hide that difference, but avoid translating headers that are never accessed, and instead translate on demand in:

    bazel-remote/server/http.go

    Lines 395 to 398 in 4b2b8b2

    func (h *reqCtxHttp) GetHeader(headerName string) (headerValues []string) {
    headerName = strings.Title(headerName)
    if headerValues, ok := h.request.Header[headerName]; ok {
    return headerValues

  • Minimize overhead for creating cache.RequestContext instance in general (only allocate struct and store a pointer)

  • Allows extending cache.RequestContext with additional methods that returns non-header meta information. Such as Host, RemoteAddr or Cookie from HTTP requests. Or any other information that could be added by bazel-remote’s HTTP/GRPC servers, like enum indicating if original request were HTTP or GRPC.

  • Allows extending cache.RequestContext to allow proxies to cancel outgoing requests, if associated incomming request is canceled. I have not looked into details, but seems to be something related to that in underlying GRPC and HTTP contexts.

  • Allows extending cache.RequestContext in general, instead of having to update all Get/Put/Contains invocations with additional parameters, all over bazel-remote codebase.

  • Allows propagating custom information in general between a chain of different cache.BlobStore implementations (such as proxies, decorators, …)


// TODO Document interface
type AcStore interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this interface used anywhere outside BlobAcStore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I’m only thinking that having a separate AcStore interface might be more generic and flexible. Especially if GetValidatedActionResult method would be extracted from disk.go in the future. Or if there would be decorators wrapping only AcStore. But I don’t have much experience of embedding interfaces and structs in golang. It might work with only BlobStore and BlobAcStore interface, if you think it is preferable to avoid AcStore?

@ulrfa
Copy link
Contributor Author

ulrfa commented Jan 25, 2021

Hi @mostynb. I would be happy to continue working on this patch, what do you think?

@bdittmer
Copy link
Contributor

@ulrfa Is it possible to break this into two PRs? Basically thinking:

  1. The AC/CAS abstractions and RequestContext
  2. The metrics additions?

I think it will be easier to digest and review. If not I understand and will carve out some time later to look it over.

@ulrfa
Copy link
Contributor Author

ulrfa commented Jan 28, 2021

Hi @bdittmer,

@ulrfa Is it possible to break this into two PRs? Basically thinking:

  1. The AC/CAS abstractions and RequestContext

That is all changes except the three files listed below.

  1. The metrics additions?

That is the changes in the three source files:

  • cache/metricsdecorator/metricsdecorator.go
  • config/config.go
  • main.go

Perhaps the list of source files above is enough for decision about if you would want me to go forward with this. If you decide yes, then I will add test cases and can also split into several PRs if you want. Is that OK?

I will probably not have time to work on this until after next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants