feat: record metrics from rules and export to remote #3861
base: main
Conversation
Force-pushed from be2d95a to c90f289
Good work, Alberto! 🚀
I'd like to discuss the queries we aim to answer. Have you analyzed how the exported metrics will be used? Just some example use cases would help.
pkg/experiment/block/compaction.go
Outdated
func pyroscopeInstanceHash(shard uint32, id uuid.UUID) string {
	buf := make([]byte, 0, 40)
	buf = append(buf, byte(shard>>24), byte(shard>>16), byte(shard>>8), byte(shard))
	buf = append(buf, id.String()...)
	return fmt.Sprintf("%x", xxhash.Sum64(buf))
}
I'm not sure why we're using a UUID generated by the compaction worker.
First of all, it is not helpful and will cause data duplication. Jobs might be retried multiple times: each attempt may result in exported samples that have their own __pyroscope_instance__ label, which prevents deduplication in the metrics backend. Second, it will result in cardinality issues: there might be dozens or hundreds of compaction workers, each of which can handle any block (i.e., we get Rules x Shards x Workers series, where each rule may produce multiple series, based on the aggregation dimensions).
Note that compaction job source blocks always belong to the same shard but may be produced by a set of segment writers. This is a typical situation: the shard ownership/affinity changes due to a topology change (node added or removed), the primary owner is not available, or the placement rules for the dataset change.
It's possible that we have two segments with identical timestamps (given the millisecond precision of ULIDs). Whether we want to handle the issue in the very first version is probably the most important question, if we decide to use segment timestamps. I'd say no, we don't have to. And if we were to, we would need to ensure that the data sent from different segment origins is not mixed. The segment origin is determined by a combination of the Shard and CreatedBy metadata attributes and the timestamp of segment creation. We assume that within Shard/CreatedBy a timestamp collision is not possible (strictly speaking, this is not guaranteed). Shard/CreatedBy cardinality is bounded and is typically a 1:1 mapping. However, the worst-case scenario is N*M – therefore we may want to get rid of it (e.g., by aggregating data in the backend with recording rules).
I see the following ways to solve/mitigate it:
- Add Shard/CreatedBy as a series label (a hash of it). We could probably be fine with just CreatedBy, but we need to make sure a timestamp collision is not possible in the segment writer: imagine a series is moved from one shard to another, hosted by the same segment-writer, and the timestamps of the segments that include this "transition" match. Such samples would be deduplicated in the time series (prometheus-like) backend.
- Add an explicit metadata attribute to include the timestamp in nanosecond precision, sufficient for our needs in practice. The timestamp is the real local time of the segment-writer that produced the block.
- Handle this in the compaction planner: we could probably somehow "guess" the timestamp, provided that we have all the information needed there.
It may be tempting to implement option 2. However, before we go further, I'd like to see an analysis of the access patterns – basically, what queries we expect: for example, which aggregation functions should be supported. Do we want to support functions without the associative property (e.g., mean/average)? A mean of per-job means is not the mean of the combined data, so such functions cannot be computed from independently exported partial aggregates.
Fixed to use CreatedBy instead of worker id.
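A minimal sketch of what that fix could look like, assuming CreatedBy is exposed here as a uuid.UUID (the actual metadata type in the PR may differ):

func pyroscopeInstanceHash(shard uint32, createdBy uuid.UUID) string {
	// 4 bytes of shard plus the 36-byte string form of the creator id.
	buf := make([]byte, 0, 40)
	buf = append(buf, byte(shard>>24), byte(shard>>16), byte(shard>>8), byte(shard))
	buf = append(buf, createdBy.String()...)
	return fmt.Sprintf("%x", xxhash.Sum64(buf))
}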
This may be addressed at https://github.com/grafana/pyroscope-squad/issues/336
Force-pushed from 7d1e59b to 1700a83
w := NewBlockWriter(dst, b.path, tmpdir)
defer func() {
	err = multierror.New(err, w.Close()).Err()
}()
// Datasets are compacted in a strict order.
for _, s := range b.datasets {
	s.registerSampleObserver(observer)
At this point I think it's better to register the observer in the dataset instead of passing it through this long call chain: compact > mergeAndClose > merge > writeRow.
Force-pushed from e1221a0 to ff45b50
Force-pushed from ff45b50 to 56bc260
pkg/experiment/metrics/exporter.go
Outdated
	panic(err)
}

c, err := remote.NewWriteClient("exporter", &remote.ClientConfig{
Feedback from the Mimir team: we should set a custom user agent to identify this as the "pyroscope-metrics-exporter". See https://raintank-corp.slack.com/archives/C03NCLB4GG7/p1738233634244319
done
I am not too sure where – can you point me to the piece of code?
I re-read it. I thought we had to change the NewWriteClient name field. I'll do it with the user agent instead. Thanks for pointing it out.
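A minimal sketch of the resulting client construction, assuming a Prometheus version whose remote.ClientConfig exposes a Headers field (URL and timeout values are illustrative):

import (
	"net/url"
	"time"

	"github.com/prometheus/common/config"
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/storage/remote"
)

func newWriteClient(endpoint string) (remote.WriteClient, error) {
	wURL, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	return remote.NewWriteClient("pyroscope-metrics-exporter", &remote.ClientConfig{
		URL:     &config.URL{URL: wURL},
		Timeout: model.Duration(30 * time.Second),
		Headers: map[string]string{"User-Agent": "pyroscope-metrics-exporter"},
	})
}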
Force-pushed from 2bb97ad to d8db16c
func (o *MetricsExporterSampleObserver) Flush() error {
	go func() {
		NewExporter(o.tenant, o.recorder.Recordings).Send() // TODO log error
Before merging, this should log errors rather than dropping them.
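A minimal sketch of the logging, assuming a go-kit logger is available on the observer (the o.logger field is hypothetical here):

import "github.com/go-kit/log/level"

func (o *MetricsExporterSampleObserver) Flush() error {
	go func() {
		if err := NewExporter(o.tenant, o.recorder.Recordings).Send(); err != nil {
			level.Error(o.logger).Log("msg", "failed to export recorded metrics", "err", err)
		}
	}()
	return nil
}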
	RetryOnRateLimit: false,
})
if err != nil {
	panic(err)
Should return the error instead.
// TODO
return Config{
	url:      "omitted",
	username: "omitted",
	password: "omitted",
}
While this is hardcoded, how can it be used?
For now, you could read from environment variables until this is figured out.
I will inject it via config/env vars in a follow-up PR.
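A minimal sketch of such a stopgap, reading the remote-write settings from environment variables (the variable names are illustrative):

import "os"

func configFromEnv() Config {
	return Config{
		url:      os.Getenv("PYROSCOPE_METRICS_EXPORTER_URL"),
		username: os.Getenv("PYROSCOPE_METRICS_EXPORTER_USERNAME"),
		password: os.Getenv("PYROSCOPE_METRICS_EXPORTER_PASSWORD"),
	}
}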
func newClient(cfg Config) remote.WriteClient {
	wURL, err := url.Parse(cfg.url)
	if err != nil {
		panic(err)
Should return the error instead.
	return nil
}
if e.client == nil {
	e.client = newClient(e.config)
I would create the client at NewExporter time as well and then handle problems as they arise
Good catch. This came from the old code before the Observer pattern, where the exporter was created at the beginning, and I was trying to save resources by delaying client creation. I'll create the client in NewExporter once we ensure e.data is not empty.
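A sketch combining the suggestions in this thread – newClient returns an error instead of panicking (which also addresses the panics flagged above), and the client is created eagerly in NewExporter once the data is known to be non-empty. Type and field names are assumed from the surrounding discussion:

func NewExporter(tenant string, recordings []*Recording, cfg Config) (*Exporter, error) {
	if len(recordings) == 0 {
		return nil, nil // nothing to export, no client needed
	}
	client, err := newClient(cfg)
	if err != nil {
		return nil, err
	}
	return &Exporter{tenant: tenant, data: recordings, client: client}, nil
}

func newClient(cfg Config) (remote.WriteClient, error) {
	wURL, err := url.Parse(cfg.url)
	if err != nil {
		return nil, fmt.Errorf("invalid remote write URL %q: %w", cfg.url, err)
	}
	return remote.NewWriteClient("pyroscope-metrics-exporter", &remote.ClientConfig{
		URL:     &config.URL{URL: wURL},
		Timeout: model.Duration(30 * time.Second),
	})
}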
}

func recordingRulesFromTenant(tenant string) []*RecordingRule {
	// TODO
Even though you want to hardcode and test it, I would prefer those coming from e.g. an environment variable or a file (parsed as YAML) so we can change them more quickly than by recompiling a new version.
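A minimal sketch of loading the rules from a YAML file (the schema and field names are illustrative, not the PR's actual types):

import (
	"os"

	"gopkg.in/yaml.v3"
)

type RecordingRule struct {
	Name        string            `yaml:"name"`
	ProfileType string            `yaml:"profile_type"`
	Filter      map[string]string `yaml:"filter"`
	Dimensions  []string          `yaml:"dimensions"`
}

func recordingRulesFromFile(path string) ([]*RecordingRule, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var rules []*RecordingRule
	if err := yaml.Unmarshal(raw, &rules); err != nil {
		return nil, err
	}
	return rules, nil
}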
In this PR, we introduce a first version of the metrics recorder and metrics exporter.
Every level 1 compaction job will record metrics from profiles in the form of time series. The recording will follow some recording rules given by config or an external service (for now, this is hardcoded to a single recording rule). The recorded metrics are exported to a remote after the compaction process.
Generated metrics are aggregations of total values across a set of dimensions, for a given profile type. The aggregation process is explained below:
- Each recording rule defines a profile type (T), a label filter (F), and a set of export dimensions (E).
- Profiles not matching the profile type T are discarded, and so are profiles not matching the filter F.
- The remaining profiles are grouped by the dimensions in E, and each group exports a single sample with time = blockTime and value equal to the sum of all totalValues that match (T, F, E).

Example:
Let's consider the following profiles present in some blocks being compacted:
Profile 1: {service_name="worker", job="batch_compress", region="eu"}
Profile 2: {service_name="worker", job="batch_compress", region="eu"}
Profile 3: {service_name="API", region="eu"}
Profile 4: {service_name="worker", job="batch_compress", region="ap"}
Profile 5: {service_name="worker", job="batch_compress", region="us"}
Profile 6: {service_name="worker", job="batch_compress", region="eu"}
And the following recording rule:
Name = "cpu_usage_compress_workers"
T = cpu samples
F = {service_name="worker", job="batch_compress"}
E = "region"
This will result in the following exported series and samples.
{__name__="cpu_usage_compress_workers", service_name="worker", job="batch_compress", region="eu"} = (t, 120)
{__name__="cpu_usage_compress_workers", service_name="worker", job="batch_compress", region="ap"} = (t, 30)
{__name__="cpu_usage_compress_workers", service_name="worker", job="batch_compress", region="us"} = (t, 40)
Note that Profile 1 was discarded by profile type, Profile 3 was discarded by filter, and Profiles 2 and 6 were aggregated. For all three exported samples, t = blockTime.
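For illustration, here is a toy sketch of the aggregation described above (the types are simplified for the example and do not mirror the PR's implementation):

type profile struct {
	typ        string
	labels     map[string]string
	totalValue int64
}

func matchesFilter(labels, filter map[string]string) bool {
	for k, v := range filter {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// aggregate groups profiles of type t matching filter f by the dimensions in e,
// summing their total values per group.
func aggregate(profiles []profile, t string, f map[string]string, e []string) map[string]int64 {
	series := make(map[string]int64) // key: rendered E-dimension values
	for _, p := range profiles {
		if p.typ != t {
			continue // discarded by profile type, like Profile 1
		}
		if !matchesFilter(p.labels, f) {
			continue // discarded by filter, like Profile 3
		}
		var key string
		for _, dim := range e {
			key += dim + "=" + p.labels[dim] + ";"
		}
		series[key] += p.totalValue // Profiles 2 and 6 land in the same group
	}
	return series
}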
Given the distributed architecture and concurrent nature of compactors, and the chosen timestamp for samples, time collisions may happen. For that reason, an extra __pyroscope_instance__ label has been added, so that two compaction jobs may write to Prometheus without causing overwrites. This instance id is computed from a worker id and a shard id.

Next steps:
Out of scope right now: