feat: record metrics from rules and export to remote #3861

Open

alsoba13 wants to merge 6 commits into main from alsoba13/metrics-from-profiles-record-and-export
Conversation

alsoba13 (Contributor):

In this PR, we introduce a first version of the metrics recorder and metrics exporter.

Every level-1 compaction job will record metrics from profiles in the form of time series. The recording follows recording rules provided by config or by an external service (for now, this is hardcoded to a single recording rule). The recorded metrics are exported to a remote write endpoint after the compaction process.

Generated metrics are aggregations of the total values of profiles of a given profile type, grouped by a set of label dimensions. The aggregation process is explained below:

  • Given a recording rule with a profile type T, a filter F (a set of key-value pairs) and a set of labels E to export.
  • Every profile seen during compaction that matches T and F is considered for the aggregation.
  • To aggregate, profiles are grouped by E, resulting in multiple time series.
  • Every time series has a single sample with time = blockTime and a value equal to the sum of all totalValues that match (T, F, E).
  • Hence, since we add up all totalValues that fulfill the conditions, we are conceptually aggregating over time (we discard the original profile timestamps and use the block time), resulting in a single sample per series per compaction job.

Example:

Let's consider the following profiles present in some blocks being compacted:

profile | profile type       | labels                                                      | totalValue | stacktraces (ignored) | timestamp (ignored)
1       | memory alloc_space | {service_name="worker", job="batch_compress", region="eu"} | 10         | ...                   | ...
2       | cpu samples        | {service_name="worker", job="batch_compress", region="eu"} | 20         | ...                   | ...
3       | cpu samples        | {service_name="API", region="eu"}                           | 1          | ...                   | ...
4       | cpu samples        | {service_name="worker", job="batch_compress", region="ap"} | 30         | ...                   | ...
5       | cpu samples        | {service_name="worker", job="batch_compress", region="us"} | 40         | ...                   | ...
6       | cpu samples        | {service_name="worker", job="batch_compress", region="eu"} | 100        | ...                   | ...

And the following recording rule:
Name = "cpu_usage_compress_workers"
T = cpu samples
F = {service_name="worker", job="batch_compress"}
E = "region"

This results in the following exported series and samples:
{__name__="cpu_usage_compress_workers", service_name="worker", job="batch_compress", region="eu"} = (t, 120)
{__name__="cpu_usage_compress_workers", service_name="worker", job="batch_compress", region="ap"} = (t, 30)
{__name__="cpu_usage_compress_workers", service_name="worker", job="batch_compress", region="us"} = (t, 40)

Note that Profile 1 was discarded by profile type, Profiles 2 and 6 were aggregated, and Profile 3 was discarded by the filter. For all three exported samples, t = blockTime.
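For illustration, the grouping and summing above can be sketched like this (hypothetical Go; the type and function names are made up for this example and are not from the PR):

// Hypothetical sketch of the aggregation described above; names are illustrative only.
type RecordingRule struct {
	Name         string
	ProfileType  string            // T
	Filter       map[string]string // F: labels a profile must match
	ExportLabels []string          // E: labels to group by
}

type Profile struct {
	ProfileType string
	Labels      map[string]string
	TotalValue  int64
}

// aggregate sums the totalValue of every profile matching the rule, grouped by the
// exported labels. Each map entry corresponds to one exported series; its single
// sample would get the block time as timestamp.
func aggregate(rule RecordingRule, profiles []Profile) map[string]int64 {
	series := make(map[string]int64)
	for _, p := range profiles {
		if p.ProfileType != rule.ProfileType || !matches(p.Labels, rule.Filter) {
			continue // discarded by profile type or by filter
		}
		key := ""
		for _, l := range rule.ExportLabels {
			key += l + "=" + p.Labels[l] + ","
		}
		series[key] += p.TotalValue
	}
	return series
}

func matches(labels, filter map[string]string) bool {
	for k, v := range filter {
		if labels[k] != v {
			return false
		}
	}
	return true
}

Running this over the example profiles with the cpu_usage_compress_workers rule yields 120 for region="eu", 30 for "ap", and 40 for "us", matching the series above.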

Given the distributed architecture and concurrent nature of compactors, and the chosen timestamp for samples, time collisions may happen. For that reason, an extra __pyroscope_instance__ label has been added, so that two compaction jobs may write to Prometheus without overwriting each other. This instance id is computed from a worker id and a shard id.

Next steps:

  • Get the export config programmatically so every metric is exported to the expected datasource (tenant-wise)
  • Read rules from external service (tenant-settings?) and config.
  • Error handling: the lack of error handling is evident. There's a lot of room for improvement here, but we should strive not to interfere with compaction and weigh retries against metrics loss.

Out of scope right now:

  • functions/stacktraces processing

@alsoba13 requested a review from a team as a code owner on January 21, 2025 14:52
@alsoba13 force-pushed the alsoba13/metrics-from-profiles-record-and-export branch from be2d95a to c90f289 on January 21, 2025 14:57
@alsoba13 marked this pull request as draft on January 21, 2025 22:02
kolesnikovae (Collaborator) left a comment:

Good work, Alberto! 🚀

I'd like to discuss the queries we aim to answer. Have you analyzed how the exported metrics will be used? Even just some example use cases would help.

Comment on lines 372 to 377
func pyroscopeInstanceHash(shard uint32, id uuid.UUID) string {
	// 4 bytes for the big-endian shard ID plus 36 bytes for the UUID string form.
	buf := make([]byte, 0, 40)
	buf = append(buf, byte(shard>>24), byte(shard>>16), byte(shard>>8), byte(shard))
	buf = append(buf, id.String()...)
	// Hash the concatenation and render it as hex for the label value.
	return fmt.Sprintf("%x", xxhash.Sum64(buf))
}
kolesnikovae (Collaborator) commented on Jan 22, 2025:

I'm not sure why we're using a UUID generated by the compaction worker.

First of all, it is not helpful and will cause data duplication. Jobs might be retried multiple times: each attempt may export samples with its own __pyroscope_instance__ label, which prevents deduplication in the metrics backend. Second, it will cause cardinality issues: there might be dozens or hundreds of compaction workers, and each of them can handle any block (i.e., we get Rules x Shards x Workers series, where each rule may produce multiple series, depending on the aggregation dimensions).

Note that compaction job source blocks always belong to the same shard but may be produced by a set of segment writers. This is a typical situation when shard ownership/affinity changes due to a topology change (node added or removed), when the primary owner is not available, or when the placement rules for the dataset change.

It's possible that we have two segments with identical timestamps (given the millisecond precision of ULIDs). Whether we want to handle the issue in the very first version is probably the most important question, if we decide to use segment timestamps. I'd say no, we don't have to. And if we were to, we would need to ensure that data sent from different segment origins is not mixed. The segment origin is determined by a combination of the Shard and CreatedBy metadata attributes and the timestamp of segment creation. We assume that within a given Shard/CreatedBy pair a timestamp collision is not possible (strictly speaking, this is not guaranteed). Shard/CreatedBy cardinality is bounded and is typically a 1:1 mapping. However, the worst case is N*M series – therefore we may want to get rid of it (e.g., by aggregating data in the backend with recording rules).

I see the following ways to solve/mitigate it:

  1. Add Shard/CreatedBy as a series label (a hash of it). We could probably be fine with just CreatedBy, but we need to make sure a timestamp collision is not possible in the segment writer: imagine a series is moved from one shard to another, hosted by the same segment-writer, and the timestamps of the segments that include this "transition" match. Such samples would be deduplicated in the time series (Prometheus-like) backend.
  2. Add an explicit metadata attribute with the timestamp at nanosecond precision, which is sufficient for our needs in practice. The timestamp is the real local time of the segment-writer that produced the block.
  3. Handle this in the compaction planner: we could probably somehow "guess" the timestamp, provided that we have all the information needed there.

It may be tempting to implement option 2. However, before we go further, I'd like to see an analysis of the access patterns – basically, what queries we expect and, for example, which aggregation functions should be supported. Do we want to support functions without the associative property (e.g., mean/average)?
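For illustration, a possible shape for option 1 – a sketch only, mirroring the pyroscopeInstanceHash function above; this is not code from the PR – could be:

package metrics

import (
	"fmt"

	"github.com/cespare/xxhash/v2"
	"github.com/google/uuid"
)

// Sketch for option 1: derive the series label from the block's Shard and CreatedBy
// metadata instead of a per-worker UUID, so retried jobs produce identical labels
// and cardinality stays bounded by the number of shard/segment-writer pairs.
func segmentOriginLabel(shard uint32, createdBy uuid.UUID) string {
	buf := make([]byte, 0, 4+16)
	buf = append(buf, byte(shard>>24), byte(shard>>16), byte(shard>>8), byte(shard))
	buf = append(buf, createdBy[:]...) // the 16 raw UUID bytes of CreatedBy
	return fmt.Sprintf("%x", xxhash.Sum64(buf))
}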

alsoba13 (Contributor, author):

Fixed to use CreatedBy instead of worker id


pkg/experiment/block/compaction.go (outdated, resolved)
pkg/experiment/metrics/recorder.go (outdated, resolved)
pkg/experiment/metrics/recorder.go (outdated, resolved)
@alsoba13 force-pushed the alsoba13/metrics-from-profiles-record-and-export branch 3 times, most recently from 7d1e59b to 1700a83 on January 22, 2025 09:42
@alsoba13 marked this pull request as ready for review on January 22, 2025 13:58
w := NewBlockWriter(dst, b.path, tmpdir)
defer func() {
err = multierror.New(err, w.Close()).Err()
}()
// Datasets are compacted in a strict order.
for _, s := range b.datasets {
s.registerSampleObserver(observer)
alsoba13 (Contributor, author):

At this point I think it's better to register the observer in the dataset instead of passing it through this long call chain: compact > mergeAndClose > merge > writeRow.
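For context, the registration referred to here could look roughly like this (a sketch with assumed names – SampleObserver's methods and the ProfileEntry type are illustrative, not necessarily the interface in the PR):

// Sketch only: the dataset holds the observer so it does not need to be threaded
// through compact > mergeAndClose > merge > writeRow.
type SampleObserver interface {
	Observe(row ProfileEntry) // invoked for every row written during compaction
	Flush() error             // invoked once the dataset has been fully written
}

func (d *dataset) registerSampleObserver(o SampleObserver) {
	d.observer = o
}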

@alsoba13 force-pushed the alsoba13/metrics-from-profiles-record-and-export branch from e1221a0 to ff45b50 on January 27, 2025 09:45
@alsoba13 force-pushed the alsoba13/metrics-from-profiles-record-and-export branch from ff45b50 to 56bc260 on January 27, 2025 10:43
panic(err)
}

c, err := remote.NewWriteClient("exporter", &remote.ClientConfig{
Contributor:

Feedback from the Mimir team: we should set a custom user agent to identify this as the "pyroscope-metrics-exporter". See https://raintank-corp.slack.com/archives/C03NCLB4GG7/p1738233634244319

alsoba13 (Contributor, author):

done

Contributor:

I am not too sure where. Can you point me to the piece of code?

alsoba13 (Contributor, author):

I re-read it. I thought we had to change the NewWriteClient name field. I'll do it with the user agent instead. Thanks for pointing it out.
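One possible shape for that change – a sketch only, which assumes the vendored prometheus/storage/remote ClientConfig exposes a Headers map that is applied to outgoing requests (not verified against the exact version used here):

package metrics

import (
	"net/url"
	"time"

	commoncfg "github.com/prometheus/common/config"
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/storage/remote"
)

// Sketch only: identify the exporter to the metrics backend with a custom User-Agent.
func newClientWithUserAgent(endpoint string) (remote.WriteClient, error) {
	wURL, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	return remote.NewWriteClient("exporter", &remote.ClientConfig{
		URL:     &commoncfg.URL{URL: wURL},
		Timeout: model.Duration(10 * time.Second),
		Headers: map[string]string{
			"User-Agent": "pyroscope-metrics-exporter", // assumption: a Headers entry can set the UA
		},
		RetryOnRateLimit: false,
	})
}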

@alsoba13 force-pushed the alsoba13/metrics-from-profiles-record-and-export branch from 2bb97ad to d8db16c on February 7, 2025 09:06

func (o *MetricsExporterSampleObserver) Flush() error {
go func() {
NewExporter(o.tenant, o.recorder.Recordings).Send() // TODO log error
Contributor:

Before merging, this should log any errors.
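A minimal way to do that – a sketch only, assuming a go-kit logger is (or will be) available on the observer; o.logger is an assumption, not part of the PR:

// Sketch only: log the export error instead of silently dropping it.
// Requires "github.com/go-kit/log/level" and a logger field on the observer (assumed).
func (o *MetricsExporterSampleObserver) Flush() error {
	go func() {
		if err := NewExporter(o.tenant, o.recorder.Recordings).Send(); err != nil {
			level.Error(o.logger).Log("msg", "failed to export recorded metrics", "tenant", o.tenant, "err", err)
		}
	}()
	return nil
}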

RetryOnRateLimit: false,
})
if err != nil {
panic(err)
Contributor:

This should return the error instead.

Comment on lines +103 to +108
// TODO
return Config{
url: "omitted",
username: "omitted",
password: "omitted",
}
Contributor:

When this is hardcoded, how can this be used?

Eventually you could read from environment variables until this is figured out.
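As an interim step, something like the following could work – a sketch only; the environment variable names are invented and Config is the struct from this PR:

// Sketch only: read the exporter configuration from environment variables until it
// is wired through real config. Requires "fmt" and "os".
func configFromEnv() (Config, error) {
	cfg := Config{
		url:      os.Getenv("PYROSCOPE_METRICS_EXPORTER_URL"),
		username: os.Getenv("PYROSCOPE_METRICS_EXPORTER_USERNAME"),
		password: os.Getenv("PYROSCOPE_METRICS_EXPORTER_PASSWORD"),
	}
	if cfg.url == "" {
		return Config{}, fmt.Errorf("PYROSCOPE_METRICS_EXPORTER_URL is not set")
	}
	return cfg, nil
}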

alsoba13 (Contributor, author):

I will inject it via config/env vars in a follow-up PR.

func newClient(cfg Config) remote.WriteClient {
wURL, err := url.Parse(cfg.url)
if err != nil {
panic(err)
Contributor:

This should return the error instead.

return nil
}
if e.client == nil {
e.client = newClient(e.config)
Contributor:

I would create the client at NewExporter time as well and then handle problems as they arise

alsoba13 (Contributor, author):

Good catch. This came from the old code before the observer pattern, where the exporter was created at the beginning and I was trying to save resources by delaying client creation. I'll create the client in NewExporter once we ensure e.data is not empty.
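That could look roughly like this – a sketch only; the Exporter field names are guesses based on the snippets above, and newClient is assumed to return an error instead of panicking, as suggested in the other threads:

// Sketch only: build the write client up front and surface any error to the caller.
func NewExporter(tenant string, recordings []*Recording, cfg Config) (*Exporter, error) {
	client, err := newClient(cfg)
	if err != nil {
		return nil, fmt.Errorf("creating remote write client: %w", err)
	}
	return &Exporter{
		tenant: tenant,
		data:   recordings,
		config: cfg,
		client: client,
	}, nil
}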

}

func recordingRulesFromTenant(tenant string) []*RecordingRule {
// TODO
Contributor:

Even though you want to hardcode and test it, I would prefer these coming from, e.g., an environment variable or a file (parsed as YAML) so we can change them more quickly than by recompiling a new version.
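For example, rules could be loaded from a YAML file at startup – a sketch only; the schema and loader below are invented, and this path was ultimately superseded (see the reply below):

// Sketch only: load recording rules from a YAML file so they can change without a
// rebuild. Requires "os" and "gopkg.in/yaml.v3"; field names are illustrative.
type recordingRuleConfig struct {
	Name         string            `yaml:"name"`
	ProfileType  string            `yaml:"profile_type"`
	Filter       map[string]string `yaml:"filter"`
	ExportLabels []string          `yaml:"export_labels"`
}

func loadRecordingRules(path string) ([]recordingRuleConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var rules []recordingRuleConfig
	if err := yaml.Unmarshal(data, &rules); err != nil {
		return nil, err
	}
	return rules, nil
}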

alsoba13 (Contributor, author):

This part of the code will be deleted in #3874 (although it hasn't been removed from the draft yet). On the other hand, support for static recording rules was discarded; we may implement it later.
