Use the registry for (sub-)metric validation and move data crunching out of the Engine
#2426
Conversation
The xk6 test is failing because of this: #2428
core/engine.go
Outdated
```go
for metricName, thresholds := range e.options.Thresholds {
	metric, err := e.getOrInitPotentialSubmetric(metricName)

	if e.runtimeOptions.NoThresholds.Bool {
```
Shouldn't this skip the whole parsing either way, regardless of error?
Hmm, I don't think so, given that thresholds are basically the only way to currently create sub-metrics for the end-of-test summary, and we can have `--no-thresholds` without also having `--no-summary`. So, if `--no-thresholds` is enabled, I'm following the previously established logic that we shouldn't fail the test for wrongly configured thresholds, but I'm still trying to parse them for sub-metrics and just emit any errors as warnings.
core/engine.go
Outdated
```go
e.Metrics[m.Name] = m
m.Observed = true
```
I really dislike this new property on the `Metric` type. I would really prefer if we go back to the map.
I do think `Metric` already has too many responsibilities, and now adding knowledge of whether it was "observed" seems counterproductive.
We even still add it to the map, so... 🤷 Why not just keep it out of it?
I really dislike it as well, I want to completely split apart the current `Metric` into at least two parts:
- The first part, which should be the actual `Metric`, will be just the `Name`, `Type` and `Contains` - that should be the only thing that the `Registry` actually controls and cares about.
- A second, completely separate `struct` should be the `Sink` and `Observed`, and probably the threshold and submetric things. That will only be a concern of the `MetricsRegistry`. I don't think they should even be exported... In any case, the mapping from a `*Metric` to that second struct will happen only in the `MetricsRegistry`.
Unfortunately, until we have that separation, I kind of had to put `Observed` in the `Metric` struct, since that's where the `Sink` already was 😞 Doing otherwise without even more refactoring would cause us to have a map lookup for every sample of every metric that k6 generates... 😞 I intend to split this when I work on the time series refactoring.
BTW, this is what I mean by this TODO:

Lines 449 to 450 in f06d6f0

```go
// TODO: decouple the metrics from the sinks and thresholds... have them
// linked, but not in the same struct?
```
> ...have a map lookup for every sample of every metric that k6 generates...

Didn't we have the same up to this point as well?
Yes, we did, but we don't actually need to have that, so I guess you can consider it an optimization? 😅

I probably dislike the extra item in `Metric` even more than you, but with `Sink` already in there, this actually seems a better (temporary and transient) architecture to me, since all of this state is now kept in `Metric` and will be easy to refactor into a different struct all at the same time. Previously, the `Observed` state was implicitly tracked by that map in the `Engine`, while all of the other state was in every `Metric`...
If you want, I am fine with implementing my TODO above now, before we merge this PR, though I'd prefer to do it in a separate PR, after #2442 from @oleiade is also merged. It will leave the `metrics.Registry` as the only source of truth for metrics, so we'll easily be able to assign them sequential IDs and use those in the `MetricsEngine` without the need for a map.
@na-- the panic you posted is due to the test parallelization, not because of the code.
Ugh, it's a big one, but a good one 👍
In general it looks good; I left a few comments.
```go
}

if err != nil {
	return fmt.Errorf("invalid metric '%s' in threshold definitions: %w", metricName, err)
```
Maybe not for this PR, but what do you think about first trying to collect all invalid metrics and returning an error that contains all the wrong variations?
Some minor things
```go
// TODO: maybe even move the outputs to a sub-folder here? it may be worth it to
// do a new Output v2 implementation that uses channels and is more usable and
// easier to write? this way the old extensions can still work for a while, with
// an adapter and a deprecation notice
```
I think we should have this comment in the issue instead of having it here.
I kind of already have it in the issue too, see the end of #2430 (comment)
```go
e.ingester = me.GetIngester()
outputs = append(outputs, e.ingester)
```
Is it not possible to have this in the same place where we set the other outputs?
It may be, but I think it will actually be worse to do it that way. The other outputs receive a whole bunch of things this one doesn't need, while this one needs the `MetricsEngine`, something we create a few lines above here (or in `cmd/run.go` in the follow-up PR without an Engine). Besides, we add this output based on completely different criteria, only some of the time, based on RuntimeOptions values...
LGTM 🎉
A bunch of minor, non-blocking comments on my side. I do agree with the brought-up argument about `metrics.Observed`; I dislike it too, but I understand the arguments in (temporary) favor of it, and can live with it for now.
```go
func (me *MetricsEngine) getOrInitPotentialSubmetric(name string) (*stats.Metric, error) {
	// TODO: replace with strings.Cut after Go 1.18
	nameParts := strings.SplitN(name, "{", 2)
```
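For illustration, here is a rough sketch of what that TODO could look like once `strings.Cut` (available since Go 1.18) can be used. `splitMetricName` is a hypothetical standalone helper, not the actual k6 function:

```go
package main

import (
	"fmt"
	"strings"
)

// splitMetricName separates a metric name from its submetric tag filter,
// e.g. "http_req_duration{status:200}", using strings.Cut instead of
// strings.SplitN. The trailing "}" handling is simplified here.
func splitMetricName(name string) (metricName, filter string, isSubmetric bool) {
	metricName, rest, found := strings.Cut(name, "{")
	if !found {
		return metricName, "", false
	}
	return metricName, strings.TrimSuffix(rest, "}"), true
}

func main() {
	m, f, ok := splitMetricName("http_req_duration{status:200}")
	fmt.Println(m, f, ok) // http_req_duration status:200 true

	m, _, ok = splitMetricName("http_req_duration")
	fmt.Println(m, ok) // http_req_duration false
}
```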
I ended up doing something somewhat similar to extract the metric/submetric name for thresholds validation. I ended up using `strings.FieldsFunc`, which, I believe, could lead to somewhat clearer code here:

```go
delimiterFn := func(c rune) bool {
	return c == '{' || c == '}' || c == ':'
}
fmt.Printf("Fields are: %q\n", strings.FieldsFunc("metric_name{foo:bar}", delimiterFn))
```

The resulting length is either 1 (metric), or an odd value >= 3. No need to match the closing `}`, for instance. You could also drop the matching of `:` in your case, I believe.
Hope that's helpful 🙇🏻
Wouldn't that also accept `metric_name}foo{bar:` as valid? 😕
This is a prerequisite for solving other issues like always evaluating thresholds correctly, and as a side-benefit, it also allows us to validate them in the init context, before the test has started.
This allows us to slowly deconstruct and split apart the Engine. It also clears the way for us to have test suites, where every test has a separate pool of VUs and its own ExecutionScheduler.
🍾
This PR is the first part of #1889. It aims to prepare the `Engine` for complete annihilation 😅 I want to get completely rid of it and use its constituent parts separately instead, basically to almost completely decouple the test execution from the metrics handling.

However, the `Engine` has tons of tests that would have to be re-written. So this PR is an attempt to softly split the `Engine` by moving things around and defining and implementing its replacement components, while still keeping them under the same `Engine` facade. This way we can have a reasonably high assurance we haven't broken anything, since all of the old tests still work with minimal adjustment. Hopefully, this will allow us to merge things gradually, and not have to do it all at once.

Given the extensive refactoring here, there were a few opportunities to fix some bugs and side-issues along the way:
- Fix the bug of thresholds not working for unused metrics (e5d8c32): fixes "Metrics for which no data was recorded are not displayed or evaluated for Thresholds" (#1346)
- Fix submetric matching bug when nonexistent keys are specified (a3138b1): fixes "Custom metric threshold calculation using wrong statistics" (#2390)
- Move the Engine data crunching logic in a new component under metrics (2476cfc): goes a long way towards closing "Remove the `Engine` and split metrics/VUs handling" (#1889), since now the `MetricsEngine` gets its metrics like just another output, with almost no special handling when it comes to it 🎉

Overall, I am not sure if it will be easier to review this commit by commit or all at once. The commits are logically separated, but a lot of the same code is changed in multiple commits because of that 🤷‍♂️