Custom metric threshold calculation using wrong statistics #2390
To define a threshold on a custom metric, you have to reference the custom metric itself. A custom metric is a standalone metric, independent from the built-in metrics. You actually want to define your thresholds as:

```javascript
export const options = {
  thresholds: {
    'trend_homepage': [
      { threshold: 'p(99.99) < 100', abortOnFail: false }
    ],
    'trend_pi': [
      { threshold: 'p(99.99) < 24', abortOnFail: false }
    ],
  },
}
```
Also note that the tags you apply to your custom metric are rather redundant; they always contain the same values.

```javascript
const piTrend = new Trend('trend_pi', true);

function piRequest(data, metric) {
  const name = 'tag_pi';
  const params = { tags: { name } };
  const res = http.request('GET', 'https://test.k6.io/pi.php?decimals=2', {}, params);
  metric.add(res.timings.duration, { requestName: name, metric: 'timings.duration' })
}
```

Filtering on the tags will output identical metric values for all combinations:

```javascript
options = {
  thresholds: {
    'trend_pi': [ 'p(99.99) < 100' ],
    'trend_pi{requestName:tag_pi}': [ 'p(99.99) < 100' ],
    'trend_pi{metric:timings.duration}': [ 'p(99.99) < 100' ],
    'trend_pi{requestName:tag_pi,metric:timings.duration}': [ 'p(99.99) < 100' ],
  }
}
```
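To see why those four thresholds are redundant, it helps to think of a sub-metric as a tag filter over the metric's samples. The following is a minimal sketch in plain JavaScript (not the k6 API; `filterSamples` and the sample values are hypothetical): since every sample carries identical tag values, every filter matches the whole sample set.

```javascript
// Hypothetical samples, as the script above would emit them: the tag values
// never vary between samples.
const samples = [
  { value: 21.3, tags: { requestName: 'tag_pi', metric: 'timings.duration' } },
  { value: 19.8, tags: { requestName: 'tag_pi', metric: 'timings.duration' } },
  { value: 23.1, tags: { requestName: 'tag_pi', metric: 'timings.duration' } },
];

// A sub-metric like 'trend_pi{requestName:tag_pi}' conceptually keeps only the
// samples whose tags contain every key/value pair of the selector.
function filterSamples(samples, selector) {
  return samples.filter((s) =>
    Object.entries(selector).every(([k, v]) => s.tags[k] === v)
  );
}

console.log(filterSamples(samples, {}).length);                        // 3
console.log(filterSamples(samples, { requestName: 'tag_pi' }).length); // 3
console.log(filterSamples(samples, {
  requestName: 'tag_pi',
  metric: 'timings.duration',
}).length);                                                            // 3
```

All three filters select the same three samples, so all four thresholds would be computed over identical data.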
Hmm @efdknittlfrank, while you are correct, I think @timterrill found a k6 bug here.
Yes, this looks like a problem, though I created a simpler example, since the current one has a lot of (probably unnecessary) complexity. Consider this much simpler test run:

```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    'http_reqs': ['count == 2'],
    'http_reqs{key}': ['count == 1'],
    'http_reqs{key:value}': ['count == 1'],
    'http_reqs{nonexistent_key}': ['count == 0'],
  }
}

export default function (data) {
  http.get('https://httpbin.test.k6.io/anything?r=1', { tags: {} })
  http.get('https://httpbin.test.k6.io/anything?r=2', { tags: { 'key': 'value' } })
}
```

It will result in the following summary (cropped to only the relevant parts):
My suspicion is that this is related to how metric tags are handled internally. We are currently in the midst of refactoring how thresholds are evaluated (#2356) by @oleiade, so this might already be fixed, or be easier to fix after that PR. In any case, it seems like a valid bug, thank you for reporting it! 🙇
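The counts that the four thresholds above *should* see can be worked out by matching each selector against the two requests' tags. A sketch in plain JavaScript (not k6 code; `countMatching` is a hypothetical helper, and treating a bare `{key}` selector as "tag exists" is one possible interpretation — part of what is being debated here):

```javascript
// The two requests from the simpler example, reduced to their tags.
const requests = [
  { tags: {} },               // ...?r=1
  { tags: { key: 'value' } }, // ...?r=2
];

// Selector entries with an undefined value mean "the tag must exist";
// entries with a string value mean "the tag must equal this value".
function countMatching(requests, selector) {
  return requests.filter((r) =>
    Object.entries(selector).every(([k, v]) =>
      v === undefined ? k in r.tags : r.tags[k] === v
    )
  ).length;
}

console.log(countMatching(requests, {}));                             // 2 -> 'http_reqs'
console.log(countMatching(requests, { key: undefined }));             // 1 -> 'http_reqs{key}'
console.log(countMatching(requests, { key: 'value' }));               // 1 -> 'http_reqs{key:value}'
console.log(countMatching(requests, { nonexistent_key: undefined })); // 0 -> 'http_reqs{nonexistent_key}'
```

Under this interpretation the expected counts are 2, 1, 1, 0 — matching the assertions in the `thresholds` config above.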
@na-- this likely should be (and in that PR already is) treated as a "syntax error"
@mstoykov, maybe, though that would partially be kind of a breaking change and deserves its own separate evaluation... But again, the problem persists even if I use the proper syntax for thresholds that check whether a metric tag exists but has an empty value... These should be valid:

```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    'http_reqs': ['count == 2'],
    'http_reqs{key:""}': ['count == 1'],
    'http_reqs{key:value}': ['count == 1'],
    'http_reqs{nonexistent_key:""}': ['count == 0'],
  }
}

export default function (data) {
  http.get('https://httpbin.test.k6.io/anything?r=1', { tags: {} })
  http.get('https://httpbin.test.k6.io/anything?r=2', { tags: { 'key': 'value' } })
}
```

And yet I still get the same wrong results:
FWIW, this is the part where threshold definitions like these are parsed (lines 506 to 515 in c4b88f5):
And IMO that part is somewhat fine... If we ignore the fact that the UX is slightly confusing, I am not sure it should be treated as a syntax error 🤷♂️ The cloud has its own treatment of threshold definitions like these, in any case. The bigger problem on the k6 OSS side is with the data crunching, when determining whether a certain metric sample is part of a specific threshold sub-metric. That one is apparently completely broken, since it somehow thinks there are 2 HTTP requests with a tag that fewer (or no) requests actually have.
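For illustration, here is one plausible way the sub-metric parsing step could work. This is a sketch in plain JavaScript, NOT k6's actual parser (the real code is in the Go source referenced above; `parseSubMetric` is a hypothetical name). Note how a selector entry without a `:` ends up with an empty value, which is exactly the `{key}` vs. `{key:""}` ambiguity being discussed:

```javascript
// Split a threshold name like 'http_reqs{key:value}' into a metric name and a
// tag selector. A bare '{key}' entry yields an empty-string value here.
function parseSubMetric(name) {
  const m = name.match(/^([^{]+)(?:\{(.*)\})?$/);
  if (!m) throw new Error(`invalid sub-metric: ${name}`);
  const tags = {};
  if (m[2] !== undefined && m[2] !== '') {
    for (const pair of m[2].split(',')) {
      const idx = pair.indexOf(':');
      const key = idx === -1 ? pair : pair.slice(0, idx);
      // Strip optional surrounding quotes, so 'key:""' means "empty value".
      const value = idx === -1 ? '' : pair.slice(idx + 1).replace(/^"|"$/g, '');
      tags[key.trim()] = value;
    }
  }
  return { metric: m[1], tags };
}

console.log(parseSubMetric('http_reqs'));            // { metric: 'http_reqs', tags: {} }
console.log(parseSubMetric('http_reqs{key:value}')); // { metric: 'http_reqs', tags: { key: 'value' } }
console.log(parseSubMetric('http_reqs{key}'));       // { metric: 'http_reqs', tags: { key: '' } }
```

With this kind of parsing, `{key}` and `{key:""}` are indistinguishable after the fact, which is why treating the bare form as a syntax error (or not) matters less than fixing the sample-matching step.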
Thanks @efdknittlfrank and @na-- for the immediate action on this! FWIW, I was just testing out the two techniques for setting thresholds and was not going to use both in a given test; I knew it was redundant, but I wondered which way I'd like best. Thanks for pointing out that I was using the wrong syntax for the trend_XXX custom metric thresholds; I have fixed and tested that on my end. I'm glad, though, that it identified a bug ;) My $0.02 on a fix: maybe ignore empty tag values, as I can't imagine how that's useful? If that happens, someone likely made a mistake (like I did), so possibly even generate an error message: "Can't set threshold on an undefined tag-value".
Brief summary
When experimenting with the differences between custom metric thresholds and thresholds on tagged requests, I discovered that the custom metric thresholds were not using the custom metric data; they were using the overall statistics from the entire test (`http_req_duration`).
This caused thresholds for the custom metric to fail when the tagged requests passed. In my test example I have 2 requests, each of which has a unique custom metric and a unique tag:
The metrics for the tag and custom metric should be the same, therefore the thresholds for each request should both pass or fail.
k6 version
k6 v0.34.1 ((devel), go1.17, darwin/amd64)
OS
macOS 12.1
Docker version and image (if applicable)
No response
Steps to reproduce the problem
To reproduce this issue, run:

```shell
k6 run k6-threshold-bug.js
```
NOTE: Depending on the actual response times of the 2 endpoints, the threshold may not actually fail, but repeated runs should reproduce it.
Expected behaviour
The output I expect from this run is as follows (I modified the 2 rows for `{ trend_XXXX }` to be what I expected):

...

...

Notice how `trend_pi` and the threshold rows for `trend_pi` and `name:tag_pi` all had 3 requests and the same metrics as the custom metric did. The `http_req_duration` row had 217 requests (which represents all requests for the test run). Also, the `name:tag_pi` and custom metric `trend_pi` thresholds BOTH passed (as expected).

Actual behaviour
This is the actual output of my run:
And the part of the summary output to compare against the above section:
ISSUE: The threshold for `{ trend_pi }` failed, and both `{ trend_XXX }` rows had the wrong data.
...
...
Notice how `trend_pi` had 3 requests, and the threshold row for `name:tag_pi` had 3 requests and the same metrics as the custom metric did, but the threshold calculation row for `trend_pi` had 217 requests and the same metrics as `http_req_duration` has for the entire test run (THIS IS THE ISSUE). Also, the `name:tag_pi` threshold passed, where the custom metric `trend_pi` did not pass. That should be impossible, because the simple code made a single HTTP request that was tagged with `name:tag_pi`, and the custom metric `trend_pi` just tracked the `http_req_duration` from the request.