-
Notifications
You must be signed in to change notification settings - Fork 18
Fix the property metrics for prometheus #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit that I'm not 100% certain that I follow what is being fixed when changing from
cf.channels.count{propertyName=propertyValue}
to
cf.channels.count.propertyName{propertyName=propertyValue}
Is the point that if you have two propertynames (propA and propB) then you only got one of the two, i.e. you would only see
cf.channels.count{propA=propertyValue}
instead of
cf.channels.count{propA=propertyValue}
cf.channels.count{propB=propertyOtherValue}
But does this still allow you to see
cf.channels.count.propertyName{propertyName=propertyValue1}
cf.channels.count.propertyName{propertyName=propertyValue2}
in the output?
if (metricProperties == null || metricProperties.isEmpty()) { | ||
return new LinkedMultiValueMap<>(); | ||
} | ||
return Arrays.stream(metricProperties.split(";")).map(s -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to give an explicit example of using this syntax; the application.properties file only has a single metricProperty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some docs.
Don't love this from the description. The suffix _count should only be there for counters, so if this really is a gauge then it should not be present. The current metric we have been using appears to be a counter (but maybe this is just since CF never forgets channels), as we can do (somewhat pseudo)
If we start to do things like cf_channel_count_whatever, then we're breaking the convention (suffix with _count for counters), and cf_channel_count_tag(tag) also strikes me as odd. IMO I think we instead should go for a dumber implementation for counting properties and tags, and especially for counting channels with a specific property or tag. The biggest reason for this, in my opinion, is because cf_channel_count without (prometheus) tags always should be the total count of channels, which voids the use of props and prometheus tags outside of pvStatus. So I would prefer:
|
Yes exactly. The alternative solution is to put every property in every cf.channels.count metric, i.e.
and then to get counts for only propA you could have an ignore value or something like
But this seems a bit overkill, as you probably don't care about every combination. |
This actually sounds almost correct to me, if we are to continue with this level of flexibility and granularity. At least that way the aggregate count (without specifying props) is correct. 🤷 But rather than "IGNORED" this should be just something like |
6a82778
to
f420efa
Compare
src/site/sphinx/config.rst
Outdated
You can also set the metrics.tags to add counts of number of channels per tag. These are exposed as | ||
`cf_channel_count{tag=tagName}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since channels can have any number of tags, doesn't this break the sum channel count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced the name to try and make it more clear what it means. And correspondingly updated the documentation.
Don't we need special handling for The current design is very useful, where we get
Because then we know we can get the total channel count from
but we can also use a gauge on it to see internally the split between active and inactive. |
You will get that with cf_pvStatus_channel_count. In fact I'll add a test case for this. |
I see your line of thinking, but just as a FYI this will break all displays we have right now (since there is one query up until deployment of this version, and a different one after deployment), and we will might never make use of the non-property metric for the channel count. |
For example metrics.properties=pvStatus:Active, Inactive; archive: default, b, !*; archiver: beam, acc, !* Result cf_channel_count{archive="-",archiver="beam",pvStatus="Inactive",} 0.0 cf_channel_count{archive="default",archiver="beam",pvStatus="Inactive",} 0.0 cf_channel_count{archive="b",archiver="acc",pvStatus="Active",} 0.0 cf_channel_count{archive="b",archiver="acc",pvStatus="Inactive",} 0.0 cf_channel_count{archive="-",archiver="beam",pvStatus="Active",} 0.0 cf_channel_count{archive="b",archiver="-",pvStatus="Inactive",} 0.0 cf_channel_count{archive="b",archiver="-",pvStatus="Active",} 0.0 cf_channel_count{archive="-",archiver="-",pvStatus="Active",} 2.0 cf_channel_count{archive="b",archiver="beam",pvStatus="Active",} 0.0 cf_channel_count{archive="default",archiver="acc",pvStatus="Inactive",} 0.0 cf_channel_count{archive="-",archiver="acc",pvStatus="Active",} 0.0 cf_channel_count{archive="b",archiver="beam",pvStatus="Inactive",} 0.0 cf_channel_count{archive="default",archiver="-",pvStatus="Inactive",} 0.0 cf_channel_count{archive="default",archiver="-",pvStatus="Active",} 0.0 cf_channel_count{archive="default",archiver="beam",pvStatus="Active",} 0.0 cf_channel_count{archive="-",archiver="acc",pvStatus="Inactive",} 0.0 cf_channel_count{archive="-",archiver="-",pvStatus="Inactive",} 1.0 cf_channel_count{archive="default",archiver="acc",pvStatus="Active",} 0.0
|
"a", List.of("b", "c"), | ||
"d", List.of("e", "!*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love how you name all of your test entitites 🙃
Problem was that on the prometheus list wasn't getting a breakdown of properties more than the first defined one.
Discovered this is due to for each metric (Gauge in micrometer language) every tag has to have a value. So I changed it from
cf.channel.count{propertyName=propertyValue}
to
cf.channel.count{prop0=propValueA, prop1=propValueB}
and all the many combinations.
Note:
From the propertyRepository api, I don't think there is a way to get all of the property values for a property. I think this would be useful, but maybe a big query with the current database design. Can think about this maybe?