Skip to content

Fix the property metrics for prometheus #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 2, 2025

Conversation

jacomago
Copy link
Contributor

@jacomago jacomago commented Mar 24, 2025

Problem was that on the prometheus list wasn't getting a breakdown of properties more than the first defined one.
Discovered this is due to for each metric (Gauge in micrometer language) every tag has to have a value. So I changed it from

cf.channel.count{propertyName=propertyValue}

to

cf.channel.count{prop0=propValueA, prop1=propValueB}

and all the many combinations.

Note:
From the propertyRepository api, I don't think there is a way to get all of the property values for a property. I think this would be useful, but maybe a big query with the current database design. Can think about this maybe?

Copy link

@simon-ess simon-ess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that I'm not 100% certain that I follow what is being fixed when changing from

cf.channels.count{propertyName=propertyValue}

to

cf.channels.count.propertyName{propertyName=propertyValue}

Is the point that if you have two propertynames (propA and propB) then you only got one of the two, i.e. you would only see

cf.channels.count{propA=propertyValue}

instead of

cf.channels.count{propA=propertyValue}
cf.channels.count{propB=propertyOtherValue}

But does this still allow you to see

cf.channels.count.propertyName{propertyName=propertyValue1}
cf.channels.count.propertyName{propertyName=propertyValue2}

in the output?

if (metricProperties == null || metricProperties.isEmpty()) {
return new LinkedMultiValueMap<>();
}
return Arrays.stream(metricProperties.split(";")).map(s ->

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to give an explicit example of using this syntax; the application.properties file only has a single metricProperty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some docs.

@anderslindho
Copy link
Contributor

Don't love this from the description. The suffix _count should only be there for counters, so if this really is a gauge then it should not be present. The current metric we have been using appears to be a counter (but maybe this is just since CF never forgets channels), as we can do (somewhat pseudo)

  • cf_channel_count(active)
  • cf_channel_count(inactive)
    but also do
  • cf_channel_count
    which then is the sum of the above two metrics

If we start to do things like cf_channel_count_whatever, then we're breaking the convention (suffix with _count for counters), and cf_channel_count_tag(tag) also strikes me as odd.

IMO I think we instead should go for a dumber implementation for counting properties and tags, and especially for counting channels with a specific property or tag. The biggest reason for this, in my opinion, is because cf_channel_count without (prometheus) tags always should be the total count of channels, which voids the use of props and prometheus tags outside of pvStatus.

So I would prefer:

  • cf_channel_count(pvStatus)
    and then for any other property we want to track, we do something like
  • cf_alias_count
  • cf_aa_policy_count

@jacomago
Copy link
Contributor Author

I have to admit that I'm not 100% certain that I follow what is being fixed when changing from

cf.channels.count{propertyName=propertyValue}

to

cf.channels.count.propertyName{propertyName=propertyValue}

Is the point that if you have two propertynames (propA and propB) then you only got one of the two, i.e. you would only see

cf.channels.count{propA=propertyValue}

instead of

cf.channels.count{propA=propertyValue}
cf.channels.count{propB=propertyOtherValue}

But does this still allow you to see

cf.channels.count.propertyName{propertyName=propertyValue1}
cf.channels.count.propertyName{propertyName=propertyValue2}

in the output?

Yes exactly. The alternative solution is to put every property in every cf.channels.count metric, i.e.

cf.channels.count{propA=a, propB=b}

and then to get counts for only propA you could have an ignore value or something like

cf.channels.count{propA=a, propB=IGNORED}

But this seems a bit overkill, as you probably don't care about every combination.

@anderslindho
Copy link
Contributor

anderslindho commented Mar 25, 2025

I have to admit that I'm not 100% certain that I follow what is being fixed when changing from

cf.channels.count{propertyName=propertyValue}

to

cf.channels.count.propertyName{propertyName=propertyValue}

Is the point that if you have two propertynames (propA and propB) then you only got one of the two, i.e. you would only see

cf.channels.count{propA=propertyValue}

instead of

cf.channels.count{propA=propertyValue}
cf.channels.count{propB=propertyOtherValue}

But does this still allow you to see

cf.channels.count.propertyName{propertyName=propertyValue1}
cf.channels.count.propertyName{propertyName=propertyValue2}

in the output?

Yes exactly. The alternative solution is to put every property in every cf.channels.count metric, i.e.

cf.channels.count{propA=a, propB=b}

and then to get counts for only propA you could have an ignore value or something like

cf.channels.count{propA=a, propB=IGNORED}

But this seems a bit overkill, as you probably don't care about every combination.

This actually sounds almost correct to me, if we are to continue with this level of flexibility and granularity. At least that way the aggregate count (without specifying props) is correct. 🤷

But rather than "IGNORED" this should be just something like cf_channel_count{propA=a, propB~=*} (which really is equivalent to cf_channel_count{propA=a}).

@jacomago jacomago requested a review from simon-ess March 27, 2025 15:42
Comment on lines 177 to 178
You can also set the metrics.tags to add counts of number of channels per tag. These are exposed as
`cf_channel_count{tag=tagName}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since channels can have any number of tags, doesn't this break the sum channel count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the name to try and make it more clear what it means. And correspondingly updated the documentation.

@anderslindho
Copy link
Contributor

Don't we need special handling for pvStatus?

The current design is very useful, where we get

cf_channel_count{pvStatus="Active"} 123.0
cf_channel_count{pvStatus="Inactive"} 3.0

Because then we know we can get the total channel count from

cf_channel_count # 126.0

but we can also use a gauge on it to see internally the split between active and inactive.

@jacomago
Copy link
Contributor Author

Don't we need special handling for pvStatus?

The current design is very useful, where we get

cf_channel_count{pvStatus="Active"} 123.0
cf_channel_count{pvStatus="Inactive"} 3.0

Because then we know we can get the total channel count from

cf_channel_count # 126.0

but we can also use a gauge on it to see internally the split between active and inactive.

You will get that with cf_pvStatus_channel_count. In fact I'll add a test case for this.

@anderslindho
Copy link
Contributor

Don't we need special handling for pvStatus?
The current design is very useful, where we get

cf_channel_count{pvStatus="Active"} 123.0
cf_channel_count{pvStatus="Inactive"} 3.0

Because then we know we can get the total channel count from

cf_channel_count # 126.0

but we can also use a gauge on it to see internally the split between active and inactive.

You will get that with cf_pvStatus_channel_count. In fact I'll add a test case for this.

I see your line of thinking, but just as a FYI this will break all displays we have right now (since there is one query up until deployment of this version, and a different one after deployment), and we will might never make use of the non-property metric for the channel count.

jacomago added 2 commits April 2, 2025 14:48
For example

metrics.properties=pvStatus:Active, Inactive; archive: default, b, !*; archiver: beam, acc, !*

Result

cf_channel_count{archive="-",archiver="beam",pvStatus="Inactive",} 0.0
cf_channel_count{archive="default",archiver="beam",pvStatus="Inactive",} 0.0
cf_channel_count{archive="b",archiver="acc",pvStatus="Active",} 0.0
cf_channel_count{archive="b",archiver="acc",pvStatus="Inactive",} 0.0
cf_channel_count{archive="-",archiver="beam",pvStatus="Active",} 0.0
cf_channel_count{archive="b",archiver="-",pvStatus="Inactive",} 0.0
cf_channel_count{archive="b",archiver="-",pvStatus="Active",} 0.0
cf_channel_count{archive="-",archiver="-",pvStatus="Active",} 2.0
cf_channel_count{archive="b",archiver="beam",pvStatus="Active",} 0.0
cf_channel_count{archive="default",archiver="acc",pvStatus="Inactive",} 0.0
cf_channel_count{archive="-",archiver="acc",pvStatus="Active",} 0.0
cf_channel_count{archive="b",archiver="beam",pvStatus="Inactive",} 0.0
cf_channel_count{archive="default",archiver="-",pvStatus="Inactive",} 0.0
cf_channel_count{archive="default",archiver="-",pvStatus="Active",} 0.0
cf_channel_count{archive="default",archiver="beam",pvStatus="Active",} 0.0
cf_channel_count{archive="-",archiver="acc",pvStatus="Inactive",} 0.0
cf_channel_count{archive="-",archiver="-",pvStatus="Inactive",} 1.0
cf_channel_count{archive="default",archiver="acc",pvStatus="Active",} 0.0
Copy link

sonarqubecloud bot commented Apr 2, 2025

Comment on lines +19 to +20
"a", List.of("b", "c"),
"d", List.of("e", "!*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love how you name all of your test entitites 🙃

@jacomago jacomago merged commit c5d16e7 into ChannelFinder:master Apr 2, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants