Hi. We recently noticed that the reported CPU utilisation (~27 cores) was well above the containers' requests/limits (18 cores) under heavy load when using the irate function (the default in kube-prometheus 0.10). Upon debugging, we found that this was because Prometheus is configured to ignore the timestamps embedded in the kubelet cAdvisor metrics when they are scraped (coderef). This change was introduced as part of #695, where it is mentioned that it was made because a user on IRC saw stale data when a container was OOM killed. After this change, however, the CPU utilisation is off by quite a margin (almost 50% in our observations), because Prometheus uses its own scrape timestamps instead of the actual kubelet timestamps when calculating rates. We've debugged and documented this issue in detail in this doc.
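For context, the configuration in question is the cAdvisor endpoint of the kubelet ServiceMonitor. The fragment below is a rough sketch of that manifest (metadata, selector, and interval are abbreviated/illustrative, not copied from kube-prometheus); the relevant part is `honorTimestamps: false`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: monitoring
spec:
  endpoints:
    # cAdvisor endpoint: with honorTimestamps set to false, the timestamps
    # that cAdvisor attaches to its samples are dropped, and Prometheus
    # stamps each sample with its own scrape time instead.
    - port: https-metrics
      path: /metrics/cadvisor
      scheme: https
      interval: 30s          # illustrative value
      honorTimestamps: false # set to false in #695
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```

Because cAdvisor collects container stats on its own housekeeping interval, two consecutive samples can be much closer together (or further apart) than the Prometheus scrape interval. Dropping their timestamps means rate/irate divides the counter delta by the wrong time window, which is what skews the utilisation numbers.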
This was observed by @ringtail as well in this PR, though the fix proposed there was different.
In OpenShift, they have a dedicated ServiceMonitor for prometheus-adapter with honorTimestamps set to true - Ref - JIRA Ticket.
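A dedicated ServiceMonitor along those lines might look like the sketch below. This is an assumption-laden illustration of the approach, not the actual OpenShift manifest; the name, namespace, and selector are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # Hypothetical name: a second, separate scrape of the cAdvisor
  # endpoint whose series keep their original timestamps, intended
  # for the queries prometheus-adapter runs.
  name: kubelet-resource-metrics
  namespace: monitoring
spec:
  endpoints:
    - port: https-metrics
      path: /metrics/cadvisor
      scheme: https
      honorTimestamps: true  # keep the cAdvisor-provided timestamps
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```

Keeping this as a second scrape, rather than flipping `honorTimestamps` on the existing kubelet ServiceMonitor, would preserve the staleness-handling rationale from #695 for general-purpose queries while giving resource-metrics queries rates computed over the real collection window.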