Hi. We recently noticed that the reported CPU utilisation (~27 cores) was well above the containers' requests/limits (18 cores) under heavy load when using the irate function (the default in kube-prometheus 0.10). Upon debugging, we found that this was because Prometheus is configured to ignore the timestamps embedded in the kubelet cAdvisor metrics when they are scraped (coderef). This change was introduced as part of #695, where it is mentioned that it was made because a user on IRC saw stale data when a container was OOM killed. After this change, however, the CPU utilisation is off by quite a margin (almost 50% in our observations), because Prometheus uses its own scrape timestamps instead of the actual kubelet timestamps when calculating rates. We've debugged and documented this issue in detail in this doc.
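For context, the configuration in question is the cAdvisor endpoint of the kubelet ServiceMonitor. The fragment below is a rough sketch of that manifest (metadata, selector, and interval are abbreviated/illustrative, not copied from kube-prometheus); the relevant part is `honorTimestamps: false`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: monitoring
spec:
  endpoints:
    # cAdvisor endpoint: with honorTimestamps set to false, the timestamps
    # that cAdvisor attaches to its samples are dropped, and Prometheus
    # stamps each sample with its own scrape time instead.
    - port: https-metrics
      path: /metrics/cadvisor
      scheme: https
      interval: 30s          # illustrative value
      honorTimestamps: false # set to false in #695
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```

Because cAdvisor collects container stats on its own housekeeping interval, two consecutive samples can be much closer together (or further apart) than the Prometheus scrape interval. Dropping their timestamps means rate/irate divides the counter delta by the wrong time window, which is what skews the utilisation numbers.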
This was observed by @ringtail as well in this PR, though the fix proposed there was different.
In OpenShift, they have a dedicated ServiceMonitor for prometheus-adapter with honorTimestamps set to true - Ref - JIRA Ticket.
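A dedicated ServiceMonitor along those lines might look like the sketch below. This is an assumption-laden illustration of the approach, not the actual OpenShift manifest; the name, namespace, and selector are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # Hypothetical name: a second, separate scrape of the cAdvisor
  # endpoint whose series keep their original timestamps, intended
  # for the queries prometheus-adapter runs.
  name: kubelet-resource-metrics
  namespace: monitoring
spec:
  endpoints:
    - port: https-metrics
      path: /metrics/cadvisor
      scheme: https
      honorTimestamps: true  # keep the cAdvisor-provided timestamps
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```

Keeping this as a second scrape, rather than flipping `honorTimestamps` on the existing kubelet ServiceMonitor, would preserve the staleness-handling rationale from #695 for general-purpose queries while giving resource-metrics queries rates computed over the real collection window.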