From 37822ab8813e91c2e029278961d258f088cd72e2 Mon Sep 17 00:00:00 2001
From: Diko Parvanov
Date: Tue, 28 Mar 2023 16:13:51 +0300
Subject: [PATCH] Changed unit_unavailable interval for prometheus

As stated in issue https://github.com/canonical/bundle-kubeflow/issues/564
the duration for alerts for argo is set to 0m, which is too low for prod
environments. We need to change it to at least 5m to prevent the flapping
behavior.

Partial-Bug: https://github.com/canonical/bundle-kubeflow/issues/564
---
 src/prometheus_alert_rules/unit_unavailable.rule | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/prometheus_alert_rules/unit_unavailable.rule b/src/prometheus_alert_rules/unit_unavailable.rule
index 06b7464..93a89e8 100644
--- a/src/prometheus_alert_rules/unit_unavailable.rule
+++ b/src/prometheus_alert_rules/unit_unavailable.rule
@@ -1,6 +1,6 @@
 alert: TrainingOperatorUnitIsUnavailable
 expr: up < 1
-for: 0m
+for: 5m
 labels:
   severity: critical
 annotations: