Update limit for GardenerFailureRateTooHighOrMissing to prevent false alarms (#1023)

* Update limit for GardenerFailureRateTooHighOrMissing
* Fix parentheses typo
stephen-soltesz authored Dec 8, 2023
1 parent 983e373 commit a713d6d
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions config/federation/prometheus/alerts.yml
@@ -879,9 +879,8 @@ groups:
  # GardenerFailureRateTooHigh fires when the number of failed Gardener jobs
  # in the last day rises above 1%.
  - alert: GardenerFailureRateTooHighOrMissing
- expr: (sum(rate(gardener_jobs_total{status!="success"}[1d])) by (experiment, datatype) /
- sum(rate(gardener_jobs_total[1d])) by (experiment, datatype)) > 0.01
- for: 10m
+ expr: sum(increase(gardener_jobs_total{status!="success"}[1d])) by (experiment, datatype) > 3
+ for: 24h
  labels:
  repo: dev-tracker
  severity: ticket
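
The change above swaps a ratio threshold (failed jobs / all jobs > 1%) for an absolute count (more than 3 failed jobs in a day). A minimal Python sketch of how the two conditions can disagree follows. The job counts are hypothetical, the function names are illustrative and not part of the repository, and the `for:` durations are not modeled. Because `rate()` over a fixed window is `increase()` divided by the window length, the old ratio of rates is treated here as a ratio of counts.

```python
def old_condition(failed: float, total: float) -> bool:
    """Old expr: failed / total over one day must exceed 1%."""
    if total == 0:
        # Assumption: with no jobs the ratio is undefined and the
        # comparison does not produce an alerting series.
        return False
    return failed / total > 0.01


def new_condition(failed: float) -> bool:
    """New expr: more than 3 failed jobs over one day."""
    return failed > 3


# Hypothetical daily counts as (failed, total) for one (experiment, datatype).
samples = {
    "low volume": (1, 50),      # 2% failure rate, but only 1 failed job
    "high volume": (5, 1000),   # 0.5% failure rate, but 5 failed jobs
    "clearly bad": (10, 100),   # 10% failure rate, 10 failed jobs
}

for name, (failed, total) in samples.items():
    print(name, old_condition(failed, total), new_condition(failed))
```

In this sketch, a single failure in a low-volume pipeline trips the old ratio but not the new count, which is one plausible reading of the "false alarms" the commit message refers to.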