Besides the generic options listed below, checks have additional options that are specific to their type. A single check can generate data for one or more "IDs", e.g. mountpoints, temperature sensors, and so on. Each alarm is instantiated for every ID.
name | example | optional | default |
---|---|---|---|
disable | true | ✔ | false |
interval | 60 | ✔ | 300 |
name | "Foobar" | ❌ | |
timeout | 1 | ✔ | min(5, interval) |
placeholders | {"internal_check_id" = "id_foobar"} | ✔ | |
filter | {type = "Average", window_size = 16} | ✔ | |
type | "FilesystemUsage" | ❌ | |
alarms | see below | ✔ | |
- `disable`: If `true`, the check is disabled and will not be instantiated.
- `interval`: The time between two consecutive checks in seconds. Must be greater than or equal to the timeout.
- `name`: The name of the check. It is used for logging and the `check_name` placeholder. Must be unique.
- `timeout`: The maximum time in seconds a check may take to return its measurement data before being interrupted. Must be less than or equal to the interval.
- `placeholders`: Custom placeholders that will be merged with the ones of the alarms/actions.
- `filter`: Filter to transform the measurement data using a transformation function.
- `type`: Type of the check as listed below. This determines which specific check and alarm options are available. One of:
  - DockerContainerStatus
  - FilesystemUsage
  - MemoryUsage
  - NetworkThroughput
  - PressureAverage
  - ProcessExitStatus
  - SystemdUnitStatus
  - Temperature
- `alarms`: List of alarms.

Name of the check that triggered the alarm.
ID of the check that triggered the alarm.
Error while getting the measurement data, if any.
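
As a rough illustration of how the generic check options fit together, a check entry might look like the sketch below. It assumes the configuration is written in TOML (as the inline tables in the examples suggest) and that checks are declared as a `[[checks]]` array of tables; that table name is an assumption, not something stated above. Type-specific options are left out.

```toml
# Hypothetical check entry; the [[checks]] table name is an assumption.
[[checks]]
name = "Foobar"            # required, must be unique
type = "FilesystemUsage"   # required, one of the listed check types
interval = 60              # seconds between two consecutive checks (default: 300)
timeout = 5                # seconds; must be less than or equal to the interval
disable = false            # default; set to true to skip instantiating the check
placeholders = {"internal_check_id" = "id_foobar"}
filter = {type = "Average", window_size = 16}
# Type-specific options and the list of alarms are omitted in this sketch.
```

With `interval = 60`, the explicit `timeout = 5` matches the documented default of min(5, interval), so it could also be omitted.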
Besides the generic options listed below, alarms have additional options that are specific to the check's type.
name | example | optional | default |
---|---|---|---|
disable | true | ✔ | false |
name | "Foobar" | ❌ | |
action | "FooAction" | ❌ | |
placeholders | {"internal_alarm_id" = "id_foobar"} | ✔ | |
filter | {type = "Average", window_size = 16} | ✔ | |
cycles | 3 | ✔ | 1 |
repeat_cycles | 100 | ✔ | |
recover_action | "FooAction" | ✔ | |
recover_placeholders | {"internal_alarm_id" = "id_foobar"} | ✔ | |
recover_cycles | 3 | ✔ | 1 |
error_action | "FooAction" | ✔ | |
error_placeholders | {"internal_alarm_id" = "id_foobar"} | ✔ | |
error_repeat_cycles | 100 | ✔ | |
error_recover_action | "FooAction" | ✔ | |
error_recover_placeholders | {"internal_alarm_id" = "id_foobar"} | ✔ | |
invert | true | ✔ | false |
- `disable`: If `true`, the alarm is disabled and will not be instantiated.
- `name`: The name of the alarm. It is used for logging and the `alarm_name` placeholder. Must be unique for the check.
- `action`: The name of the action to trigger when the state transitions from good to bad.
- `placeholders`: Custom placeholders that will be merged with the ones of the check and the actions. These are used for all actions.
- `filter`: Filter to transform the measurement using a transformation function.
- `cycles`: Number of bad cycles it takes to transition from good to bad state. Must be at least 1.
- `repeat_cycles`: If this is non-zero, the action is triggered repeatedly every `repeat_cycles` cycles while in the bad state. If it is zero, the action is only triggered once when the state transitions from good to bad.
- `recover_action`: The name of the action to trigger when the state transitions from bad to good.
- `recover_placeholders`: Custom placeholders that will be merged with the ones of the check and the actions. These are used only for the `recover_action`.
- `recover_cycles`: Number of good cycles it takes to transition from bad to good state. Must be at least 1.
- `error_action`: The name of the action to trigger when the state transitions from good or bad to error.
- `error_placeholders`: Custom placeholders that will be merged with the ones of the check and the actions. These are used only for the `error_action`.
- `error_repeat_cycles`: If this is non-zero, the action is triggered repeatedly every `error_repeat_cycles` cycles while in the error state. If it is zero, the action is only triggered once when the state transitions from good or bad to error.
- `error_recover_action`: The name of the action to trigger when the state transitions from error to good or bad.
- `error_recover_placeholders`: Custom placeholders that will be merged with the ones of the check and the actions. These are used only for the `error_recover_action`.
- `invert`: If `true`, inverts the decision based on the check's measurement data. For example, the FilesystemUsage check may then be used to check whether at most 20% of the space is used instead of more than that.

Name of the alarm that triggered the action.
ISO 8601 timestamp of the alarm's state change event.
Duration the last state lasted in seconds.
Duration the last state lasted as ISO 8601 duration.
Current state of the alarm.
One of:
- Good
- Bad
- Error
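
To make the interplay of the generic alarm options more concrete, here is a minimal sketch of an alarm attached to a check. It assumes TOML syntax with alarms nested as `[[checks.alarms]]` under a `[[checks]]` entry; both table names are assumptions, and `"FooAction"` stands for an action assumed to be defined elsewhere. Type-specific alarm options are left out.

```toml
# Hypothetical layout; the [[checks]] and [[checks.alarms]] table names are assumptions.
[[checks]]
name = "Foobar"
type = "FilesystemUsage"

[[checks.alarms]]
name = "Foobar"                     # required, unique within the check
action = "FooAction"                # triggered on the good -> bad transition
cycles = 3                          # bad cycles needed to go from good to bad
repeat_cycles = 100                 # re-trigger the action every 100 cycles while bad
recover_action = "FooAction"        # triggered on the bad -> good transition
recover_cycles = 3                  # good cycles needed to go from bad to good
error_action = "FooAction"          # triggered on good/bad -> error
error_repeat_cycles = 100           # re-trigger every 100 cycles while in the error state
error_recover_action = "FooAction"  # triggered on error -> good/bad
placeholders = {"internal_alarm_id" = "id_foobar"}  # used for all actions
```

Setting `repeat_cycles` or `error_repeat_cycles` to zero would, per the descriptions above, trigger the respective action only once on the state transition.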