SNMP walk-capable monitoring plugin for Icinga/Nagios
There are a lot of devices out there that only provides detailed status of the internal hardware. While from the same vendor, the actual configuration such as number of CPUs or memory modules might differ which causes headaches when trying to set up snmp monitoring with decent coverage for those systems.
While the original snmp_check
plugin does a decent job for checking
thresholds, it's found lacking when monitoring discrete states due to:
- No possibility to report WARNING state.
- You must specify a static set of OIDs (items) to check.
This utility adds the possibility to define WARNING states, and also
leverages snmpwalk
/ snmpbulkwalk
to check the status of an entire
subset of components. To make it easier it also supports specifying the MIB
directory, eliminating the need to use the env MIBDIR=/path/to/mibs check_snmp ...
hack.
The original snmp_check
plugin is a C program executing the standard
snmpget
utility. This plugin also executes the standard SNMP utilities,
and while written in Perl the performance impact is negligible compared to
the original snmp_check
.
A common misconception is that all SNMP MIB files should be placed in a single directory. While this works for smaller setups, when adding MIBs for a wider range of devices you will find that there are conflicts and also that simply parsing all MIB files becomes CPU intensive on the monitoring host.
A better strategy is to collect the needed MIBs in separate directories for each class of devices.
- Perl
- Including
Monitoring::Plugin
- Including
- Standard Net-SNMP tools for
snmpget
et al. - Recommended: MIB files for your devices
- While using numeric OIDs/values works, the user experience is much better when using MIBs.
GNU Affero General Public License v3.0 (AGPLv3)
For more information, see:
https://www.gnu.org/licenses/agpl-3.0.html
https://choosealicense.com/licenses/agpl-3.0/
This utility is not intended for checking thresholds such as fan speeds
etc, use the original check_snmp
plugin for that.
While the arguments are heavily inspired by the original check_snmp
plugin, the behavior might differ.
See the help as shown by ./check_snmp_state -h
for the complete set of
arguments.
To debug issues, it's recommended to run manually using the -vv
option
to enable verbose/debug output.
Check all memoryHealthStatus items, this uses snmpbulkwalk
.
We expect Normal
to signify an OK
state. Everything else will get an
CRITICAL
status with the extra tag UNDEFINED
in the output.
Since newer XClarity firmwares only supports SNMP v3, that's what we use.
$ ./check_snmp_state -M /path/to/mibs/lenovo/xcc -P 3 -U snmpuser \
-H hostname-xcc -O Normal -W -o LENOVO-XCC-MIB::memoryHealthStatus
SNMP_STATE OK - 6 OK
OK: LENOVO-XCC-MIB::memoryHealthStatus.1 Normal
OK: LENOVO-XCC-MIB::memoryHealthStatus.2 Normal
OK: LENOVO-XCC-MIB::memoryHealthStatus.3 Normal
OK: LENOVO-XCC-MIB::memoryHealthStatus.4 Normal
OK: LENOVO-XCC-MIB::memoryHealthStatus.5 Normal
OK: LENOVO-XCC-MIB::memoryHealthStatus.6 Normal
Check all memoryDeviceStatus items. This uses snmpbulkwalk
. Here we leverage that the MIB file
enumerates the status, so we can actually tell the plugin what to expect as
ok/warning/critical states.
$ ./check_snmp_state -M /path/to/mibs/idrac -H hostname-idrac \
-O ok -w nonCritical -c critical -c nonRecoverable \
-W -o IDRAC-MIB-SMIv2::memoryDeviceStatus
SNMP_STATE OK - 16 OK
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.1 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.2 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.3 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.4 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.5 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.6 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.7 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.8 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.9 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.10 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.11 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.12 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.13 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.14 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.15 ok
OK: IDRAC-MIB-SMIv2::memoryDeviceStatus.1.16 ok
Check all global/combined/rollup/summary states explicitly, this uses
snmpget
on the specific OIDs.
Here we leverage that the MIB file enumerates the status, so we can actually tell the plugin what to expect as ok/warning/critical states.
$ ./check_snmp_state -M /path/to/mibs/idrac -H hostname-idrac \
-O ok -w nonCritical -c critical -c nonRecoverable \
-o IDRAC-MIB-SMIv2::globalSystemStatus.0 -o IDRAC-MIB-SMIv2::globalStorageStatus.0 \
-o IDRAC-MIB-SMIv2::systemStateGlobalSystemStatus.1 -o IDRAC-MIB-SMIv2::systemStatePowerSupplyStatusCombined.1 \
-o IDRAC-MIB-SMIv2::systemStateVoltageStatusCombined.1 -o IDRAC-MIB-SMIv2::systemStateAmperageStatusCombined.1 \
-o IDRAC-MIB-SMIv2::systemStateCoolingDeviceStatusCombined.1 -o IDRAC-MIB-SMIv2::systemStateTemperatureStatusCombined.1 \
-o IDRAC-MIB-SMIv2::systemStateMemoryDeviceStatusCombined.1 -o IDRAC-MIB-SMIv2::systemStateChassisIntrusionStatusCombined.1 \
-o IDRAC-MIB-SMIv2::systemStatePowerUnitStatusCombined.1 -o IDRAC-MIB-SMIv2::systemStateCoolingUnitStatusCombined.1 \
-o IDRAC-MIB-SMIv2::systemStateProcessorDeviceStatusCombined.1 -o IDRAC-MIB-SMIv2::systemStateBatteryStatusCombined.1 \
-o IDRAC-MIB-SMIv2::systemStateTemperatureStatisticsStatusCombined.1
SNMP_STATE OK - 15 OK
OK: IDRAC-MIB-SMIv2::globalSystemStatus.0 ok
OK: IDRAC-MIB-SMIv2::globalStorageStatus.0 ok
OK: IDRAC-MIB-SMIv2::systemStateGlobalSystemStatus.1 ok
OK: IDRAC-MIB-SMIv2::systemStatePowerSupplyStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateVoltageStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateAmperageStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateCoolingDeviceStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateTemperatureStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateMemoryDeviceStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateChassisIntrusionStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStatePowerUnitStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateCoolingUnitStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateProcessorDeviceStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateBatteryStatusCombined.1 ok
OK: IDRAC-MIB-SMIv2::systemStateTemperatureStatisticsStatusCombined.1 ok
The IBM TS4500 tape library tends to just ignore a tape drive when communication has stopped working entirely, specify the expected number of results to work around this problem.
The below example expects 12 tape drives to be present, but since one of then is on the fritz the library silently ignores the broken one in the middle of the bunch and returns state for 11 drives.
$ ./check_snmp_state -H hostname-ts4500 -M /path/to/mibs/ibm/ts4500 -W -o SNIA-SML-MIB::mediaAccessDevice-Availability -O runningFullPower -r 12
SNMP_STATE UNKNOWN - 11 OK, 1 Error - Expected at least 12 values, only found 11.
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.1 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.2 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.3 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.4 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.5 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.6 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.7 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.8 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.9 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.10 runningFullPower
OK: SNIA-SML-MIB::mediaAccessDevice-Availability.11 runningFullPower
UNKNOWN/missing: Expected at least 12 values, only found 11