| 1 | +Prometheus IPMI Exporter |
| 2 | +======================== |
| 3 | + |
| 4 | +This is an IPMI over LAN exporter for [Prometheus](https://prometheus.io). |
| 5 | + |
| 6 | +An instance running on one host can be used to monitor a large number of IPMI |
| 7 | +interfaces by passing the `target` parameter to a scrape. It uses tools from |
| 8 | +the [FreeIPMI](https://www.thomas-krenn.com/en/wiki/FreeIPMI_ipmimonitoring) |
| 9 | +suite for the actual IPMI communication. |
| 10 | + |
| 11 | +## Installation |
| 12 | + |
| 13 | +You need a Go development environment. Then, run the following to get the |
| 14 | +source code and build and install the binary: |
| 15 | + |
| 16 | + go get github.com/soundcloud/ipmi_exporter |
| 17 | + |
| 18 | +## Running |
| 19 | + |
| 20 | +A minimal invocation looks like this: |
| 21 | + |
| 22 | + ./ipmi_exporter |
| 23 | + |
| 24 | +Supported parameters include: |
| 25 | + |
| 26 | + - `web.listen-address`: the address/port to listen on (default: `":9290"`) |
| 27 | + - `config.file`: path to the configuration file (default: `ipmi.yml`) |
| 28 | + - `path`: path to the FreeIPMI executables (default: rely on `$PATH`) |
| 29 | + |
| 30 | +Make sure you have at least the following tools from the |
| 31 | +[FreeIPMI](https://www.thomas-krenn.com/en/wiki/FreeIPMI_ipmimonitoring) suite |
| 32 | +installed: |
| 33 | + |
| 34 | + - `ipmimonitoring` |
| 35 | + - `ipmi-dcmi` |
| 36 | + - `bmc-info` |
| 37 | + |
| 38 | +## Configuration |
| 39 | + |
| 40 | +The general configuration pattern is similar to that of the [blackbox |
| 41 | +exporter](https://github.com/prometheus/blackbox_exporter), i.e. Prometheus |
| 42 | +scrapes a small number (possibly one) of IPMI exporters with a `target` URL |
| 43 | +parameter to tell the exporter which IPMI device it should use to retrieve the |
| 44 | +IPMI metrics. We have taken this approach as IPMI devices often provide useful |
| 45 | +information even while the supervised host is turned off. If you are running |
| 46 | +the exporter on a separate host anyway, it makes more sense to have only a few |
| 47 | +of them, each probing many (possibly thousands of) IPMI devices, rather than |
| 48 | +one exporter per IPMI device. |
| 49 | + |
| 50 | +### IPMI exporter |
| 51 | + |
| 52 | +The exporter requires a configuration file called `ipmi.yml` (can be |
| 53 | +overridden, see above). It must contain user names and passwords for IPMI |
| 54 | +access to all targets. It supports a “default” target, which is used as |
| 55 | +fallback if the target is not explicitly listed in the file. |
| 56 | + |
| 57 | +The configuration file also supports a blacklist of sensors, useful in case of |
| 58 | +OEM-specific sensors that FreeIPMI cannot deal with properly or otherwise |
| 59 | +misbehaving sensors. |
| 60 | + |
| 61 | +See the included `ipmi.yml` file for an example. |
| 62 | + |
| 63 | +### Prometheus |
| 64 | + |
| 65 | +To add your IPMI targets to Prometheus, you can use any of the supported |
| 66 | +service discovery mechanisms. The following example uses the file-based SD |
| 67 | +and should be easy to adjust to other scenarios. |
| 68 | + |
| 69 | +Create a YAML file that contains a list of targets, e.g.: |
| 70 | + |
| 71 | +``` |
| 72 | +--- |
| 73 | +- targets: |
| 74 | + - 10.1.2.23 |
| 75 | + - 10.1.2.24 |
| 76 | + - 10.1.2.25 |
| 77 | + - 10.1.2.26 |
| 78 | + - 10.1.2.27 |
| 79 | + - 10.1.2.28 |
| 80 | + - 10.1.2.29 |
| 81 | + - 10.1.2.30 |
| 82 | + labels: |
| 83 | + job: ipmi_exporter |
| 84 | +``` |
| 85 | + |
| 86 | +This file needs to be stored on the Prometheus server host. Assuming that this |
| 87 | +file is called `/srv/ipmi_exporter/targets.yml`, and the IPMI exporter is |
| 88 | +running on a host that has the DNS name `ipmi-exporter.internal.example.com`, |
| 89 | +add the following to your Prometheus config: |
| 90 | + |
| 91 | +``` |
| 92 | +- job_name: ipmi |
| 93 | + scrape_interval: 1m |
| 94 | + scrape_timeout: 30s |
| 95 | + metrics_path: /ipmi |
| 96 | + scheme: http |
| 97 | + file_sd_configs: |
| 98 | + - files: |
| 99 | + - /srv/ipmi_exporter/targets.yml |
| 100 | + refresh_interval: 5m |
| 101 | + relabel_configs: |
| 102 | + - source_labels: [__address__] |
| 103 | + separator: ; |
| 104 | + regex: (.*)(:80)? |
| 105 | + target_label: __param_target |
| 106 | + replacement: ${1} |
| 107 | + action: replace |
| 108 | + - source_labels: [__param_target] |
| 109 | + separator: ; |
| 110 | + regex: (.*) |
| 111 | + target_label: instance |
| 112 | + replacement: ${1} |
| 113 | + action: replace |
| 114 | + - separator: ; |
| 115 | + regex: .* |
| 116 | + target_label: __address__ |
| 117 | + replacement: ipmi-exporter.internal.example.com:9290 |
| 118 | + action: replace |
| 119 | +``` |
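
To make the effect of this relabeling concrete, the following minimal Go sketch builds the URL that Prometheus would end up requesting for one target. The helper name `scrapeURL` is hypothetical, and the sketch assumes the exporter listens on its default port `9290` as listed under "Running"; it is an illustration, not code from the exporter.

```go
package main

import (
	"fmt"
	"net/url"
)

// scrapeURL is a hypothetical helper (not part of the exporter). It combines
// the exporter address, the /ipmi metrics path, and the IPMI device passed
// as the "target" query parameter into the URL of a single scrape.
func scrapeURL(exporterAddr, target string) string {
	u := url.URL{
		Scheme:   "http",
		Host:     exporterAddr,
		Path:     "/ipmi",
		RawQuery: url.Values{"target": {target}}.Encode(),
	}
	return u.String()
}

func main() {
	// Assumption: the exporter runs on its default port 9290.
	fmt.Println(scrapeURL("ipmi-exporter.internal.example.com:9290", "10.1.2.23"))
}
```

Requesting such a URL by hand (for example with `curl`) is a quick way to check that the exporter can reach a given IPMI device before adding it to the targets file.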
| 120 | + |
| 121 | +For more information, e.g. how to use mechanisms other than a file to discover |
| 122 | +the list of hosts to scrape, please refer to the [Prometheus |
| 123 | +documentation](https://prometheus.io/docs). |
| 124 | + |
| 125 | +## Exported data |
| 126 | + |
| 127 | +### Scrape metadata |
| 128 | + |
| 129 | +There are two metrics providing data about the scrape itself: |
| 130 | + |
| 131 | + - `ipmi_up` is `1` if all data could successfully be retrieved from the remote |
| 132 | + host, `0` otherwise |
| 133 | + - `ipmi_scrape_duration_seconds` is the amount of time it took to retrieve the |
| 134 | + data |
| 135 | + |
| 136 | +### BMC info |
| 137 | + |
| 138 | +For some basic information, there is a constant metric `ipmi_bmc_info` with |
| 139 | +value `1` and labels providing the firmware revision and manufacturer as |
| 140 | +returned from the BMC. Example: |
| 141 | + |
| 142 | + ipmi_bmc_info{firmware_revision="2.52",manufacturer_id="Dell Inc. (674)"} 1 |
| 143 | + |
| 144 | +### Power consumption |
| 145 | + |
| 146 | +The metric `ipmi_dcmi_power_consumption_current_watts` can be used to monitor |
| 147 | +the live power consumption of the machine in Watts. If in doubt, this metric |
| 148 | +should be used over any of the sensor data (see below), even if their name |
| 149 | +might suggest that they measure the same thing. This metric has no labels. |
| 150 | + |
| 151 | +### Sensors |
| 152 | + |
| 153 | +IPMI sensors in general have one or two distinct pieces of information that are |
| 154 | +of interest: a value and/or a state. The exporter always exports both, even if |
| 155 | +the value is NaN or the state nonsensical. This way the metrics can always be |
| 156 | +found, which avoids a situation where one looks for e.g. the value of a sensor |
| 157 | +that is in a critical state, cannot find it, and assumes this to be a |
| 158 | +problem. |
| 159 | + |
| 160 | +The state of a sensor can be one of _nominal_, _warning_, _critical_, or _N/A_, |
| 161 | +reflected by the metric values `0`, `1`, `2`, and `NaN` respectively. Think of |
| 162 | +this as a kind of severity. |
| 163 | + |
| 164 | +For sensors with known semantics (i.e. units), corresponding specific metrics |
| 165 | +are exported. For everything else, generic metrics are exported. |
| 166 | + |
| 167 | +#### Temperature sensors |
| 168 | + |
| 169 | +Temperature sensors measure a temperature in degrees Celsius and their state |
| 170 | +usually reflects the temperature going above the vendor-recommended value. For |
| 171 | +each temperature sensor, two metrics are exported (state and value), using the |
| 172 | +sensor ID and the sensor name as labels. Example: |
| 173 | + |
| 174 | + ipmi_temperature_celsius{id="18",name="Inlet Temp"} 24 |
| 175 | + ipmi_temperature_state{id="18",name="Inlet Temp"} 0 |
| 176 | + |
| 177 | +#### Fan speed sensors |
| 178 | + |
| 179 | +Fan speed sensors measure fan speed in rotations per minute (RPM) and their |
| 180 | +state usually reflects the speed being too low, indicating the fan might be |
| 181 | +broken. For each fan speed sensor, two metrics are exported (state and value), |
| 182 | +using the sensor ID and the sensor name as labels. Example: |
| 183 | + |
| 184 | + ipmi_fan_speed_rpm{id="12",name="Fan1A"} 4560 |
| 185 | + ipmi_fan_speed_state{id="12",name="Fan1A"} 0 |
| 186 | + |
| 187 | +#### Voltage sensors |
| 188 | + |
| 189 | +Voltage sensors measure a voltage in Volts. For each voltage sensor, two |
| 190 | +metrics are exported (state and value), using the sensor ID and the sensor name |
| 191 | +as labels. Example: |
| 192 | + |
| 193 | + ipmi_voltage_state{id="2416",name="12V"} 0 |
| 194 | + ipmi_voltage_volts{id="2416",name="12V"} 12 |
| 195 | + |
| 196 | +#### Current sensors |
| 197 | + |
| 198 | +Current sensors measure a current in Amperes. For each current sensor, two |
| 199 | +metrics are exported (state and value), using the sensor ID and the sensor name |
| 200 | +as labels. Example: |
| 201 | + |
| 202 | + ipmi_current_state{id="83",name="Current 1"} 0 |
| 203 | + ipmi_current_amperes{id="83",name="Current 1"} 0 |
| 204 | + |
| 205 | +#### Power sensors |
| 206 | + |
| 207 | +Power sensors measure power in Watts. For each power sensor, two metrics are |
| 208 | +exported (state and value), using the sensor ID and the sensor name as labels. |
| 209 | +Example: |
| 210 | + |
| 211 | + ipmi_power_state{id="90",name="Pwr Consumption"} 0 |
| 212 | + ipmi_power_watts{id="90",name="Pwr Consumption"} 70 |
| 213 | + |
| 214 | +Note that based on our observations, this may or may not be a reading |
| 215 | +reflecting the actual live power consumption. We recommend using the more |
| 216 | +explicit [power consumption metric](#power-consumption) for this. |
| 217 | + |
| 218 | +#### Generic sensors |
| 219 | + |
| 220 | +For all sensors that cannot be classified, two generic metrics are exported, |
| 221 | +the state and the value. However, to provide a little more context, the sensor |
| 222 | +type is added as a label (in addition to name and ID). Example: |
| 223 | + |
| 224 | + ipmi_sensor_state{id="139",name="Power Cable",type="Cable/Interconnect"} 0 |
| 225 | + ipmi_sensor_value{id="139",name="Power Cable",type="Cable/Interconnect"} NaN |
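
The naming scheme from the sensor sections above can be summarized as a lookup from a sensor's unit to its pair of metric names. The Go sketch below is a summary aid under stated assumptions, not the exporter's implementation: the type `metricNames`, the function `namesForUnit`, and the unit strings (`C`, `RPM`, `V`, `A`, `W`) are all hypothetical, while the metric names are the ones shown in the examples above.

```go
package main

import "fmt"

// metricNames holds the value and state metric names for one sensor class.
type metricNames struct {
	Value string
	State string
}

// namesForUnit is a hypothetical lookup from a sensor unit to the metric
// names documented above. Assumption: the unit strings used as keys here
// are illustrative; unknown units fall back to the generic sensor metrics.
func namesForUnit(unit string) metricNames {
	switch unit {
	case "C":
		return metricNames{"ipmi_temperature_celsius", "ipmi_temperature_state"}
	case "RPM":
		return metricNames{"ipmi_fan_speed_rpm", "ipmi_fan_speed_state"}
	case "V":
		return metricNames{"ipmi_voltage_volts", "ipmi_voltage_state"}
	case "A":
		return metricNames{"ipmi_current_amperes", "ipmi_current_state"}
	case "W":
		return metricNames{"ipmi_power_watts", "ipmi_power_state"}
	default:
		return metricNames{"ipmi_sensor_value", "ipmi_sensor_state"}
	}
}

func main() {
	for _, unit := range []string{"C", "RPM", "V", "A", "W", "unknown"} {
		n := namesForUnit(unit)
		fmt.Printf("%-8s %s / %s\n", unit, n.Value, n.State)
	}
}
```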
| 226 | + |