I propose updating the way Kafka metrics are collected by implementing a metrics reporter that directly exposes metrics in the Prometheus format.
PoC: https://github.com/mimaison/prometheus-metrics-reporter
Efficient monitoring of Kafka relies on the accurate collection of metrics data. For historical reasons, Kafka uses two distinct libraries for metrics:

- `org.apache.kafka.common.metrics` creates Kafka metrics on the client side and common metrics on the server side.
- `com.yammer.metrics:metrics-core` is the predecessor of Dropwizard and creates Yammer metrics on the broker side.
For both types of metrics, Kafka provides a reporter interface to expose metrics to monitoring systems. Kafka metrics use `org.apache.kafka.common.metrics.MetricsReporter` and Yammer metrics use `kafka.metrics.KafkaMetricsReporter`, which is not officially part of the public API. Kafka has built-in JMX reporter implementations for both types.
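For illustration, here is a minimal sketch of a reporter built against the `org.apache.kafka.common.metrics.MetricsReporter` interface; the class name `LoggingReporter` and the logging behavior are placeholders, not part of this proposal.

```java
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

// Minimal sketch of a Kafka metrics reporter. Kafka calls these methods as
// metrics are created, updated and removed in the client or broker.
public class LoggingReporter implements MetricsReporter {

    @Override
    public void configure(Map<String, ?> configs) {
        // Receives the client/broker configuration, including any custom reporter settings
    }

    @Override
    public void init(List<KafkaMetric> metrics) {
        // Called once with the metrics that already exist when the reporter is registered
        metrics.forEach(m -> System.out.println("Existing metric: " + m.metricName()));
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        // Called when a metric is added or updated
        System.out.println("Metric changed: " + metric.metricName());
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {
        // Called when a metric is removed
        System.out.println("Metric removed: " + metric.metricName());
    }

    @Override
    public void close() {
        // Release any resources held by the reporter
    }
}
```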
At the moment Strimzi relies on these default JMX reporters and uses jmx_exporter which is a Java agent that retrieves metrics via JMX and exposes them over an HTTP endpoint in the Prometheus format. Then Prometheus is configured to scrape that endpoint to retrieve the Kafka metrics.
- `org.apache.kafka.common.metrics.JmxReporter` is the reporter implementation for the Kafka metrics.
- `org.apache.kafka.server.metrics.FilteringJmxReporter` is the reporter implementation for the Yammer metrics. It is named `FilteringJmxReporter` because it extends the `JmxReporter` class from the Yammer library and adds an option to select the metrics to report.
I propose updating the metrics collection pipeline for the following reasons:
- The current metrics collection pipeline is pretty convoluted. We have metrics reporters first exposing metrics via JMX before using a Java agent to expose them again via HTTP. Using metrics reporters to directly expose metrics to Prometheus would significantly simplify it by removing JMX from the picture and removing jmx_exporter.
- Each component along the pipeline has its own configurations and specificities. For example, since Kafka 3.4.0, it's possible to disable the `JmxReporter` for Kafka metrics by setting `auto.include.jmx.reporter` to `false`. Currently, it is not possible to disable `FilteringJmxReporter`. Both jmx_exporter and the built-in reporter offer the flexibility to choose which metrics to collect.
- The jmx_exporter Java agent supports complex metrics mapping rules. These rules allow renaming metrics, which can make it hard to investigate issues as metrics could end up with different names. This prevents using metrics like an API, whether to build Grafana dashboards or for the operator to rely on metrics.
- Due to these complex rules, jmx_exporter performs badly when there is a very large number of metrics, for example when there are a lot of topics/partitions.
The proposal is to build metrics reporters that directly expose metrics via an HTTP endpoint in the Prometheus format. This will be a new project/repository under the Strimzi organization.
Reporters will expose the following configurations:
- `prometheus.metrics.reporter.listener`: The listener to expose the metrics on, in the format `http://<HOST>:<PORT>`. If the `<HOST>` part is empty, the listener binds to the default interface; if it is set to `0.0.0.0`, the listener binds to all interfaces. If the `<PORT>` part is set to `0`, a random port is picked. Default: `http://:8080`.
- `prometheus.metrics.reporter.allowlist`: A comma-separated list of regex patterns specifying the metrics to collect. Only metrics matching at least one of the patterns in the list are emitted. Default: `.*`.
- `prometheus.metrics.reporter.listener.enable`: A boolean determining whether to start the listener. Default: `true`.
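As an illustration of the settings above, here is a hedged sketch of how they could be assembled programmatically; the class name `ReporterConfigExample` and the values are examples only, not part of the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example values for the reporter configurations listed above
public class ReporterConfigExample {
    public static Map<String, String> reporterConfigs() {
        Map<String, String> configs = new HashMap<>();
        // Bind the HTTP listener to all interfaces on port 8080 (the default is http://:8080)
        configs.put("prometheus.metrics.reporter.listener", "http://0.0.0.0:8080");
        // Only emit broker log and network metrics
        configs.put("prometheus.metrics.reporter.allowlist", "kafka_log.*,kafka_network.*");
        // Keep the HTTP listener enabled (the default)
        configs.put("prometheus.metrics.reporter.listener.enable", "true");
        return configs;
    }
}
```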
The reporters will also export JVM metrics similar to the ones exported by jmx_exporter. These are provided by the JVM instrumentation package from the Prometheus Java client.
This proposal does not cover HTTPS, as today Strimzi does not allow configuring it with jmx_exporter. Note that jmx_exporter 0.19.0 added support for HTTPS. If needed, we can add it to the reporters later.
This proposal will produce an implementation for each type of metrics reporter:

- `KafkaPrometheusMetricsReporter`, usable on brokers (for Kafka metrics) and on Kafka clients (including Connect and Streams)
- `YammerPrometheusMetricsReporter`, usable on brokers (for Yammer metrics)
The Prometheus metrics registry is a singleton and the HTTP server will also be a singleton. This allows applications to start multiple instances of the reporter (for example, applications containing multiple Kafka clients, such as Streams or Connect) and still collect all metrics via a single HTTP endpoint per JVM.
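A minimal sketch of this singleton pattern, assuming the Prometheus Java simpleclient (`CollectorRegistry.defaultRegistry`, the `HTTPServer` class from `simpleclient_httpserver`, and the hotspot exports for JVM metrics); the class and method names are illustrative, not the actual implementation:

```java
import java.io.IOException;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.exporter.HTTPServer;
import io.prometheus.client.hotspot.DefaultExports;

// Illustrative sketch: all reporter instances in a JVM share the default Prometheus
// registry and a single HTTP server, so metrics from every Kafka client end up on
// one endpoint.
public class PrometheusEndpoint {

    private static HTTPServer server;

    // Called by each reporter instance; only the first call starts the HTTP server.
    public static synchronized void ensureStarted(int port) throws IOException {
        if (server == null) {
            // Export JVM metrics (memory, GC, threads, ...) alongside the Kafka metrics
            DefaultExports.initialize();
            // Serve everything registered in the shared default registry
            server = new HTTPServer(port);
        }
    }

    public static CollectorRegistry registry() {
        return CollectorRegistry.defaultRegistry;
    }
}
```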
The reporter for Kafka metrics will be usable outside of Strimzi by applications using Kafka clients. To do so, applications will need to set the `metric.reporters` configuration to `KafkaPrometheusMetricsReporter` and set the reporter configurations accordingly for each Kafka client they instantiate.
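For example, a client application could wire the reporter in like this. This is a sketch: the fully-qualified reporter class name and the bootstrap address are assumptions, the proposal only defines the class name `KafkaPrometheusMetricsReporter`.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerWithPrometheusReporter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Register the proposed reporter (the fully-qualified name here is hypothetical)
        props.put(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
                "io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter");
        // Reporter settings, passed through the client configuration
        props.put("prometheus.metrics.reporter.listener", "http://:8080");
        props.put("prometheus.metrics.reporter.allowlist", ".*");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The producer metrics are now exposed on the reporter's HTTP endpoint
        }
    }
}
```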
Strimzi users currently enable jmx_exporter with the following configuration:
```yaml
metricsConfig:
  type: jmxPrometheusExporter
  valueFrom:
    configMapKeyRef:
      name: kafka-metrics
      key: kafka-metrics-config.yml
```
To enable metric reporters, Strimzi users would use a configuration like the following:
```yaml
metricsConfig:
  type: strimziMetricsReporter
  values:
    allowList:
      - "kafka_log.*"
      - "kafka_network.*"
```
Strimzi will only allow users to set the `prometheus.metrics.reporter.allowlist` setting, via the `allowList` field. The `prometheus.metrics.reporter.listener` setting will not be customizable in Strimzi and will be set to `http://:9404`. This also helps avoid conflicts with the Strimzi Kafka Agent, which currently uses port 8080.
Out of all the metrics emitted by Kafka brokers and clients, a few have non-numeric values. Prometheus only supports numeric values for metrics. When using jmx_exporter, it is possible to write rules that move the non-numeric value to a label. For example, we do this in kafka-connect-metrics for the `status` metric, which has a string value.

I propose to do this automatically in the reporters. For example, a metric named `kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}"<>status` with the value `running` will be converted into `kafka_connect_connector_task_metrics_status{connector="heartbeats",task="0",status="running"}` and its value will be set to `1.0`.
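A hedged sketch of how this conversion could work, using the Prometheus simpleclient sample type; the method name, class name and the hard-coded `status` label are illustrative, not the proposed implementation:

```java
import java.util.ArrayList;
import java.util.List;

import io.prometheus.client.Collector;

// Illustrative sketch: if a Kafka metric has a non-numeric value, move the value into
// a label and report 1.0; otherwise report the numeric value as-is.
public class NonNumericConversion {

    static Collector.MetricFamilySamples.Sample toSample(String metricName,
                                                         List<String> labelNames,
                                                         List<String> labelValues,
                                                         Object value) {
        if (value instanceof Number) {
            return new Collector.MetricFamilySamples.Sample(
                    metricName, labelNames, labelValues, ((Number) value).doubleValue());
        }
        // Non-numeric value: add it as an extra label and set the sample value to 1.0
        List<String> names = new ArrayList<>(labelNames);
        List<String> values = new ArrayList<>(labelValues);
        names.add("status");   // matches the connector-task status metric from the example above
        values.add(String.valueOf(value));
        return new Collector.MetricFamilySamples.Sample(metricName, names, values, 1.0);
    }
}
```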
The reporter will be a new project, `strimzi-metrics-reporter`. It will be used by Strimzi components but should also be usable without Strimzi, for example in client-side applications.
The reporters are not usable with ZooKeeper, so Strimzi should keep supporting jmx_exporter for as long as it also supports ZooKeeper. The transition would happen in 3 stages:
1. The new reporters are added as an additional option to the Strimzi custom resource, but we keep supporting the JMX Exporter and use it in our examples as the main option (with possibly some new examples of CRs and dashboards prepared as well, but not as the main thing).
2. We switch to the new reporters as the main example. We keep supporting the JMX Exporter but deprecate it.
3. Later, once ZooKeeper is not supported anymore, we might decide to drop the JMX Exporter support completely.
The OAuth plugin exposes the `strimzi.oauth.metric.reporters` configuration to let users specify a metrics reporter. Today the OAuth plugin automatically uses `org.apache.kafka.common.metrics.JmxReporter` if that configuration is not set. To switch to the new reporter, users should set `strimzi.oauth.metric.reporters` to `KafkaPrometheusMetricsReporter`.
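As a hedged sketch, this could look like the following; the fully-qualified class name is an assumption, the proposal only defines the class name `KafkaPrometheusMetricsReporter`.

```java
import java.util.Properties;

public class OauthMetricsConfigExample {
    public static Properties oauthMetricsConfig() {
        Properties props = new Properties();
        // Hypothetical fully-qualified name; without this setting the plugin falls back to JmxReporter
        props.put("strimzi.oauth.metric.reporters",
                "io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter");
        return props;
    }
}
```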
Note that directly instantiating metric reporters in the plugin is a workaround as Kafka currently does not provide a way for plugins to register metrics. KIP-877, currently in discussion, aims at solving this issue. Once this KIP is accepted, we should update the plugin to use this mechanism.
To function, the bridge instantiates Kafka clients. Today it has its own custom mechanism to retrieve metrics from the Kafka clients via JMX and expose them to Prometheus.
We can update the bridge to use the metric reporters to retrieve metrics from Kafka clients and still keep its own mechanism to expose them to Prometheus (as it also exposes its own metrics). To switch behavior, I propose introducing a new bridge configuration, `metricsMode`/`KAFKA_BRIDGE_METRICS_MODE`, which would initially default to `jmx` to keep the current behavior. If set to `reporter`, the bridge would set the `metric.reporters` configuration on all Kafka clients it starts to `KafkaPrometheusMetricsReporter` and retrieve their metrics via the Prometheus metrics registry. It should also set `prometheus.metrics.reporter.listener.enable` to `false` so `KafkaPrometheusMetricsReporter` instances don't start their own HTTP endpoint.
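A hedged sketch of what the bridge could do in `reporter` mode; the configuration keys follow this proposal, while the fully-qualified class name and the use of the simpleclient default registry are assumptions.

```java
import java.util.Enumeration;
import java.util.Properties;

import io.prometheus.client.Collector.MetricFamilySamples;
import io.prometheus.client.CollectorRegistry;

public class BridgeReporterModeSketch {

    // Configuration the bridge would add to every Kafka client it creates in reporter mode
    static Properties reporterClientConfigs() {
        Properties props = new Properties();
        // The fully-qualified name here is hypothetical
        props.put("metric.reporters", "io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter");
        // Don't start a per-client HTTP endpoint; the bridge exposes metrics itself
        props.put("prometheus.metrics.reporter.listener.enable", "false");
        return props;
    }

    // The bridge can then read all client metrics from the shared Prometheus registry
    static Enumeration<MetricFamilySamples> collectClientMetrics() {
        return CollectorRegistry.defaultRegistry.metricFamilySamples();
    }
}
```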
This plugin creates metrics by using the Yammer metrics library. `com.yammer.metrics.Metrics` registers metrics in the default Yammer registry and automatically exports metrics via JMX.
Like strimzi-kafka-oauth, this plugin should be updated once KIP-877 is available in Kafka.
In the meantime, `YammerPrometheusMetricsReporter` will retrieve metrics from both the Kafka Yammer registry and the default Yammer registry to get all Yammer metrics.
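A minimal sketch of collecting from both registries, assuming Kafka's `org.apache.kafka.server.metrics.KafkaYammerMetrics` helper and the Yammer `Metrics.defaultRegistry()`; the class is illustrative, not the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Metric;
import com.yammer.metrics.core.MetricName;
import org.apache.kafka.server.metrics.KafkaYammerMetrics;

// Illustrative sketch: gather Yammer metrics from both the registry used by the Kafka
// broker and the default registry used by plugins.
public class YammerRegistriesSketch {

    static Map<MetricName, Metric> allYammerMetrics() {
        Map<MetricName, Metric> metrics = new HashMap<>();
        metrics.putAll(KafkaYammerMetrics.defaultRegistry().allMetrics());
        metrics.putAll(Metrics.defaultRegistry().allMetrics());
        return metrics;
    }
}
```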
Differences with jmx_exporter metrics:
- The reporter does not compute 1/5/15 minute rate, mean, max, min, stddev metrics. It's preferable to compute them in Prometheus instead of on the client side.
- The reporter is missing the `kafka_server_app_info_starttimems` metric with the client/broker id label (due to KAFKA-15186).
Assuming jmx_exporter does not have any rules, this is the other main metric change:
- With the reporter, the `name` field is put directly in the metric name. For example, the MBean `kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs` with the `Count` attribute is converted to `kafka_server_zookeeperclientmetrics_zookeeperrequestlatencyms_count`. By default, jmx_exporter keeps the name as a label: `kafka_server_zookeeperclientmetrics_count{name="ZooKeeperRequestLatencyMs",}`.
In practice, with the example rules from kafka-metrics.yaml, jmx_exporter generates metric names very similar to those from the reporter.
See https://gist.github.com/mimaison/335bc52bd5fb39097b7e6865c8cd1bea, which contains the output from jmx_exporter with the default metrics.yaml rules, and from the proposed metrics reporter with the following configuration:
```properties
prometheus.metrics.reporter.allowlist=kafka_cluster.*,kafka_controller.*,kafka_log.*,kafka_network.*,kafka_server_(brokertopicmetrics|delayedoperationpurgatory|fetchsessioncache|kafkarequesthandlerpool|kafkaserver|replicaalterlogdirsmanager|replicafetchermanager|replicamanager|sessionexpirelistener|socket_server_metrics|zookeeperclientmetrics).*,kafka_utils.*
```
Both files have been sorted and the comments have been removed so they are easy to compare.
If we also remove the values, doing a diff highlights the following differences:
- The JMX metrics only exist with jmx_exporter:

  ```
  jmx_config_reload_failure_created
  jmx_config_reload_failure_total
  jmx_config_reload_success_created
  jmx_config_reload_success_total
  jmx_exporter_build_info{version="0.19.0",name="jmx_prometheus_javaagent",}
  jmx_scrape_cached_beans
  jmx_scrape_duration_seconds
  jmx_scrape_error
  ```

  This is expected as JMX is not used with the reporter.
- With JMX, a number of `java_lang` metrics are emitted. These are not retrieved by the Prometheus Hotspot exports.
- The `PerSec` suffix is removed by jmx_exporter rules, so a few metrics have slightly different names, for example:

  ```
  kafka_controller_controllerstats_uncleanleaderelections_total # jmx_exporter
  kafka_controller_controllerstats_uncleanleaderelectionspersec_total # reporter
  kafka_network_requestmetrics_errors_total{request="UpdateMetadata",error="NONE",} # jmx_exporter
  kafka_network_requestmetrics_errorspersec_total{request="UpdateMetadata",error="NONE",} # reporter
  ```

  This is due to mapping rules.
- An underscore (`_`) is added in front of the `percent` suffix by jmx_exporter rules, so a few metrics have slightly different names, for example:

  ```
  kafka_network_processor_idle_percent{networkProcessor="0",} # jmx_exporter
  kafka_network_processor_idlepercent{networkProcessor="0",} # reporter
  ```

  This is due to mapping rules.
- The `count` suffix is used instead of `total` for some Counters by jmx_exporter:

  ```
  kafka_network_acceptor_acceptorblockedpercent_count{listener="PLAINTEXT",} # jmx_exporter
  kafka_network_acceptor_acceptorblockedpercent_total{listener="PLAINTEXT",} # reporter
  ```

  Prometheus recommends using the `total` suffix and it's actually automatically injected by the Java library. The example jmx_exporter rules correctly replace the suffix in most metrics (like in the metrics mentioned above with `PerSec`), but they don't match all metrics, hence the inconsistency.
- Some labels are wrapped twice in quotes by jmx_exporter:

  ```
  # jmx_exporter
  kafka_log_logcleanermanager_uncleanable_bytes{logDirectory="\"/tmp/kafka-logs\"",}
  kafka_log_logcleanermanager_uncleanable_partitions_count{logDirectory="\"/tmp/kafka-logs\"",}
  kafka_log_logmanager_logdirectoryoffline{logDirectory="\"/tmp/kafka-logs\"",}
  # reporter
  kafka_log_logcleanermanager_uncleanable_bytes{logDirectory="/tmp/kafka-logs",}
  kafka_log_logcleanermanager_uncleanable_partitions_count{logDirectory="/tmp/kafka-logs",}
  kafka_log_logmanager_logdirectoryoffline{logDirectory="/tmp/kafka-logs",}
  ```

  I'm guessing this is done to support paths containing special characters. In the context of Strimzi I don't think this is necessary.
In terms of performance, in my very limited testing the reporter is much faster than jmx_exporter. This is not a benchmark or rigorous by any means; I've simply been running `time curl --head http://localhost:{PORT}/metrics` against a single broker on my laptop with both the reporter (on port 8080) and jmx_exporter (on port 9090) running. Since each partition has metrics, I used the number of partitions as a proxy to increase the total number of metrics.
| # of partitions | jmx_exporter | reporter |
|---|---|---|
| 1 | 600ms | 100ms |
| 500 | 900ms | 300ms |
| 1000 | 1300ms | 400ms |
| 2000 | 2000ms | 800ms |
- I considered using Micrometer in the reporter. The benefit is that it would allow exporting metrics to different monitoring systems. The issue is that it requires converting the Kafka and Yammer metrics into the Micrometer format and then having Micrometer export them to Prometheus. As each metric library has its own characteristics, chaining several conversions can lead to slightly different semantics. Finally, Prometheus seems to be the leading monitoring solution in the Kubernetes ecosystem, and I expect most other monitoring tools to integrate with it. For these reasons, I decided not to use Micrometer. Ideally Kafka would export its metrics via Micrometer. It's something I've started to explore, but it is definitely a very difficult task.