Operation Guide

Stability

The core of Grafana Agent is considered stable and suitable for production use. Features that are subject to change and are not recommended for production use are tagged as either "beta" or "experimental"; the two terms are used interchangeably.

Host Filtering

Host Filtering implements a form of "dumb sharding," where operators may deploy one Grafana Agent instance per machine in a cluster, all using the same configuration, and the Grafana Agents will only scrape targets that are running on the same node as the Agent.

Running with host_filter: true means that if you have a target whose host machine is not also running a Grafana Agent process, that target will not be scraped!

Host Filtering is usually paired with a dedicated Agent process that is used for scraping targets that are running outside of a given cluster. For example, when running the Grafana Agent on GKE, you would have a DaemonSet with host_filter for scraping in-cluster targets, and a single dedicated Deployment for scraping other targets that are not running on a cluster node, such as the Kubernetes control plane API.

If you want to scale your scrape load without host filtering, you may use the scraping service instead.

The host name of the Agent is determined by reading $HOSTNAME. If $HOSTNAME isn't defined, the Agent will use Go's os.Hostname to determine the hostname.
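The lookup order described above can be sketched as follows. This is an illustration in Python, not the Agent's Go implementation: `agent_hostname` is a hypothetical helper, `socket.gethostname()` stands in for Go's os.Hostname, and treating an empty $HOSTNAME the same as an undefined one is an assumption.

```python
import os
import socket


def agent_hostname(env=None):
    """Return $HOSTNAME when it is set, otherwise the OS-reported hostname.

    `env` is injectable only to make the sketch easy to test; by default
    the real process environment is used.
    """
    env = os.environ if env is None else env
    value = env.get("HOSTNAME")
    # Assumption: an empty value is treated like an undefined variable.
    if value:
        return value
    # Stand-in for Go's os.Hostname in this Python sketch.
    return socket.gethostname()
```

One practical consequence is that setting $HOSTNAME explicitly (for example, in a container whose OS hostname is not the node name) controls which targets the host filter keeps.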

The following meta-labels are used to determine if a target is running on the same machine as the Agent:

  • __address__
  • __meta_consul_node
  • __meta_dockerswarm_node_id
  • __meta_dockerswarm_node_hostname
  • __meta_dockerswarm_node_address
  • __meta_kubernetes_pod_node_name
  • __meta_kubernetes_node_name
  • __host__

The final label, __host__, isn't added by any Prometheus service discovery mechanism. Rather, __host__ can be generated by using host_filter_relabel_configs. This allows custom relabeling rules to determine the hostname where the predefined labels fail. Relabeling rules added with host_filter_relabel_configs are temporary and are used only by the host filtering mechanism. Full relabeling rules should be applied in the appropriate scrape_config instead.

Note that scrape_config relabel_configs do not apply to the host filtering logic; only host_filter_relabel_configs will work.

If the determined hostname matches any of the meta labels, the discovered target is allowed. Otherwise, the target is ignored, and will not show up in the targets API.
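The matching rule can be sketched as a small predicate. This is a minimal Python illustration, not the Agent's code: `keep_target` and `strip_port` are hypothetical names, and stripping a trailing `:port` from `__address__` before comparing is an assumption based on Prometheus addresses conventionally taking the form `host:port`.

```python
# Labels consulted by host filtering, in the order listed above.
HOST_LABELS = (
    "__address__",
    "__meta_consul_node",
    "__meta_dockerswarm_node_id",
    "__meta_dockerswarm_node_hostname",
    "__meta_dockerswarm_node_address",
    "__meta_kubernetes_pod_node_name",
    "__meta_kubernetes_node_name",
    "__host__",
)


def strip_port(value):
    """Drop a trailing ':port' from a 'host:port' string.

    Assumption for this sketch; values that do not look like 'name:1234'
    (including IPv6 literals) are returned unchanged.
    """
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit() and host and ":" not in host:
        return host
    return value


def keep_target(labels, hostname):
    """Return True if any host label on the target matches the Agent's hostname."""
    for name in HOST_LABELS:
        value = labels.get(name)
        if value is None:
            continue
        if name == "__address__":
            value = strip_port(value)
        if value == hostname:
            return True
    # No label matched: the target is dropped and will not appear in the targets API.
    return False
```

For example, a pod target whose `__address__` is a pod IP is still kept when its `__meta_kubernetes_pod_node_name` equals the Agent's hostname, because a single matching label is enough.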

Prometheus "Instances"

The Grafana Agent defines a concept of a Prometheus Instance, which is its own mini Prometheus-lite server. The Instance runs a combination of Prometheus service discovery, scraping, a WAL for storage, and remote_write.

Instances allow for fine-grained control of what data gets scraped and where it gets sent. For example, users can define two Instances that scrape different subsets of metrics and send them to two completely different remote_write systems.

Instances are especially relevant to the scraping service mode, where breaking up your scrape configs into multiple Instances is required for sharding and balancing scrape load across a cluster of Agents.

Instance Sharing

The v0.5.0 release of the Agent introduced the concept of Instance sharing, which combines scrape_configs from compatible Instance configs into a single, shared Instance. Instance configs are compatible when they have no differences in configuration with the exception of what they scrape. remote_write configs may also differ in the order in which endpoints are declared, but apart from ordering, the remote_write configs must still be an exact match.
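The compatibility rule can be illustrated with a grouping key computed from everything except what an Instance scrapes. The sketch below is Python and not the Agent's implementation: `group_key` and `group_instances` are hypothetical names, the SHA-256-over-JSON hash is an arbitrary choice, excluding the Instance `name` from the key is an assumption, and the URLs are placeholders.

```python
import hashlib
import json
from collections import defaultdict


def group_key(config):
    """Hash every setting except the name and scrape_configs of an Instance config."""
    shared = {k: v for k, v in config.items() if k not in ("name", "scrape_configs")}
    # Sort remote_write entries so declaration order does not change the key.
    shared["remote_write"] = sorted(
        json.dumps(rw, sort_keys=True) for rw in config.get("remote_write", [])
    )
    blob = json.dumps(shared, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def group_instances(configs):
    """Merge scrape_configs of compatible Instance configs under one group key."""
    groups = defaultdict(list)
    for cfg in configs:
        groups[group_key(cfg)].extend(cfg.get("scrape_configs", []))
    return dict(groups)


# Placeholder configs: a and b differ only in name, scrape_configs,
# and remote_write order, so they land in the same group.
a = {"name": "a", "scrape_configs": [{"job_name": "x"}],
     "remote_write": [{"url": "https://example.com/one"}, {"url": "https://example.com/two"}]}
b = {"name": "b", "scrape_configs": [{"job_name": "y"}],
     "remote_write": [{"url": "https://example.com/two"}, {"url": "https://example.com/one"}]}
c = {"name": "c", "scrape_configs": [{"job_name": "z"}],
     "remote_write": [{"url": "https://example.com/three"}]}
groups = group_instances([a, b, c])
```

Here `groups` has two entries: one holding the scrape configs of `a` and `b`, and one holding those of `c`, whose remote_write differs.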

In the shared Instances mode, the name field of remote_write configs is ignored. The resulting remote_write configs will have a name identical to the first six characters of the group name and the first six characters of the hash from that remote_write config separated by a -.
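As a small illustration of the naming rule, the sketch below builds such a name in Python. `shared_remote_write_name` is a hypothetical helper; how the Agent computes the group name and the remote_write hash is not specified here, so both are taken as ready-made strings.

```python
def shared_remote_write_name(group_name, remote_write_hash):
    """Join the first six characters of each input with a '-'."""
    return f"{group_name[:6]}-{remote_write_hash[:6]}"


# Placeholder inputs, chosen only to make the truncation visible.
name = shared_remote_write_name("0123456789abcdef", "fedcba9876543210")
```

With these placeholder inputs, `name` is `012345-fedcba`. Because the name is derived from hashes, it changes whenever the relevant configuration changes, which matters if dashboards or alerts select on it.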

The shared Instances mode is the new default, and the previous behavior is deprecated. If you wish to restore the old behavior, set instance_mode: distinct in the prometheus_config block of your config file.

Shared Instances are completely transparent to the user with the exception of exposed metrics. With instance_mode: shared, metrics for Prometheus components (WAL, service discovery, remote_write, etc.) have an instance_group_name label, which is the hash of all settings used to determine the shared instance. When instance_mode: distinct is set, the metrics for Prometheus components will instead have an instance_name label, which matches the name set on the individual Instance config. It is recommended to use the default of instance_mode: shared unless you need per-Instance metrics and can accept the performance cost.

Users can use the targets API to see all scraped targets, and the name of the shared instance they were assigned to.