From 790dd33b0c43075d0692cfbed49249b5860bb6ac Mon Sep 17 00:00:00 2001
From: michael2893
Date: Tue, 7 Jan 2025 12:10:47 -0500
Subject: [PATCH] Add single writer principle note to deployment documentation
 (#5166)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Co-authored-by: Juraci Paixão Kröhling
Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
Co-authored-by: Phillip Carter
---
 .../collector/deployment/gateway/index.md | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/content/en/docs/collector/deployment/gateway/index.md b/content/en/docs/collector/deployment/gateway/index.md
index 6b3dae479b81..ed10bf0c74ad 100644
--- a/content/en/docs/collector/deployment/gateway/index.md
+++ b/content/en/docs/collector/deployment/gateway/index.md
@@ -251,3 +251,45 @@ Cons:
   https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor
 [spanmetrics-connector]:
   https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector

## Multiple collectors and the single-writer principle

All metric data streams within OTLP must have a
[single writer](/docs/specs/otel/metrics/data-model/#single-writer). When
deploying multiple collectors in a gateway configuration, it's important to
ensure that all metric data streams have a single writer and a globally unique
identity.

### Potential problems

Concurrent access from multiple applications that modify or report on the same
data can lead to data loss or degraded data quality. For example, you might see
inconsistent data from multiple sources on the same resource, where the
different sources can overwrite each other because the resource is not uniquely
identified.
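As an illustrative sketch of how this can happen (the job name, target, and endpoint below are hypothetical), consider two gateway collector replicas that share the following configuration. Both replicas scrape the same target and write the resulting series to the same backend, so every series has two writers:

```yaml
# Hypothetical configuration: if two collector replicas both run this,
# each scrapes my-service:8888 and exports its own copy of every series,
# producing duplicate and potentially conflicting samples downstream.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: my-service
          static_configs:
            - targets: ['my-service:8888']

exporters:
  prometheusremotewrite:
    endpoint: https://backend.example.com/api/v1/write

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```

To keep a single writer per stream, either shard the scrape targets across replicas or attach an attribute that uniquely identifies each replica, so the resulting streams have distinct identities.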
There are patterns in the data that may provide some insight into whether this
is happening or not. For example, upon visual inspection, a series with
unexplained gaps or jumps may be a clue that multiple collectors are sending
the same samples. You might also see errors in your backend. For example, with
a Prometheus backend:

`Error on ingesting out-of-order samples`

This error could indicate that identical targets exist in two jobs, and the
order of the timestamps is incorrect. For example:

- Metric `M1` received at `T1` with a timestamp 13:56:04 with value `100`
- Metric `M1` received at `T2` with a timestamp 13:56:24 with value `120`
- Metric `M1` received at `T3` with a timestamp 13:56:04 with value `110`

The sample received at `T3` carries a timestamp earlier than the sample
received at `T2`, so the backend rejects it as out of order.

### Best practices

- Use the
  [Kubernetes attributes processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
  to add labels to different Kubernetes resources.
- Use the
  [resource detector processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md)
  to detect resource information from the host and collect resource metadata.
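The two practices above can be combined in a single pipeline. The following is a sketch, not a complete configuration: the exporter endpoint is a placeholder, and the detector list is one reasonable choice among several.

```yaml
processors:
  # Adds Kubernetes metadata such as k8s.pod.name and k8s.namespace.name,
  # giving each stream an identity tied to the pod that produced it.
  k8sattributes:
  # Detects resource attributes (host, environment) and adds them as metadata.
  resourcedetection:
    detectors: [env, system]

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: backend.example.com:4317 # placeholder endpoint

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [k8sattributes, resourcedetection]
      exporters: [otlp]
```

With these processors in place, streams from different pods or hosts carry distinct resource attributes, so two collectors are less likely to emit streams with identical identities.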