From 33864522628136d764c4f8ae284982a27cd36710 Mon Sep 17 00:00:00 2001
From: Michael Terranova
Date: Sun, 8 Sep 2024 11:03:39 -0400
Subject: [PATCH 1/7] add single-writer-principle note to collector docs

---
 .../en/docs/collector/deployment/gateway.md   | 85 +++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/content/en/docs/collector/deployment/gateway.md b/content/en/docs/collector/deployment/gateway.md
index 5f0547c8b1f0..e4db14b68e75 100644
--- a/content/en/docs/collector/deployment/gateway.md
+++ b/content/en/docs/collector/deployment/gateway.md
@@ -218,3 +218,88 @@ Cons:
   https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor
 [spanmetrics-connector]:
   https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector
+
+## Multiple Collectors / Single Writer Principle
+
+All metric data streams within
+OTLP must have a [single writer](/docs/specs/otel/metrics/data-model/#single-writer).
+When deploying multiple collectors in a gateway configuration, it's important to
+ensure that all metric data streams have a single writer and a globally unique
+identity.
+
+
+
+### Potential Problems
+
+Concurrent access from multiple applications that modify or report on
+the same data can lead to data loss or, at least, degraded data
+quality. An example would be something like inconsistent data from multiple sources
+on the same resource, where the different sources can overwrite each other because
+the resource is not uniquely identified.
+
+
+There are patterns in the data that may provide some insight into whether this
+is happening or not. For example, upon visual inspection, a series with
+unexplained gaps or jumps in the same series may be a clue that multiple
+collectors are sending the same samples.
+
+There are also more direct errors that could surface in the backend.
+
+With a Prometheus backend, an example error is:
+`Error on ingesting out-of-order samples`.
+
+This could indicate that identical targets exist in two jobs, and the order of
+the timestamps is incorrect.
+
+Ex:
+
+- Metric `M1` received at time 13:56:04 with value `100`
+- Metric `M1` received at time 13:56:24 with value `120`
+- Metric `M1` received at time 13:56:04 with value `110`
+
+### Suggestions
+
+- Use the [k8sattributesprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
+  to add labels to kubernetes resources
+- Use the [resource detector processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md)
+  to detect resource information from the host and collect metadata related to them.
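+
+For illustration, the following is a minimal configuration sketch that applies
+both processors in a gateway metrics pipeline. The endpoint values and
+extracted metadata keys are placeholders to adapt to your environment, and the
+sketch assumes the collector has permission to read pod metadata from the
+Kubernetes API:
+
+```yaml
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+
+processors:
+  # Associate incoming data with the emitting pod so that each metric
+  # stream carries a unique identity.
+  k8sattributes:
+    extract:
+      metadata:
+        - k8s.namespace.name
+        - k8s.pod.name
+        - k8s.pod.uid
+  # Attach host metadata without overwriting attributes that are already set.
+  resourcedetection:
+    detectors: [env, system]
+    override: false
+  batch:
+
+exporters:
+  otlphttp:
+    endpoint: https://backend.example.com # placeholder
+
+service:
+  pipelines:
+    metrics:
+      receivers: [otlp]
+      processors: [k8sattributes, resourcedetection, batch]
+      exporters: [otlphttp]
+```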

From 9da5e866ee9c74e3853ec50a4c05a459111f1003 Mon Sep 17 00:00:00 2001
From: Michael Terranova
Date: Mon, 9 Sep 2024 10:12:56 -0400
Subject: [PATCH 2/7] fix linting issue on github links, correct spelling

---
 content/en/docs/collector/deployment/gateway.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/en/docs/collector/deployment/gateway.md b/content/en/docs/collector/deployment/gateway.md
index e4db14b68e75..3c2073f621f7 100644
--- a/content/en/docs/collector/deployment/gateway.md
+++ b/content/en/docs/collector/deployment/gateway.md
@@ -259,7 +259,7 @@
 
 ### Suggestions
 
-- Use the [k8sattributesprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
+- Use the [Kubernetes Attributes Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
   to add labels to kubernetes resources
-- Use the [resource detector processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md)
+- Use the [Resource Detector Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md)
   to detect resource information from the host and collect metadata related to them.

From 1695255f1a3f51823f3a54eac4a728a29cb35972 Mon Sep 17 00:00:00 2001
From: Michael Terranova
Date: Fri, 27 Sep 2024 12:22:47 -0400
Subject: [PATCH 3/7] response to some review comments

---
 .../en/docs/collector/deployment/gateway.md   | 34 +++++++-----------
 1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/content/en/docs/collector/deployment/gateway.md b/content/en/docs/collector/deployment/gateway.md
index 3c2073f621f7..89e6c48ad9c7 100644
--- a/content/en/docs/collector/deployment/gateway.md
+++ b/content/en/docs/collector/deployment/gateway.md
@@ -201,13 +201,11 @@ collector.
 ## Tradeoffs
 
 Pros:
-
 - Separation of concerns such as centrally managed credentials
 - Centralized policy management (for example, filtering certain logs or
   sampling)
 
 Cons:
-
 - It's one more thing to maintain and that can fail (complexity)
 - Added latency in case of cascaded collectors
 - Higher overall resource usage (costs)
@@ -219,7 +217,7 @@ Cons:
 [spanmetrics-connector]:
   https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector
 
-## Multiple Collectors / Single Writer Principle
+## Multiple collectors and the single-writer principle
 
 All metric data streams within
 OTLP must have a [single writer](/docs/specs/otel/metrics/data-model/#single-writer).
@@ -227,39 +225,33 @@ When deploying multiple collectors in a gateway configuration, it's important to
 ensure that all metric data streams have a single writer and a globally unique
 identity.
 
-
-
-### Potential Problems
+### Potential problems
 
 Concurrent access from multiple applications that modify or report on
-the same data can lead to data loss or, at least, degraded data
-quality. An example would be something like inconsistent data from multiple sources
+the same data can lead to data loss or degraded data
+quality. For example, you might see inconsistent data from multiple sources
 on the same resource, where the different sources can overwrite each other because
 the resource is not uniquely identified.
 
-
 There are patterns in the data that may provide some insight into whether this
 is happening or not. For example, upon visual inspection, a series with
-unexplained gaps or jumps in the same series may be a clue that multiple
-collectors are sending the same samples.
-
+unexplained gaps or jumps may be a clue that multiple collectors are sending
+the same samples.
 There are also more direct errors that could surface in the backend.
 
 With a Prometheus backend, an example error is:
 `Error on ingesting out-of-order samples`.
 
-This could indicate that identical targets exist in two jobs, and the order of
-the timestamps is incorrect.
-
-Ex:
+This error could indicate that identical targets exist in two jobs, and the
+order of the timestamps is incorrect. For example:
 
 - Metric `M1` received at time 13:56:04 with value `100`
 - Metric `M1` received at time 13:56:24 with value `120`
 - Metric `M1` received at time 13:56:04 with value `110`
 
-### Suggestions
+### Best practices
 
-- Use the [Kubernetes Attributes Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
-  to add labels to kubernetes resources
-- Use the [Resource Detector Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md)
-  to detect resource information from the host and collect metadata related to them.
+- Use the [Kubernetes attributes processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
+  to add labels to Kubernetes resources.
+- Use the [resource detector processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md)
+  to detect resource information from the host and collect resource metadata.

From 0822a25de36c512265a5b991150c0d3312a3d5f6 Mon Sep 17 00:00:00 2001
From: michael2893
Date: Thu, 2 Jan 2025 09:35:28 -0500
Subject: [PATCH 4/7] Update content/en/docs/collector/deployment/gateway.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Juraci Paixão Kröhling
---
 content/en/docs/collector/deployment/gateway.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/content/en/docs/collector/deployment/gateway.md b/content/en/docs/collector/deployment/gateway.md
index 89e6c48ad9c7..baed67c1b347 100644
--- a/content/en/docs/collector/deployment/gateway.md
+++ b/content/en/docs/collector/deployment/gateway.md
@@ -252,6 +252,20 @@ order of the timestamps is incorrect. For example:
 ### Best practices
 
 - Use the [Kubernetes attributes processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
-  to add labels to Kubernetes resources.
+  to add labels to different Kubernetes resources.
 - Use the [resource detector processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md)
   to detect resource information from the host and collect resource metadata.
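+
+As an illustrative sketch (the association rule shown is an assumption that
+depends on how traffic reaches the gateway), the Kubernetes attributes
+processor can be configured to match incoming data to the pod that sent it:
+
+```yaml
+processors:
+  k8sattributes:
+    pod_association:
+      # Match by the IP of the incoming connection. Adjust this rule if a
+      # load balancer or proxy sits in front of the gateway.
+      - sources:
+          - from: connection
+```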

From be42e53fcf8335af960fdd380ac9c961515c2634 Mon Sep 17 00:00:00 2001
From: michael2893
Date: Thu, 2 Jan 2025 09:35:37 -0500
Subject: [PATCH 5/7] Update content/en/docs/collector/deployment/gateway.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Juraci Paixão Kröhling
---
 content/en/docs/collector/deployment/gateway.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/collector/deployment/gateway.md b/content/en/docs/collector/deployment/gateway.md
index baed67c1b347..731a63c5f13c 100644
--- a/content/en/docs/collector/deployment/gateway.md
+++ b/content/en/docs/collector/deployment/gateway.md
@@ -245,7 +245,7 @@ With a Prometheus backend, an example error is:
 This error could indicate that identical targets exist in two jobs, and the
 order of the timestamps is incorrect. For example:
 
-- Metric `M1` received at time 13:56:04 with value `100`
-- Metric `M1` received at time 13:56:24 with value `120`
-- Metric `M1` received at time 13:56:04 with value `110`
+- Metric `M1` received at `T1` with timestamp 13:56:04 and value `100`
+- Metric `M1` received at `T2` with timestamp 13:56:24 and value `120`
+- Metric `M1` received at `T3` with timestamp 13:56:04 and value `110`
 

From 6eaa78ee14042ecc74a911db5b4e60a0920f2839 Mon Sep 17 00:00:00 2001
From: michael2893
Date: Thu, 2 Jan 2025 09:36:37 -0500
Subject: [PATCH 6/7] Update content/en/docs/collector/deployment/gateway.md

Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
---
 content/en/docs/collector/deployment/gateway.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/collector/deployment/gateway.md b/content/en/docs/collector/deployment/gateway.md
index 731a63c5f13c..0aaf41ee30c3 100644
--- a/content/en/docs/collector/deployment/gateway.md
+++ b/content/en/docs/collector/deployment/gateway.md
@@ -237,7 +237,7 @@ There are patterns in the data that may provide some insight into whether this
 is happening or not. For example, upon visual inspection, a series with
 unexplained gaps or jumps may be a clue that multiple collectors are sending
 the same samples.
-There are also more direct errors that could surface in the backend.
+You might also see errors in your backend. For example, with a Prometheus backend:
 
 With a Prometheus backend, an example error is:
 `Error on ingesting out-of-order samples`.

From 6a22294446c921ab6795eba30439f2dd6bdd3f84 Mon Sep 17 00:00:00 2001
From: michael2893
Date: Thu, 2 Jan 2025 09:36:45 -0500
Subject: [PATCH 7/7] Update content/en/docs/collector/deployment/gateway.md

Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
---
 content/en/docs/collector/deployment/gateway.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/content/en/docs/collector/deployment/gateway.md b/content/en/docs/collector/deployment/gateway.md
index 0aaf41ee30c3..fd1094a0017f 100644
--- a/content/en/docs/collector/deployment/gateway.md
+++ b/content/en/docs/collector/deployment/gateway.md
@@ -239,8 +239,7 @@ unexplained gaps or jumps may be a clue that multiple collectors are sending
 the same samples.
 You might also see errors in your backend. For example, with a Prometheus backend:
 
-With a Prometheus backend, an example error is:
-`Error on ingesting out-of-order samples`.
+`Error on ingesting out-of-order samples`
 
 This error could indicate that identical targets exist in two jobs, and the
 order of the timestamps is incorrect. For example: