
feat/backpressure with retryable codes #40509


Open: wants to merge 3 commits into base: main

bmiguel-teixeira (Contributor) commented Jun 5, 2025

Description

Change the exporter's return codes so that it provides proper information to the OTEL pipeline.
Specifically, it now signals backpressure for retryable codes.

I have followed the conventions from https://opentelemetry.io/docs/specs/otlp/#failures-1,
with one exception: ALL 5xx codes are treated as retryable, because the exporter's current built-in retry mechanism already applies the same logic.

Why this change?

In some of our OTEL Collector pipeline configurations, we run the prometheusremotewriteexporter in a "fail-fast" configuration.
This means the exporter tries to deliver once and, if that fails, moves on immediately.

However, in certain scenarios I'd like this "exit code" to be propagated back to the original client. OTLP exporters already implement this behaviour natively.

Currently: ANY error is always classified as a permanent error in the OTEL pipeline.
Now: most OTLP-compliant retryable codes plus ALL 5xx codes are classified as retryable. This lets the error apply backpressure to the OTEL pipeline and lets the final error be reported back upstream.
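A minimal Go sketch of the classification described above, assuming the exporter sees a plain HTTP status code from the remote-write endpoint. The names permanentError, isRetryableStatus, classifyResponse and isPermanent are hypothetical illustrations, not the PR's actual code. In the Collector itself, consumererror.NewPermanent and consumererror.IsPermanent (package go.opentelemetry.io/collector/consumer/consumererror) fill the role of the permanent-error wrapper. The retry rule used here (429 plus every 5xx) is an assumption: it covers the OTLP/HTTP retryable set (429, 502, 503, 504) and adds the remaining 5xx codes, but the PR's exact set may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// permanentError is a hypothetical stand-in for the Collector's permanent-error
// wrapper: it marks an error as one the pipeline should not retry.
type permanentError struct{ err error }

func (e permanentError) Error() string { return e.err.Error() }
func (e permanentError) Unwrap() error { return e.err }

// isRetryableStatus reports whether a remote-write HTTP status should be retried.
// Assumed rule: 429 (from the OTLP/HTTP failure table) plus every 5xx code.
func isRetryableStatus(code int) bool {
	return code == http.StatusTooManyRequests || (code >= 500 && code <= 599)
}

// classifyResponse maps a response status to the error handed back to the
// pipeline: nil on 2xx, a plain (retryable) error for retryable codes, and a
// permanent error for everything else.
func classifyResponse(code int) error {
	if code >= 200 && code <= 299 {
		return nil
	}
	err := fmt.Errorf("remote write returned HTTP %d", code)
	if isRetryableStatus(code) {
		return err
	}
	return permanentError{err: err}
}

// isPermanent reports whether err, or any error it wraps, is a permanentError.
func isPermanent(err error) bool {
	var p permanentError
	return errors.As(err, &p)
}

func main() {
	for _, code := range []int{200, 400, 429, 500, 503} {
		err := classifyResponse(code)
		fmt.Printf("%d -> failed=%v permanent=%v\n", code, err != nil, isPermanent(err))
	}
}
```

Under the old behaviour every failure would take the permanentError branch. In this sketch, only non-retryable codes such as 400 do, so a 429 or 5xx surfaces as an ordinary error that the pipeline can treat as backpressure.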

P.S.: I'm unsure whether this is considered a breaking change.

Cheers,

Example of a fail-fast configuration:


receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"

processors: {}

exporters:
  prometheusremotewrite:
    endpoint: http://localhost:9090/api/v1/write
    remote_write_queue:
      enabled: false
    retry_on_failure:
      enabled: false

service:
  extensions: []
  pipelines:
    metrics:
      receivers: [otlp]
      processors: []
      exporters: [prometheusremotewrite]

Link to tracking issue

N/A

Testing

Unit tests and some local testing.

Documentation

TBD

bmiguel-teixeira (Contributor, Author):

I will take care of the linting and the failing checks if this initial approach is valid!

bmiguel-teixeira marked this pull request as ready for review June 5, 2025 15:11
bmiguel-teixeira requested review from dashpole, ArthurSens and a team as code owners June 5, 2025 15:11

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions bot added the Stale label Jun 20, 2025
github-actions bot requested review from Aneurysm9, rapphil and ywwg June 25, 2025 09:46

bmiguel-teixeira (Contributor, Author):

Any news here?

github-actions bot removed the Stale label Jun 26, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions bot added the Stale label Jul 10, 2025

atoulme (Contributor) commented Jul 16, 2025

@ArthurSens please take a look

Projects: None yet

3 participants