Exporter prometheusremotewrite keeps sending data for 5m while receiver has only 1 data point #27893
Comments
I have looked a bit more into this problem. As I use Grafana Cloud, I thought this might be a problem with Mimir. I searched for this and found this article: If a Prometheus scrape detects that an instance is down, it marks all related time series stale with a stale marker. But how can that be applied here? The prometheusreceiver can probably implement this. The same goes for the httpreceiver: old time series should be marked stale somehow. 5 minutes is a long time to detect that an application has stopped. |
I believe the PRW exporter will not send points more than once unless it is retrying a failure. I strongly suspect what is happening is that prometheus displays a line for 5 minutes after it receives a point unless it receives a staleness marker. But since staleness markers are prometheus-specific, you won't get them when receiving data from non-prometheus sources.
The prometheus receiver does implement this, and it should work correctly. It uses the OTLP data point flag for "no recorded value" to indicate that a series is stale. The PRW exporter should send a staleness marker when it sees that data point flag. Overall, this is WAI, although the current UX isn't ideal. There are two potential paths forward:
|
/cc @Aneurysm9 @rapphil |
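To make the flag-to-marker mapping described above concrete, here is a minimal sketch. It is not the collector's actual code: the function names are hypothetical, and it only assumes that the OTLP "no recorded value" flag is bit 0 of the data point flags and that the Prometheus staleness marker is the NaN with bit pattern 0x7FF0000000000002. Values are handled as raw 64-bit patterns so the marker stays distinguishable from an ordinary NaN.

```python
import struct

# OTLP data point flag bit for "no recorded value" (assumed mask value 1).
NO_RECORDED_VALUE_MASK = 0x1

# Bit pattern of the special NaN that Prometheus uses as its staleness marker.
STALE_NAN_BITS = 0x7FF0000000000002


def float_to_bits(value: float) -> int:
    """Return the IEEE 754 bit pattern of a 64-bit float."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def sample_bits(value: float, flags: int) -> int:
    """Hypothetical helper: pick the bits to send for one data point.

    A point flagged as "no recorded value" becomes a staleness marker;
    any other point is sent with its own value.
    """
    if flags & NO_RECORDED_VALUE_MASK:
        return STALE_NAN_BITS
    return float_to_bits(value)


def is_stale(bits: int) -> bool:
    """True only for the staleness marker, not for ordinary NaN values."""
    return bits == STALE_NAN_BITS
```

A receiver that never sets the flag (as is the case for non-Prometheus sources in this thread) would always take the second branch, so the backend never sees a marker and falls back to its lookback window.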
I solved this by setting the Mimir parameter: lookback_delta: 1s |
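Why a shorter lookback hides the lingering line can be illustrated with a small simulation. This is a simplified model of how a Prometheus-style query picks a value, not Mimir's implementation; `value_at` and the `STALE` sentinel are hypothetical names, and the exact boundary handling of the window is an assumption.

```python
from typing import List, Optional, Tuple, Union

# Stand-in for the staleness marker (the real one is a special NaN value).
STALE = object()

Sample = Tuple[float, Union[float, object]]  # (timestamp in seconds, value)


def value_at(samples: List[Sample], t: float, lookback: float) -> Optional[float]:
    """Return the value a query at time t would show, or None for a gap.

    samples must be sorted by timestamp. The most recent sample at or
    before t is used, unless it is a staleness marker or older than the
    lookback window.
    """
    latest = None
    for ts, value in samples:
        if ts > t:
            break
        latest = (ts, value)
    if latest is None:
        return None
    ts, value = latest
    if value is STALE or t - ts > lookback:
        return None
    return value


single_point = [(0.0, 1.0)]
print(value_at(single_point, 120.0, 300.0))  # still visible with a 5m lookback
print(value_at(single_point, 120.0, 1.0))    # gap with a 1s lookback

with_marker = [(0.0, 1.0), (15.0, STALE)]
print(value_at(with_marker, 120.0, 300.0))   # gap: the marker ends the series
```

The trade-off in this model is that any series whose points arrive less often than the lookback window will show gaps between points, so a very short lookback only suits data that arrives at least that frequently.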
I don't think this should be marked as a bug, nor is it an issue with the remote write exporter. As mentioned in #27893 (comment), this is due to the other types of receivers not having staleness markers. When the metrics are ingested via a receiver that supports them, the remote write exporter sends them to the Prometheus backend. What do you think @dashpole? |
Agreed. I consider this a feature request to add a notion of staleness to OTel, which is presumably blocked on such a thing existing in the specification. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
exporter/prometheusremotewrite
What happened?
Description
prometheusremotewrite keeps sending data while the receiver only provides data once in a while, or has stopped delivering data.
What is the problem with that:
The httpreceiver will first report:
As soon as the endpoint has problems, the new data from the httpreceiver will look like:
But prometheusremotewrite actually keeps sending for 5 minutes.
If the endpoint is flaky, you will not see the state switches either.
Steps to Reproduce
It is easy to reproduce with the influxdb receiver (as provided in the separate config).
Expected Result
Continuing to send is unexpected behaviour. A single data point should be sent only once.
If there is no other option, then at least make it configurable how long stale data keeps being sent.
Actual Result
The prometheus endpoint shows this data for the period defined with metric_expiration. For the prometheus endpoint I can understand that you don't know if the endpoint has been scraped.
If prometheus has the setting send_timestamps: true, then you can see when the last value was updated. A scraper can detect old/stale data.
prometheusremotewrite keeps sending the data for 5 minutes.
Collector version
0.87.0
Environment information
OpenTelemetry Collector 0.87.0 docker container: otel/opentelemetry-collector-contrib:0.87.0
OpenTelemetry Collector configuration
Log output
Additional context
No response