Add remote write protocol 2.0 support #9072
Comments
FYI: OpenTelemetry Collector Contrib has an LFX mentorship project starting in September to add remote write 2.0 support to the remote write exporter. Tracking issue: open-telemetry/opentelemetry-collector-contrib#33661.
While testing the POC for grafana/mimir#9072 I saw no unit or help metadata. Our test env (https://github.com/grafana/mimir/tree/main/development/mimir-monolithic-mode) doesn't have units, so the unit was empty, and the help was cleared due to this bug. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
…tadata Found during testing for grafana/mimir#9072. Debug printout showed: KRAJO: seriesName=cortex_request_duration_seconds_bucket, metricFamily=cortex_request_duration_seconds_bucket, type=GAUGE, help=cortex_bucket_index_load_duration_seconds_sum, unit= which is nonsense. I can imagine more cases where metadata is missing and that actually makes sense: some targets might not expose metadata, or a pipeline might lose it. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
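The printout above pairs a `_bucket` series with the help text of an unrelated metric and reports the series name as its own metric family. As a rough illustration only (not the actual fix in Mimir), one way to avoid that kind of mismatch is to derive the metric family from the series name and fall back to empty metadata when nothing matches, instead of reusing another metric's help. Every name below (`Metadata`, `metric_family`, `lookup_metadata`) is hypothetical, and the suffix handling assumes classic histogram naming.

```python
# Hypothetical sketch: none of these names come from Mimir or Prometheus code.
from dataclasses import dataclass

# Suffixes that classic histogram series append to the metric family name.
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")


@dataclass(frozen=True)
class Metadata:
    type: str = "UNKNOWN"
    help: str = ""
    unit: str = ""


def metric_family(series_name, known_families):
    """Return the family a series belongs to, preferring an exact match."""
    if series_name in known_families:
        return series_name
    for suffix in HISTOGRAM_SUFFIXES:
        if series_name.endswith(suffix):
            base = series_name[: -len(suffix)]
            if base in known_families:
                return base
    # No known family: fall back to the series name itself.
    return series_name


def lookup_metadata(series_name, metadata_by_family):
    """Look up metadata by family; missing metadata stays empty."""
    family = metric_family(series_name, metadata_by_family)
    return metadata_by_family.get(family, Metadata())
```

With this shape, a series whose family has no metadata gets an empty `help` and `unit` rather than a value borrowed from a different metric, which also covers targets or pipelines that drop metadata.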
As far as I can tell we don't cast Prometheus Remote Write 1.0 histogram into mimirpb.Histogram anymore. On the flip-side this test fails in #10432 because we're going to store RW 2.0 extra field in mimirpb.Histogram. Related to #9072 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Based on #10432 Task list:
Just want to raise that I have done a bunch of investigation into the performance and viability of replacing Gogo, and remote write v2 is easily our best chance to move the next generation of protos here off Gogo. The features we have hand-rolled and manually hacked into the Gogo codegen (yoloString, memory pooling, and pre-allocation) are now fully supported by existing libraries. So the potential exists to get the same or better performance with a fraction of the maintenance that comes from using custom types and manually changing copy-pasted marshaling methods.
What should we do?
Prometheus added remote write protocol 2.0 experimental support in v2.54.0 (released on 2024-08-09). We should add the support in Mimir too.
How will we do it (roughly)?
Private design doc: https://docs.google.com/document/d/1JSwhdWRODOeGlNIRpYvnEHK6aH7d42n4ZJ_rfFvt-Lo/edit?tab=t.0#heading=h.5sybau7waq2q
Out of the scope of this work:
Size?
Between Medium (= ~1 month) and Large (= ~3 months).
What will we deliver?
What are the documentation dependencies?
Urgency?
Not urgent yet, but we can't lag too far behind Prometheus.