Disappearing metrics when using prometheus exporter when a histogram metric is received with same name but a different description #36493

Closed
tqi-raurora opened this issue Nov 21, 2024 · 6 comments
Labels: bug, exporter/prometheus, needs triage

Comments

tqi-raurora commented Nov 21, 2024

Component(s)

exporter/prometheus

What happened?

Description

On OpenTelemetry Collector Contrib 0.111.0, running as a systemd service, some metrics are intermittently missing from the output of the Prometheus exporter.

For example, http_server_duration_milliseconds_count returns 44 samples most of the time but sometimes returns only 6, even when the requests are made within the same second.

I tested this with curl to rule out scraping errors:

while true; do curl -s http://localhost:19130/metrics | grep http_server_duration_milliseconds_count | wc -l; done
44
44
6
44
44
44
6
44
6
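The loop above can be made a little stricter. The following is a minimal sketch, not part of the original report: the helper name `count_samples` is hypothetical, and it assumes the endpoint serves the standard Prometheus text exposition format. It skips `# HELP` and `# TYPE` comment lines and anchors the match at the start of the line, so only real samples of the named metric are counted.

```shell
# Count samples of one metric in Prometheus text exposition read from stdin.
# $1: exact sample name, e.g. http_server_duration_milliseconds_count
count_samples() {
  # Drop comment lines, then count lines where the name is followed by
  # "{" (labels) or a space (no labels). grep -c exits non-zero when the
  # count is 0, so "|| true" keeps the function usable under "set -e".
  grep -v '^#' | grep -c "^${1}[{ ]" || true
}

# Hypothetical usage against the endpoint from this report:
#   while sleep 1; do
#     printf '%s ' "$(date +%T)"
#     curl -s http://localhost:19130/metrics |
#       count_samples http_server_duration_milliseconds_count
#   done
```

Printing a timestamp next to each count makes it easier to line up a low count with collector log entries from the same second.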

This causes scrapes to miss data seemingly at random, leaving "gaps" in the data.

After digging around, I isolated the issue to at least one histogram metric. When I filter out this metric, the issue goes away:

http.server.duration{service.name=ps-sac-fe}

In other words, it seems this histogram is somehow breaking the Prometheus exporter.

Steps to Reproduce

This issue happened in a production collector. I'm not yet sure why it happens, but I exported the metric that seems to be the culprit with the debug exporter and added the output to the "Log output" section.
I don't know what is wrong with the metric, but when I filter it out, the issue goes away.

Expected Result

Multiple curl requests to localhost/metrics should return a consistent number of time series.

Actual Result

Multiple curl requests to localhost/metrics return different numbers of time series: some are missing seemingly at random, even when the requests are made within the same minute or even the same second.
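To see which series disappear rather than only how many, two saved scrapes can be compared. This is a rough sketch under stated assumptions: `series_keys` and `missing_series` are hypothetical helper names, the scrape files are assumed to hold Prometheus text exposition output, and label values are assumed not to contain the sequence `} `.

```shell
# Reduce exposition text on stdin to sorted, unique series identifiers
# (metric name plus label set), dropping comments, blank lines and values.
series_keys() {
  grep -v -e '^#' -e '^$' |
    sed -E 's/^([^{ ]+(\{.*\})?) .*$/\1/' |
    LC_ALL=C sort -u
}

# Print series present in scrape file $1 but absent from scrape file $2.
missing_series() {
  before_keys=$(mktemp) && after_keys=$(mktemp) || return 1
  series_keys < "$1" > "$before_keys"
  series_keys < "$2" > "$after_keys"
  # comm -23 keeps only the lines unique to the first (sorted) file.
  LC_ALL=C comm -23 "$before_keys" "$after_keys"
  rm -f "$before_keys" "$after_keys"
}

# Hypothetical usage: save one "full" and one "short" scrape, then compare.
#   curl -s http://localhost:19130/metrics > full.txt
#   curl -s http://localhost:19130/metrics > short.txt
#   missing_series full.txt short.txt
```

If the missing series all share a label value or come from one service, that narrows down which incoming metric is involved.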

If any other test is needed, please let me know

Collector version

v0.111.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 16

processors:

  filter/leave_only_http_server_duration:
    error_mode: ignore
    metrics:
      metric:
        - 'name != "http.server.duration" '

  filter/leave_only_ps_sac_fe:
    error_mode: ignore
    metrics:
      metric:
        - 'name != "http.server.duration" or not IsMatch(resource.attributes["service.name"], "^ps-sac-fe$")'

exporters:

  prometheus/debug:
    endpoint: "0.0.0.0:19130"
    metric_expiration: 10m

  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

service:
  pipelines:

    metrics/prometheus_debug:
      receivers: [otlp]
      processors: [filter/leave_only_http_server_duration]
      exporters: [prometheus/debug]

    metrics/debug:
      receivers: [otlp]
      processors: [filter/leave_only_ps_sac_fe]
      exporters: [debug]

Log output

2024-11-21T17:05:10.384581-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: 2024-11-21T17:05:10.383-0300#011info#011MetricsExporter#011{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 5}
2024-11-21T17:05:10.384733-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: 2024-11-21T17:05:10.384-0300#011info#011ResourceMetrics #0
2024-11-21T17:05:10.384776-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Resource SchemaURL:
2024-11-21T17:05:10.384817-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Resource attributes:
2024-11-21T17:05:10.384906-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> service.name: Str(ps-sac-fe)
2024-11-21T17:05:10.384981-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> telemetry.sdk.language: Str(nodejs)
2024-11-21T17:05:10.385018-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> telemetry.sdk.name: Str(opentelemetry)
2024-11-21T17:05:10.385087-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> telemetry.sdk.version: Str(1.22.0)
2024-11-21T17:05:10.385163-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.pid: Int(19)
2024-11-21T17:05:10.385211-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.executable.name: Str(node)
2024-11-21T17:05:10.385246-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.executable.path: Str(/opt/node18/bin/node)
2024-11-21T17:05:10.385281-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.command_args: Slice(["/opt/node18/bin/node","--max-http-header-size=16384","-e","require('./cli/index')('nest');","nest","start"])
2024-11-21T17:05:10.385310-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.runtime.version: Str(18.18.2)
2024-11-21T17:05:10.385342-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.runtime.name: Str(nodejs)
2024-11-21T17:05:10.385371-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.runtime.description: Str(Node.js)
2024-11-21T17:05:10.385400-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.command: Str(nest)
2024-11-21T17:05:10.385429-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> process.owner: Str(root)
2024-11-21T17:05:10.385461-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ScopeMetrics #0
2024-11-21T17:05:10.385495-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ScopeMetrics SchemaURL:
2024-11-21T17:05:10.385532-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: InstrumentationScope @opentelemetry/instrumentation-http 0.48.0
2024-11-21T17:05:10.385561-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Metric #0
2024-11-21T17:05:10.385590-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Descriptor:
2024-11-21T17:05:10.385619-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> Name: http.server.duration
2024-11-21T17:05:10.385651-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> Description: Measures the duration of inbound HTTP requests.
2024-11-21T17:05:10.385681-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> Unit: ms
2024-11-21T17:05:10.385713-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> DataType: Histogram
2024-11-21T17:05:10.385748-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> AggregationTemporality: Cumulative
2024-11-21T17:05:10.385817-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: HistogramDataPoints #0
2024-11-21T17:05:10.385888-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Data point attributes:
2024-11-21T17:05:10.385957-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.scheme: Str(http)
2024-11-21T17:05:10.386003-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.method: Str(GET)
2024-11-21T17:05:10.386071-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.name: Str(10.196.128.37)
2024-11-21T17:05:10.386114-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.flavor: Str(1.1)
2024-11-21T17:05:10.386146-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.status_code: Int(200)
2024-11-21T17:05:10.386250-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.port: Int(80)
2024-11-21T17:05:10.386321-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.route: Str(/health-check)
2024-11-21T17:05:10.386359-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: StartTimestamp: 2024-11-20 08:38:15.617 +0000 UTC
2024-11-21T17:05:10.386399-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Timestamp: 2024-11-21 20:05:10.323 +0000 UTC
2024-11-21T17:05:10.386437-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Count: 1067
2024-11-21T17:05:10.386473-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Sum: 3442.907700
2024-11-21T17:05:10.386504-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Min: 2.335438
2024-11-21T17:05:10.386540-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Max: 15.444237
2024-11-21T17:05:10.386569-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #0: 0.000000
2024-11-21T17:05:10.386605-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #1: 5.000000
2024-11-21T17:05:10.386639-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #2: 10.000000
2024-11-21T17:05:10.386672-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #3: 25.000000
2024-11-21T17:05:10.386712-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #4: 50.000000
2024-11-21T17:05:10.386749-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #5: 75.000000
2024-11-21T17:05:10.386783-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #6: 100.000000
2024-11-21T17:05:10.386817-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #7: 250.000000
2024-11-21T17:05:10.386855-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #8: 500.000000
2024-11-21T17:05:10.386890-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #9: 750.000000
2024-11-21T17:05:10.387023-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #10: 1000.000000
2024-11-21T17:05:10.387139-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #11: 2500.000000
2024-11-21T17:05:10.387213-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #12: 5000.000000
2024-11-21T17:05:10.387251-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #13: 7500.000000
2024-11-21T17:05:10.387290-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #14: 10000.000000
2024-11-21T17:05:10.387322-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #0, Count: 0
2024-11-21T17:05:10.387366-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #1, Count: 1058
2024-11-21T17:05:10.387404-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #2, Count: 8
2024-11-21T17:05:10.387444-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #3, Count: 1
2024-11-21T17:05:10.387497-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #4, Count: 0
2024-11-21T17:05:10.387536-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #5, Count: 0
2024-11-21T17:05:10.387574-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #6, Count: 0
2024-11-21T17:05:10.387607-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #7, Count: 0
2024-11-21T17:05:10.387647-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #8, Count: 0
2024-11-21T17:05:10.387684-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #9, Count: 0
2024-11-21T17:05:10.387720-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #10, Count: 0
2024-11-21T17:05:10.387755-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #11, Count: 0
2024-11-21T17:05:10.387791-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #12, Count: 0
2024-11-21T17:05:10.387825-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #13, Count: 0
2024-11-21T17:05:10.387864-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #14, Count: 0
2024-11-21T17:05:10.387900-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #15, Count: 0
2024-11-21T17:05:10.387935-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: HistogramDataPoints #1
2024-11-21T17:05:10.387971-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Data point attributes:
2024-11-21T17:05:10.388062-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.scheme: Str(http)
2024-11-21T17:05:10.388112-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.method: Str(GET)
2024-11-21T17:05:10.388143-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.name: Str(minhasenha.pagseguro.uol.com.br)
2024-11-21T17:05:10.388173-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.flavor: Str(1.1)
2024-11-21T17:05:10.388213-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.status_code: Int(200)
2024-11-21T17:05:10.388246-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.port: Int(80)
2024-11-21T17:05:10.388370-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.route: Str(*)
2024-11-21T17:05:10.388416-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: StartTimestamp: 2024-11-20 11:26:37.844 +0000 UTC
2024-11-21T17:05:10.388451-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Timestamp: 2024-11-21 20:05:10.323 +0000 UTC
2024-11-21T17:05:10.388483-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Count: 12
2024-11-21T17:05:10.388520-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Sum: 3507.180335
2024-11-21T17:05:10.388555-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Min: 29.730280
2024-11-21T17:05:10.388590-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Max: 654.425868
2024-11-21T17:05:10.388622-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #0: 0.000000
2024-11-21T17:05:10.388659-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #1: 5.000000
2024-11-21T17:05:10.388692-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #2: 10.000000
2024-11-21T17:05:10.388720-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #3: 25.000000
2024-11-21T17:05:10.388748-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #4: 50.000000
2024-11-21T17:05:10.388776-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #5: 75.000000
2024-11-21T17:05:10.388804-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #6: 100.000000
2024-11-21T17:05:10.388834-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #7: 250.000000
2024-11-21T17:05:10.388862-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #8: 500.000000
2024-11-21T17:05:10.388898-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #9: 750.000000
2024-11-21T17:05:10.388927-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #10: 1000.000000
2024-11-21T17:05:10.388958-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #11: 2500.000000
2024-11-21T17:05:10.388988-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #12: 5000.000000
2024-11-21T17:05:10.389018-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #13: 7500.000000
2024-11-21T17:05:10.389052-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #14: 10000.000000
2024-11-21T17:05:10.389084-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #0, Count: 0
2024-11-21T17:05:10.389116-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #1, Count: 0
2024-11-21T17:05:10.389152-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #2, Count: 0
2024-11-21T17:05:10.389185-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #3, Count: 0
2024-11-21T17:05:10.389220-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #4, Count: 5
2024-11-21T17:05:10.389250-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #5, Count: 0
2024-11-21T17:05:10.389283-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #6, Count: 0
2024-11-21T17:05:10.389312-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #7, Count: 2
2024-11-21T17:05:10.389352-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #8, Count: 0
2024-11-21T17:05:10.389381-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #9, Count: 5
2024-11-21T17:05:10.389413-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #10, Count: 0
2024-11-21T17:05:10.389442-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #11, Count: 0
2024-11-21T17:05:10.389472-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #12, Count: 0
2024-11-21T17:05:10.389507-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #13, Count: 0
2024-11-21T17:05:10.389540-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #14, Count: 0
2024-11-21T17:05:10.389570-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #15, Count: 0
2024-11-21T17:05:10.389599-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: HistogramDataPoints #2
2024-11-21T17:05:10.389633-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Data point attributes:
2024-11-21T17:05:10.389666-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.scheme: Str(http)
2024-11-21T17:05:10.389698-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.method: Str(GET)
2024-11-21T17:05:10.389726-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.name: Str(minhasenha.qa.pagseguro.uol.com.br)
2024-11-21T17:05:10.389761-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.flavor: Str(1.1)
2024-11-21T17:05:10.389794-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.status_code: Int(200)
2024-11-21T17:05:10.389827-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.port: Int(80)
2024-11-21T17:05:10.389859-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.route: Str(*)
2024-11-21T17:05:10.389891-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: StartTimestamp: 2024-11-20 11:28:47.547 +0000 UTC
2024-11-21T17:05:10.389930-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Timestamp: 2024-11-21 20:05:10.323 +0000 UTC
2024-11-21T17:05:10.389963-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Count: 38
2024-11-21T17:05:10.389996-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Sum: 20740.379206
2024-11-21T17:05:10.390025-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Min: 23.677466
2024-11-21T17:05:10.390055-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Max: 1978.237704
2024-11-21T17:05:10.390094-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #0: 0.000000
2024-11-21T17:05:10.390128-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #1: 5.000000
2024-11-21T17:05:10.390158-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #2: 10.000000
2024-11-21T17:05:10.390203-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #3: 25.000000
2024-11-21T17:05:10.390237-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #4: 50.000000
2024-11-21T17:05:10.390268-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #5: 75.000000
2024-11-21T17:05:10.390301-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #6: 100.000000
2024-11-21T17:05:10.390334-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #7: 250.000000
2024-11-21T17:05:10.390366-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #8: 500.000000
2024-11-21T17:05:10.390396-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #9: 750.000000
2024-11-21T17:05:10.390432-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #10: 1000.000000
2024-11-21T17:05:10.390463-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #11: 2500.000000
2024-11-21T17:05:10.390493-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #12: 5000.000000
2024-11-21T17:05:10.390526-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #13: 7500.000000
2024-11-21T17:05:10.390560-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #14: 10000.000000
2024-11-21T17:05:10.390588-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #0, Count: 0
2024-11-21T17:05:10.390617-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #1, Count: 0
2024-11-21T17:05:10.390650-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #2, Count: 0
2024-11-21T17:05:10.390680-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #3, Count: 1
2024-11-21T17:05:10.390714-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #4, Count: 23
2024-11-21T17:05:10.390752-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #5, Count: 0
2024-11-21T17:05:10.390780-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #6, Count: 0
2024-11-21T17:05:10.390810-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #7, Count: 1
2024-11-21T17:05:10.390848-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #8, Count: 0
2024-11-21T17:05:10.390876-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #9, Count: 0
2024-11-21T17:05:10.390905-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #10, Count: 0
2024-11-21T17:05:10.390934-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #11, Count: 13
2024-11-21T17:05:10.390968-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #12, Count: 0
2024-11-21T17:05:10.390996-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #13, Count: 0
2024-11-21T17:05:10.391032-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #14, Count: 0
2024-11-21T17:05:10.391131-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #15, Count: 0
2024-11-21T17:05:10.391193-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: HistogramDataPoints #3
2024-11-21T17:05:10.391240-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Data point attributes:
2024-11-21T17:05:10.391275-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.scheme: Str(http)
2024-11-21T17:05:10.391568-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.method: Str(GET)
2024-11-21T17:05:10.391619-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.name: Str(minhasenha.qa.pagseguro.uol.com.br)
2024-11-21T17:05:10.391659-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.flavor: Str(1.1)
2024-11-21T17:05:10.391885-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.status_code: Int(302)
2024-11-21T17:05:10.391986-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.port: Int(80)
2024-11-21T17:05:10.392178-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.route: Str(*)
2024-11-21T17:05:10.392340-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: StartTimestamp: 2024-11-20 22:54:43.514 +0000 UTC
2024-11-21T17:05:10.392396-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Timestamp: 2024-11-21 20:05:10.323 +0000 UTC
2024-11-21T17:05:10.392427-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Count: 5
2024-11-21T17:05:10.392455-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Sum: 7252.861480
2024-11-21T17:05:10.392484-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Min: 1158.140621
2024-11-21T17:05:10.392580-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Max: 1765.669495
2024-11-21T17:05:10.392639-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #0: 0.000000
2024-11-21T17:05:10.392676-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #1: 5.000000
2024-11-21T17:05:10.392706-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #2: 10.000000
2024-11-21T17:05:10.392739-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #3: 25.000000
2024-11-21T17:05:10.392768-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #4: 50.000000
2024-11-21T17:05:10.392796-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #5: 75.000000
2024-11-21T17:05:10.392824-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #6: 100.000000
2024-11-21T17:05:10.392852-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #7: 250.000000
2024-11-21T17:05:10.392880-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #8: 500.000000
2024-11-21T17:05:10.392908-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #9: 750.000000
2024-11-21T17:05:10.392940-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #10: 1000.000000
2024-11-21T17:05:10.392969-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #11: 2500.000000
2024-11-21T17:05:10.392997-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #12: 5000.000000
2024-11-21T17:05:10.393025-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #13: 7500.000000
2024-11-21T17:05:10.393054-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #14: 10000.000000
2024-11-21T17:05:10.393082-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #0, Count: 0
2024-11-21T17:05:10.393111-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #1, Count: 0
2024-11-21T17:05:10.393139-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #2, Count: 0
2024-11-21T17:05:10.393172-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #3, Count: 0
2024-11-21T17:05:10.393201-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #4, Count: 0
2024-11-21T17:05:10.393230-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #5, Count: 0
2024-11-21T17:05:10.393261-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #6, Count: 0
2024-11-21T17:05:10.393290-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #7, Count: 0
2024-11-21T17:05:10.393318-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #8, Count: 0
2024-11-21T17:05:10.393346-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #9, Count: 0
2024-11-21T17:05:10.393375-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #10, Count: 0
2024-11-21T17:05:10.393404-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #11, Count: 5
2024-11-21T17:05:10.393441-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #12, Count: 0
2024-11-21T17:05:10.393470-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #13, Count: 0
2024-11-21T17:05:10.393498-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #14, Count: 0
2024-11-21T17:05:10.393529-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #15, Count: 0
2024-11-21T17:05:10.393558-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: HistogramDataPoints #4
2024-11-21T17:05:10.393589-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Data point attributes:
2024-11-21T17:05:10.393617-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.scheme: Str(http)
2024-11-21T17:05:10.393663-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.method: Str(GET)
2024-11-21T17:05:10.393692-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.name: Str(minhasenha.pagseguro.uol.com.br)
2024-11-21T17:05:10.393721-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.flavor: Str(1.1)
2024-11-21T17:05:10.393749-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.status_code: Int(304)
2024-11-21T17:05:10.393786-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> net.host.port: Int(80)
2024-11-21T17:05:10.393818-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]:      -> http.route: Str(*)
2024-11-21T17:05:10.393847-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: StartTimestamp: 2024-11-21 14:45:08.887 +0000 UTC
2024-11-21T17:05:10.393881-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Timestamp: 2024-11-21 20:05:10.323 +0000 UTC
2024-11-21T17:05:10.393919-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Count: 1
2024-11-21T17:05:10.393951-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Sum: 33.281220
2024-11-21T17:05:10.393981-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Min: 33.281220
2024-11-21T17:05:10.394056-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Max: 33.281220
2024-11-21T17:05:10.394114-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #0: 0.000000
2024-11-21T17:05:10.394200-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #1: 5.000000
2024-11-21T17:05:10.394275-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #2: 10.000000
2024-11-21T17:05:10.394342-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #3: 25.000000
2024-11-21T17:05:10.394410-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #4: 50.000000
2024-11-21T17:05:10.394499-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #5: 75.000000
2024-11-21T17:05:10.394592-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #6: 100.000000
2024-11-21T17:05:10.394632-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #7: 250.000000
2024-11-21T17:05:10.394668-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #8: 500.000000
2024-11-21T17:05:10.394702-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #9: 750.000000
2024-11-21T17:05:10.394737-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #10: 1000.000000
2024-11-21T17:05:10.394777-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #11: 2500.000000
2024-11-21T17:05:10.394813-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #12: 5000.000000
2024-11-21T17:05:10.394848-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #13: 7500.000000
2024-11-21T17:05:10.394882-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: ExplicitBounds #14: 10000.000000
2024-11-21T17:05:10.394917-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #0, Count: 0
2024-11-21T17:05:10.394952-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #1, Count: 0
2024-11-21T17:05:10.394986-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #2, Count: 0
2024-11-21T17:05:10.395026-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #3, Count: 0
2024-11-21T17:05:10.395062-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #4, Count: 1
2024-11-21T17:05:10.395104-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #5, Count: 0
2024-11-21T17:05:10.395143-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #6, Count: 0
2024-11-21T17:05:10.395178-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #7, Count: 0
2024-11-21T17:05:10.395213-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #8, Count: 0
2024-11-21T17:05:10.395247-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #9, Count: 0
2024-11-21T17:05:10.395282-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #10, Count: 0
2024-11-21T17:05:10.395317-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #11, Count: 0
2024-11-21T17:05:10.395352-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #12, Count: 0
2024-11-21T17:05:10.395387-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #13, Count: 0
2024-11-21T17:05:10.395422-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #14, Count: 0
2024-11-21T17:05:10.395457-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: Buckets #15, Count: 0
2024-11-21T17:05:10.395495-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: #011{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-11-21T17:05:11.198571-03:00 gt-otel-col-loadbalancing-qa1 otelcol-contrib[11894]: 2024-11-21T17:05:11.198-0300#011warn#011prometheusexporter@v0.111.0/accumulator.go:263#011Misaligned starting timestamps#011{"kind": "exporter", "data_type": "metrics", "name": "prometheus", "ip_start_time": "2024-11-21 20:05:09 +0000 UTC", "pp_start_time": "2024-11-21 20:04:36 +0000 UTC", "pp_timestamp": "2024-11-21 20:05:10 +0000 UTC", "ip_timestamp": "2024-11-21 20:05:10 +0000 UTC"}

Additional context

No response

@tqi-raurora tqi-raurora added bug Something isn't working needs triage New item requiring triage labels Nov 21, 2024
@dashpole
Contributor

How often is the metric being collected? There is an expiration time in the exporter for old points.

@tqi-raurora
Author

Hi @dashpole, thanks for your response.

Metrics are scraped every minute.

Expiration time on prometheus exporter is set at 10 minutes.

However, when testing I did not rely on the Prometheus scrape. Instead, I logged into the host where the otel collector runs and used curl against localhost to query the prometheus exporter endpoint directly, like this:

curl -s http://localhost:19130/metrics

I ran this curl multiple times within the same minute.
I did this to simulate a scrape and to rule out problems with the Prometheus scraping process.
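
The same check can be scripted so that the per-scrape sample count is easy to compare across runs. The following is a minimal sketch, not part of the original report: the helper names `count_samples` and `scrape` are made up for illustration, and unlike `grep` it matches the metric name exactly rather than as a substring.

```python
import urllib.request


def count_samples(exposition_text: str, metric_name: str) -> int:
    """Count sample lines whose metric name is exactly `metric_name`."""
    count = 0
    for line in exposition_text.splitlines():
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name == metric_name:
            count += 1
    return count


def scrape(url: str) -> str:
    """Fetch the text exposed by the exporter (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling `count_samples(scrape("http://localhost:19130/metrics"), "http_server_duration_milliseconds_count")` several times within one minute should return the same number each time if the exporter is serving a stable set of series.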

When I filter out the metric below, the problem goes away:

http.server.duration{service.name=ps-sac-fe}

I am trying to reproduce a payload equal to the one that seems to be causing the issue. If I'm successful, I will report it here.

@tqi-raurora
Author

Hello @dashpole

I did some testing and figured out what is causing the bug.

Cause:

Same metric being received with different descriptions.

Explanation:

The auto-instrumentation agent for nodejs version 1.22.0 sends the histogram metric http.server.duration with the description:
"Measures the duration of inbound HTTP requests."

However, the auto-instrumentation agent for java version 1.34.1 sends the same metric http.server.duration with a different description:
"The duration of the inbound HTTP request"

When this happens, the Prometheus exporter keeps switching back and forth between the two metrics, serving only one of them on any given request, seemingly at random.
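
To find out which metrics are affected, one option is to capture some OTLP/JSON payloads and look for names that arrive with more than one description. The following is a minimal sketch under that assumption; the function names are hypothetical, and it only inspects the `resourceMetrics` / `scopeMetrics` / `metrics` fields of each payload.

```python
from collections import defaultdict


def descriptions_by_metric(payloads):
    """Map each metric name to the set of descriptions seen across payloads."""
    seen = defaultdict(set)
    for payload in payloads:
        for resource_metrics in payload.get("resourceMetrics", []):
            for scope_metrics in resource_metrics.get("scopeMetrics", []):
                for metric in scope_metrics.get("metrics", []):
                    # A missing description is treated as the empty string.
                    seen[metric["name"]].add(metric.get("description", ""))
    return seen


def conflicting_descriptions(payloads):
    """Return only the metrics that were sent with more than one description."""
    return {
        name: sorted(descs)
        for name, descs in descriptions_by_metric(payloads).items()
        if len(descs) > 1
    }
```

Any name returned by `conflicting_descriptions` is a candidate for the flip-flopping behaviour described here.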

How to reproduce

Pretty straightforward.

Use this config.yaml for the otel collector:

receivers:
  otlp:
    protocols:
      http:

exporters:

  prometheus:
    endpoint: "0.0.0.0:9130"
    metric_expiration: 20m

service:
  pipelines:

    metrics/otlp:
      receivers: [otlp]
      processors: []
      exporters: [prometheus]

Send a histogram metric with some description:

metrics_1.json

{
  "resourceMetrics": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "service_a"
            }
          }
        ]
      },
      "scopeMetrics": [
        {
          "scope": {
            "name": "my.library",
            "version": "1.0.0",
            "attributes": [
              {
                "key": "my.scope.attribute",
                "value": {
                  "stringValue": "some scope attribute"
                }
              }
            ]
          },
          "metrics": [
            {
              "name": "my.histogram",
              "unit": "1",
              "description": "I am a Histogram",
              "histogram": {
                "aggregationTemporality": 1,
                "dataPoints": [
                  {
                    "startTimeUnixNano": "1544712660300000000",
                    "timeUnixNano": "1544712660300000000",
                    "count": 2,
                    "sum": 2,
                    "bucketCounts": [1,1],
                    "explicitBounds": [1],
                    "min": 0,
                    "max": 2,
                    "attributes": [
                      {
                        "key": "http.route",
                        "value": {
                          "stringValue": "/route_a"
                        }
                      }
                    ]
                  },
                  {
                    "startTimeUnixNano": "1544712660300000000",
                    "timeUnixNano": "1544712660300000000",
                    "count": 2,
                    "sum": 2,
                    "bucketCounts": [1,1],
                    "explicitBounds": [1],
                    "min": 0,
                    "max": 2,
                    "attributes": [
                      {
                        "key": "http.route",
                        "value": {
                          "stringValue": "/route_b"
                        }
                      }
                    ]
                  }
                ]
              }
            }
          ]
        }
      ]
    }
  ]
}

curl -X POST -H "Content-Type: application/json" -d @metrics_1.json -i localhost:4318/v1/metrics

Then send a metric with the same name but a different description:

metrics_2.json

{
  "resourceMetrics": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "service_b"
            }
          }
        ]
      },
      "scopeMetrics": [
        {
          "scope": {
            "name": "my.library",
            "version": "1.0.0",
            "attributes": [
              {
                "key": "my.scope.attribute",
                "value": {
                  "stringValue": "some scope attribute"
                }
              }
            ]
          },
          "metrics": [
            {
              "name": "my.histogram",
              "unit": "1",
              "description": "I am a Histogram with another description",
              "histogram": {
                "aggregationTemporality": 1,
                "dataPoints": [
                  {
                    "startTimeUnixNano": "1544712660300000000",
                    "timeUnixNano": "1544712660300000000",
                    "count": 2,
                    "sum": 2,
                    "bucketCounts": [1,1],
                    "explicitBounds": [1],
                    "min": 0,
                    "max": 2,
                    "attributes": [
                      {
                        "key": "http.route",
                        "value": {
                          "stringValue": "route_c"
                        }
                      }
                    ]
                  }
                ]
              }
            }
          ]
        }
      ]
    }
  ]
}

curl -X POST -H "Content-Type: application/json" -d @metrics_2.json -i localhost:4318/v1/metrics

Finally, check the endpoint exposed by the Prometheus exporter. It will sometimes show the data points of the first metric and sometimes those of the second, seemingly at random.

Keep checking the endpoint multiple times repeatedly. Eventually you will see that the metric exposed changes, in an inconsistent manner.

Ex:

> curl -s localhost:9130/metrics
# HELP my_histogram I am a Histogram
# TYPE my_histogram histogram
my_histogram_bucket{http_route="/route_a",job="service_a",le="1"} 1
my_histogram_bucket{http_route="/route_a",job="service_a",le="+Inf"} 2
my_histogram_sum{http_route="/route_a",job="service_a"} 2
my_histogram_count{http_route="/route_a",job="service_a"} 2
my_histogram_bucket{http_route="/route_b",job="service_a",le="1"} 1
my_histogram_bucket{http_route="/route_b",job="service_a",le="+Inf"} 2
my_histogram_sum{http_route="/route_b",job="service_a"} 2
my_histogram_count{http_route="/route_b",job="service_a"} 2

...

> curl -s localhost:9130/metrics
# HELP my_histogram I am a Histogram with another description
# TYPE my_histogram histogram
my_histogram_bucket{http_route="route_c",job="service_b",le="1"} 1
my_histogram_bucket{http_route="route_c",job="service_b",le="+Inf"} 2
my_histogram_sum{http_route="route_c",job="service_b"} 2
my_histogram_count{http_route="route_c",job="service_b"} 2
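
The flip between the two outputs above can also be detected without reading each response by eye. The following is a minimal sketch that assumes the responses have already been collected as strings; the function names are hypothetical, and a `# HELP` line with no description text after the metric name is not matched.

```python
def help_text(exposition_text, metric_name):
    """Return the text of the '# HELP <metric_name> ...' line, or None if absent."""
    prefix = f"# HELP {metric_name} "
    for line in exposition_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def distinct_help_texts(scrapes, metric_name):
    """Collect the distinct HELP texts seen across scrapes, in first-seen order."""
    seen = []
    for text in scrapes:
        found = help_text(text, metric_name)
        if found is not None and found not in seen:
            seen.append(found)
    return seen
```

If `distinct_help_texts` returns more than one entry for `my_histogram`, the exporter served different metric families for the same name across requests.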

Why is this a problem

I believe the prometheus exporter should be resilient enough to handle bad input.
As it stands right now, a single metric point with a conflicting description can hide every other series of the same name.

I would suggest that when the component receives a metric point with the same metric name but a different description, it either overrides the old description or ignores the new one. Perhaps this could even be configurable in the component? Ex:

override_old_description: true
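
The two policies could be sketched as follows. This is only an illustration of the suggestion, not the exporter's code: the class `DescriptionRegistry` is hypothetical, and `override_old_description` is the option name proposed above, not an existing exporter setting.

```python
class DescriptionRegistry:
    """Tracks one description per metric name under a chosen conflict policy."""

    def __init__(self, override_old_description=False):
        # False: the first description seen wins. True: the latest one wins.
        self.override_old_description = override_old_description
        self._descriptions = {}

    def observe(self, name, description):
        """Record a description and return the one that should be exposed."""
        if name not in self._descriptions or self.override_old_description:
            self._descriptions[name] = description
        return self._descriptions[name]
```

Under either policy every data point of `my.histogram` would be exposed under a single description, so no series would be dropped from a scrape.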

If you need more details, please let me know. I'll help however I can.

@tqi-raurora tqi-raurora changed the title Disappearing metrics when using prometheus exporter when a specific histogram metric is received Disappearing metrics when using prometheus exporter when a histogram metric is received with same name but a different description Nov 22, 2024
@dashpole
Contributor

I think this might be fixed by #36356. Can you try with v0.114.0 of the collector?

@tqi-raurora
Author

Thanks, you're right: v0.114.0 solves the issue!
It keeps the old description and exposes the data points from both services:

> dpkg -l | grep otel
ii  otelcol-contrib                            0.114.0                                 amd64        OpenTelemetry Collector - otelcol-contrib

> curl -s localhost:9130/metrics
# HELP my_histogram I am a Histogram
# TYPE my_histogram histogram
my_histogram_bucket{http_route="/route_a",job="service_a",le="1"} 1
my_histogram_bucket{http_route="/route_a",job="service_a",le="+Inf"} 2
my_histogram_sum{http_route="/route_a",job="service_a"} 2
my_histogram_count{http_route="/route_a",job="service_a"} 2
my_histogram_bucket{http_route="/route_b",job="service_a",le="1"} 1
my_histogram_bucket{http_route="/route_b",job="service_a",le="+Inf"} 2
my_histogram_sum{http_route="/route_b",job="service_a"} 2
my_histogram_count{http_route="/route_b",job="service_a"} 2
my_histogram_bucket{http_route="route_c",job="service_b",le="1"} 1
my_histogram_bucket{http_route="route_c",job="service_b",le="+Inf"} 2
my_histogram_sum{http_route="route_c",job="service_b"} 2
my_histogram_count{http_route="route_c",job="service_b"} 2
