
[APM] proposal: fine-grained otel supported metrics detection #180885

Open
SylvainJuge opened this issue Apr 16, 2024 · 2 comments
Labels
enhancement New value added to drive a business result OpenTelemetry Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team

Comments

@SylvainJuge
Member

SylvainJuge commented Apr 16, 2024

In APM Service metrics in Java, we currently provide two variants:

To detect which variant should be displayed and which metrics to query, the has_otel_process_metrics request is made. It returns a boolean based on whether some known OTel metrics are present.


The problem here is that the OTel metrics are a moving target:

  • Version 1.x of the Java OTel agent reports experimental metrics in the process.runtime.jvm.* namespace, which is what we rely on until #174445 (Update APM dashboards with stable JVM OTel metrics) is fixed.
  • Later 1.x versions of the Java OTel agent can also report stable metrics through opt-in configuration, so the reported metric set cannot be inferred from the agent.name or agent.version fields.
  • Newer 2.x versions report the stable JVM metrics in the jvm.* namespace, as defined by the semantic conventions (semconv).
  • The 2.x agent can also report non-stable metrics through opt-in configuration.

We should be able to provide a dedicated "portable dashboard" for each of the following configurations, and possibly more in the future:

  • 1.x agent with experimental metrics in the process.runtime.jvm.* namespace (1)
  • 1.x or 2.x agent with stable metrics in the jvm.* namespace (2)
  • 1.x or 2.x agent with stable + experimental metrics in the jvm.* namespace (3)

Each variant would only be displayed when there are matching metrics, and the dashboard selection would be implemented in a single, easy-to-update function that takes the list of available metrics as input. This function would also provide a heuristic to decide which variant takes priority when there is a mix (for example, a Java service instrumented with agents both with and without stable JVM metrics).
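A minimal TypeScript sketch of such a selection function, assuming the list of available metric names has already been queried. The variant ids, the numeric priorities, and the use of jvm.memory.init as a marker for the opt-in experimental jvm.* metrics are hypothetical; the other marker names are taken from the metric examples in this issue. Each variant declares the metrics it requires, and the highest-priority match becomes the default while all matches remain available.

```typescript
// Hypothetical sketch of the proposed dashboard-selection function.
// Variant ids, priorities and marker metrics are illustrative assumptions.

type DashboardVariant =
  | "otel-1x-experimental" // (1) experimental process.runtime.jvm.* metrics
  | "otel-stable" // (2) stable jvm.* metrics
  | "otel-stable-experimental"; // (3) stable + experimental jvm.* metrics

interface VariantRule {
  variant: DashboardVariant;
  // Higher value wins when several variants match (e.g. a mixed agent fleet).
  priority: number;
  // Every listed metric must be present for the variant to match.
  requiredMetrics: string[];
}

const VARIANT_RULES: VariantRule[] = [
  { variant: "otel-1x-experimental", priority: 1, requiredMetrics: ["process.runtime.jvm.memory.usage"] },
  { variant: "otel-stable", priority: 2, requiredMetrics: ["jvm.memory.usage"] },
  // Assumption: jvm.memory.init stands in for the opt-in experimental jvm.* metrics.
  { variant: "otel-stable-experimental", priority: 3, requiredMetrics: ["jvm.memory.usage", "jvm.memory.init"] },
];

interface VariantSelection {
  available: DashboardVariant[]; // all matching variants, highest priority first
  selected: DashboardVariant | undefined; // default variant to display, if any matched
}

function selectDashboardVariant(availableMetrics: Iterable<string>): VariantSelection {
  const present = new Set(availableMetrics);
  const available = VARIANT_RULES
    .filter((rule) => rule.requiredMetrics.every((metric) => present.has(metric)))
    .sort((a, b) => b.priority - a.priority)
    .map((rule) => rule.variant);
  return { available, selected: available[0] };
}
```

In a mixed fleet reporting both process.runtime.jvm.memory.usage and jvm.memory.usage, this sketch lists both variants and defaults to otel-stable. Keeping the rules in one table means that supporting a new agent release should only require adding or reprioritizing an entry.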

This approach would be relevant for the Java agent and for all the other agents as well, since they will face the same "moving target" problem as their metric definitions evolve.


In addition to metric names, the structure of the data might also change, so we should also query the known labels that provide the breakdown for a given metric, and use them when selecting the appropriate dashboard variant.

For example, the "used heap memory" metric can be represented in different ways:

  • With the Elastic APM agent, the metrics are jvm.memory.heap.used and jvm.memory.non_heap.used.
  • With the OpenTelemetry 1.x agent, the metric is process.runtime.jvm.memory.usage, with labels.type containing heap or non_heap.
  • With the OpenTelemetry 2.x agent (or with stable JVM metrics enabled), the metric is jvm.memory.usage, with labels.jvm_memory_type containing heap or non_heap.
  • With the OpenTelemetry 2.x agent in the future, if apm-data#264 (proposal: mapping of stable JVM metrics as top-level attributes) is implemented, the jvm.memory.usage metric could have a jvm.memory.type attribute containing heap or non_heap.
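To illustrate how labels can disambiguate these representations, here is a hypothetical TypeScript sketch for the heap memory case. It assumes the set of field names present in the metrics data is already known (for example from a field-capabilities query); the representation names, the HeapQuery shape, and the preference order are assumptions, while the field names come from the list above.

```typescript
// Hypothetical sketch: detect how "used heap memory" is represented, using both
// metric names and known breakdown labels. Representation names, the HeapQuery
// shape and the preference order are illustrative assumptions.

interface HeapQuery {
  valueField: string; // field holding the memory value
  // Term filter selecting the heap breakdown, when the value is split by a label.
  filter?: { field: string; value: string };
}

interface Representation {
  name: string;
  requiredFields: string[]; // all must be present in the data
  query: HeapQuery;
}

// Ordered from most to least preferred; the first match wins.
const HEAP_REPRESENTATIONS: Representation[] = [
  {
    name: "otel-2x-top-level-attribute", // only if apm-data#264 is implemented
    requiredFields: ["jvm.memory.usage", "jvm.memory.type"],
    query: { valueField: "jvm.memory.usage", filter: { field: "jvm.memory.type", value: "heap" } },
  },
  {
    name: "otel-2x-label",
    requiredFields: ["jvm.memory.usage", "labels.jvm_memory_type"],
    query: { valueField: "jvm.memory.usage", filter: { field: "labels.jvm_memory_type", value: "heap" } },
  },
  {
    name: "otel-1x-label",
    requiredFields: ["process.runtime.jvm.memory.usage", "labels.type"],
    query: { valueField: "process.runtime.jvm.memory.usage", filter: { field: "labels.type", value: "heap" } },
  },
  {
    name: "elastic-apm-agent",
    requiredFields: ["jvm.memory.heap.used"],
    query: { valueField: "jvm.memory.heap.used" }, // no label filter needed
  },
];

function detectHeapRepresentation(presentFields: Iterable<string>): Representation | undefined {
  const present = new Set(presentFields);
  return HEAP_REPRESENTATIONS.find((r) => r.requiredFields.every((f) => present.has(f)));
}
```

Note that jvm.memory.usage on its own matches nothing in this sketch: a representation is only chosen once the label carrying the heap/non_heap breakdown is also present.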
botelastic bot added the needs-team label Apr 16, 2024
SylvainJuge added the Team:obs-ux-infra_services label Apr 16, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

botelastic bot removed the needs-team label Apr 16, 2024
@SylvainJuge
Member Author

After discussing this with @AlexanderWert today, we think that this proposal might be a bit too complex, and that a simpler alternative might become available in the (hopefully not too distant) future: querying schema_url to get the semconv version in use, and using a dedicated dashboard per version.
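A rough TypeScript sketch of the schema_url alternative: OpenTelemetry schema URLs end with the schema version (for example https://opentelemetry.io/schemas/1.21.0), so the semconv version can be read from the last path segment and mapped to a per-version dashboard. The dashboard ids and the 1.21.0 cut-over version below are placeholders, not the actual version boundaries of the JVM metrics.

```typescript
// Hypothetical sketch: map an OpenTelemetry schema_url to a per-version dashboard.
// Dashboard ids and version boundaries are placeholders.

type SemVer = [number, number, number];

// Reads the version from the last path segment, e.g. ".../schemas/1.21.0" -> [1, 21, 0].
function parseSchemaVersion(schemaUrl: string): SemVer | undefined {
  const lastSegment = schemaUrl
    .replace(/[?#].*$/, "") // drop any query string or fragment
    .replace(/\/+$/, "") // drop trailing slashes
    .split("/")
    .pop();
  const match = lastSegment?.match(/^(\d+)\.(\d+)\.(\d+)$/);
  if (!match) return undefined;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions([aMajor, aMinor, aPatch]: SemVer, [bMajor, bMinor, bPatch]: SemVer): number {
  return aMajor - bMajor || aMinor - bMinor || aPatch - bPatch;
}

// Sorted ascending; each dashboard applies from `since` (inclusive) up to the next entry.
const DASHBOARDS_BY_SEMCONV: { since: SemVer; dashboard: string }[] = [
  { since: [1, 0, 0], dashboard: "jvm-semconv-pre-stable" }, // placeholder
  { since: [1, 21, 0], dashboard: "jvm-semconv-stable" }, // placeholder boundary
];

function dashboardForSchemaUrl(schemaUrl: string): string | undefined {
  const version = parseSchemaVersion(schemaUrl);
  if (!version) return undefined;
  let selected: string | undefined;
  for (const entry of DASHBOARDS_BY_SEMCONV) {
    // Later entries override earlier ones, so the last boundary <= version wins.
    if (compareVersions(version, entry.since) >= 0) selected = entry.dashboard;
  }
  return selected;
}
```

An unparseable schema_url, or a version below the first boundary, yields undefined, which a caller could treat as "no dedicated dashboard available".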

While the current structure of the metrics could be detected from the metric name and the presence of some known attributes, some breaking changes would still go undetected, for example when the metric name stays the same but its data type changes.

So, the short-term option will likely be to drop support for 1.x agents that do not use stable JVM metrics (which has always worked only partially), to focus on #174445 using the stable definitions, and to use elastic/apm-data#264 to ensure that the dashboards stay relevant in the future.

smith added the OpenTelemetry and enhancement labels Sep 26, 2024

3 participants