Skip to content

Commit 85a975b

Browse files
authored
chore(host-metrics): solve TODO comments for alpha(#124)
- drop `process.*` metrics since not used in the UI - use default scope name for host metrics and decided to not make it configurable - `ELASTIC_OTEL_DISABLE_METRICS` config var defined
1 parent ed0f5d6 commit 85a975b

File tree

2 files changed

+43
-26
lines changed

2 files changed

+43
-26
lines changed

packages/opentelemetry-node/lib/metrics/host.js

Lines changed: 39 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,9 @@
2626
* should export all the necessary building blocks like `LastValueAccumulation`
2727
* class and `AggregatorKind` enum
2828
* - we might look for an alternate solution in this case (no proposal for now)
29+
*
30+
* Most of this code will be removed when https://github.com/open-telemetry/opentelemetry-js/issues/4616
31+
* is solved.
2932
*/
3033
/**
3134
* @typedef {import('@opentelemetry/api').HrTime} HrTime
@@ -52,10 +55,12 @@ const {Aggregation, DataPointType, View} = metrics;
5255
const {HostMetrics} = require('@opentelemetry/host-metrics');
5356

5457
/**
55-
* Copied from `@opentelemetry/sdk-metrics` since it's not exported
58+
* TODO: Copied from `@opentelemetry/sdk-metrics` since it's not exported
5659
* https://github.com/open-telemetry/opentelemetry-js/blob/f86251d40fbf615be87319c8a1f5643afb820076/packages/sdk-metrics/src/aggregator/LastValue.ts#L34
5760
*
58-
* @todo remoce this class and require it when exported
61+
* Remove when https://github.com/open-telemetry/opentelemetry-js/issues/4616 is fixed
62+
*
63+
* @todo remove this class when sdk.metrics exports it
5964
* @class
6065
* @implements {Accumulation}
6166
*/
@@ -100,8 +105,9 @@ class LastValueAccumulation {
100105
* @implements {Aggregator<LastValueAccumulation>}
101106
*/
102107
class SystemCpuUtilizationAggregator {
103-
// TODO: hardcoded the value of `AggregatorKind` enum for GAUGE
104-
// remove when exported
108+
// TODO: Hardcoded the value of `AggregatorKind` enum for GAUGE. Remove
109+
// when issue below is fixed
110+
// Issue: https://github.com/open-telemetry/opentelemetry-js/issues/4616
105111
// https://github.com/open-telemetry/opentelemetry-js/blob/f86251d40fbf615be87319c8a1f5643afb820076/packages/sdk-metrics/src/aggregator/types.ts#L23
106112
kind = 2;
107113

@@ -158,8 +164,19 @@ class SystemCpuUtilizationAggregator {
158164
}
159165

160166
/**
161-
* Does the sum of data points grouping by `system.cpu.logical_number` so we have the total
162-
* utilization per CPU. Basically the value would be 1 - idle_value
167+
* Groups data points by `system.cpu.logical_number` so we have the total
168+
* utilization per CPU.
169+
*
170+
* We cannot sum up the utilization of all the states since `os.cpus()` is
171+
* not returning all of the possible states but limited to: user, nice, sys, idle, irq
172+
* https://nodejs.org/api/all.html#all_os_oscpus
173+
*
174+
* where in linux we have more: user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice
175+
* https://man7.org/linux/man-pages/man5/proc.5.html
176+
*
177+
* So in order to have the most accurate metric of utilization we use
178+
* the formula 1 - (idle utilization)
179+
*
163180
* As an example given the data points:
164181
* - { value: 0.1, attributes: { 'system.cpu.logical_number': 0, 'system.cpu.state': 'idle' } }
165182
* - { value: 0.5, attributes: { 'system.cpu.logical_number': 0, 'system.cpu.state': 'system' } }
@@ -188,15 +205,6 @@ class SystemCpuUtilizationAggregator {
188205
accumulationByAttributes,
189206
endTime
190207
) {
191-
// We cannot sum up the utilization of all the states since `os.cpus()` is
192-
// not returning all of the possible states but limited to: user, nice, sys, idle, irq
193-
// https://nodejs.org/api/all.html#all_os_oscpus
194-
//
195-
// where in linux we have more: user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice
196-
// https://man7.org/linux/man-pages/man5/proc.5.html
197-
//
198-
// So in order to have the most accurate metric of utilization we use
199-
// the formula 1 - (idle utilization)
200208
return {
201209
descriptor,
202210
aggregationTemporality,
@@ -225,27 +233,35 @@ class SystemCpuUtilizationAggregation extends Aggregation {
225233
/** @type {HostMetrics} */
226234
let hostMetricsInstance;
227235
function enableHostMetrics() {
228-
// TODO: make this configurable, user might collect host metrics with a separate utility
229-
hostMetricsInstance = new HostMetrics({
230-
name: '',
231-
});
236+
// @ts-ignore - config interface expects a `name` property but there is a default value
237+
hostMetricsInstance = new HostMetrics({});
232238
hostMetricsInstance.start();
233239
}
234240

241+
// It is known that host metrics sends a lot of data so for now we drop some
242+
// instruments that are not handled by Kibana and doing aggregations
243+
// for others that we want to include shorly (CPU metrics)
244+
// Ref (data amount issue): https://github.com/elastic/elastic-otel-node/issues/51
245+
// Ref (metrics in Kibana): https://github.com/elastic/kibana/pull/174700
235246
/** @type {metrics.View[]} */
236247
const HOST_METRICS_VIEWS = [
237-
// drop `system.network.*` metrics for now
248+
// drop `system.network.*` (not in Kibana)
238249
new View({
239250
instrumentName: 'system.network.*',
240251
aggregation: Aggregation.Drop(),
241252
}),
242-
// drop `system.cpu.time` also
243-
// TODO: check if we can do an aggregation here
253+
// drop `system.cpu.time` (not in Kibana)
244254
new View({
245255
instrumentName: 'system.cpu.time',
246256
aggregation: Aggregation.Drop(),
247257
}),
248-
// use the aggregation we craeted above
258+
// drop `process.*` (not in Kibana)
259+
new View({
260+
instrumentName: 'process.*',
261+
aggregation: Aggregation.Drop(),
262+
}),
263+
// Do an aggregation to avoid cardinality problems because of the possible
264+
// permutations of state & logical_number attributes
249265
new View({
250266
instrumentName: 'system.cpu.utilization',
251267
aggregation: new SystemCpuUtilizationAggregation(),

packages/opentelemetry-node/lib/sdk.js

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -83,14 +83,15 @@ class ElasticNodeSDK extends NodeSDK {
8383
// TODO metrics exporter should do for metrics what `TracerProviderWithEnvExporters` does for traces, does that include `url` export endpoint?
8484
// TODO what `temporalityPreference`?
8585
// TODO make the Millis values configurable. What would otel java do?
86-
// TODO config to disable metrics? E.g. otherwise `http.server.duration` will send every period forever and data can be distracting
87-
const metricsDisabled = process.env.ETEL_METRICS_DISABLED === 'true'; // TODO hack for now
86+
87+
// Disable metrics by config
88+
const metricsDisabled =
89+
process.env.ELASTIC_OTEL_METRICS_DISABLED === 'true';
8890
if (!metricsDisabled) {
8991
const metricsInterval =
9092
Number(process.env.ETEL_METRICS_INTERVAL_MS) || 30000;
9193
defaultConfig.metricReader =
9294
new metrics.PeriodicExportingMetricReader({
93-
// exporter: new metrics.ConsoleMetricExporter(),
9495
exporter: new OTLPMetricExporter(),
9596
exportIntervalMillis: metricsInterval,
9697
exportTimeoutMillis: metricsInterval, // TODO same val appropriate for timeout?

0 commit comments

Comments
 (0)