Commit a8b842c

Initial readme for OTEL
1 parent c68bb38 commit a8b842c

4 files changed

+199
-0
lines changed

doc/opentelemetry/README.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Databroker Tracing with OpenTelemetry

OpenTelemetry is an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.

Building the databroker with the `otel` feature enables OpenTelemetry Traces in the databroker binary. When enabled, trace information is sent to an OTLP endpoint, which allows call traces to be analyzed in frontend tools like Jaeger or Zipkin.

_Note: OpenTelemetry Logs and Metrics are not available._

# Manual infrastructure setup

To collect and analyze trace information, some infrastructure services are needed. For development and debugging purposes, the Databroker, the OpenTelemetry Collector and the frontend UI (e.g. Jaeger) can be started locally. In a remote scenario, the databroker and the OpenTelemetry Collector would run on the target environment (e.g. in a virtual device or in a high-performance vehicle computer), whereas the backend collectors, their storage service and the frontend UI components for analysis would be deployed on a cloud backend.

## Prometheus

_Note: Prometheus will only be needed once Metrics become available._

```
curl --proto '=https' --tlsv1.2 -fOL https://github.com/prometheus/prometheus/releases/download/v3.1.0/prometheus-3.1.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
cd prometheus-*
./prometheus
```

## Jaeger

Jaeger is a frontend user interface to visualize call traces.

```
curl --proto '=https' --tlsv1.2 -fOL https://github.com/jaegertracing/jaeger/releases/download/v1.65.0/jaeger-2.2.0-linux-amd64.tar.gz
tar xzf jaeger-2.2.0-linux-amd64.tar.gz
cd jaeger-2.2.0-linux-amd64
./jaeger --config=config-jaeger.yaml
```

## OpenTelemetry Collector

The collector is the OTLP endpoint to which the databroker sends its trace data.

```
cd doc/opentelemetry
curl --proto '=https' --tlsv1.2 -fOL https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.118.0/otelcol_0.118.0_linux_amd64.tar.gz
tar -xvf otelcol_0.118.0_linux_amd64.tar.gz
./otelcol --config=config-otel-collector.yaml
```

## Kuksa Databroker

Enable the `otel` feature and start the databroker binary with an increased buffer size for OTEL messages, as the trace information from the databroker is extensive.

```
# in $workspace
cargo build --features=otel
OTEL_BSP_MAX_QUEUE_SIZE=8192 target/debug/databroker --vss data/vss-core/vss_release_4.0.json --enable-databroker-v1 --insecure
```

Open the Jaeger UI at http://localhost:16686

# Testing

To test the OpenTelemetry Trace feature, invoke Kuksa API operations.
The simplest way to do this is to use the databroker-cli to subscribe to a vehicle signal, list metadata, and publish or actuate new data.

## Use databroker-cli to invoke some methods

```
databroker-cli
```

# Troubleshooting

## Channel is full
Error message:
```
OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full
```
Solution:
- Increase `OTEL_BSP_MAX_QUEUE_SIZE` to 8192 or more, depending on the situation. The default is 2048, which is not enough for the amount of data recorded during tracing.
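
The queue size can also be set once in the shell before starting the databroker. The following is a minimal sketch, not part of the project: `bsp_queue_size` is a hypothetical helper name, and the fallback of 8192 is simply the value suggested above.

```shell
# Hypothetical helper: print the queue size to use.
# If the argument consists only of digits, it is printed as-is;
# otherwise the fallback 8192 (the value suggested in this README) is printed.
bsp_queue_size() {
  case "$1" in
    ''|*[!0-9]*) echo 8192 ;;
    *) echo "$1" ;;
  esac
}

# Keep a value that is already set in the environment, else use the fallback.
export OTEL_BSP_MAX_QUEUE_SIZE="$(bsp_queue_size "${OTEL_BSP_MAX_QUEUE_SIZE:-}")"
echo "OTEL_BSP_MAX_QUEUE_SIZE=${OTEL_BSP_MAX_QUEUE_SIZE}"
```

After this, the databroker can be started as shown in the Kuksa Databroker section without the inline `OTEL_BSP_MAX_QUEUE_SIZE=8192` prefix.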

## Connection refused

Repeated messages when the OTLP server is down:
```
OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (The service is currently unavailable): , detailed error message: error trying to connect: tcp connect error: Connection refused (os error 111)
```
Solution:
- (Re)start the OpenTelemetry Collector.
- Ensure the hostname and port number are properly configured. The default is `localhost:4317`, the OTLP gRPC port on which `config-otel-collector.yaml` listens. Set the environment variable `OTEL_ENDPOINT` to override the default.
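
Before starting the databroker, it can help to check whether anything is listening on the configured endpoint. The sketch below assumes `bash` and the coreutils `timeout` command are available. The helper names `otlp_host_port` and `otlp_reachable` are hypothetical, and the accepted format of `OTEL_ENDPOINT` (here `host:port`, optionally with a scheme prefix) is an assumption, not something this README specifies.

```shell
# Hypothetical helper: strip an optional scheme (e.g. http://) and any
# trailing path from an endpoint string, leaving "host:port".
otlp_host_port() {
  local endpoint="${1#*://}"
  printf '%s\n' "${endpoint%%/*}"
}

# Hypothetical helper: return 0 if a TCP connection to the endpoint
# succeeds within 2 seconds (uses bash's /dev/tcp pseudo-device).
otlp_reachable() {
  local hp host port
  hp="$(otlp_host_port "$1")"
  host="${hp%:*}"
  port="${hp##*:}"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Assumption: fall back to the default endpoint named in this README.
endpoint="${OTEL_ENDPOINT:-localhost:4317}"
if otlp_reachable "$endpoint"; then
  echo "OTLP endpoint ${endpoint} is reachable"
else
  echo "OTLP endpoint ${endpoint} is not reachable; start the collector first"
fi
```

A successful TCP connection only shows that some process is listening on the port; it does not prove that the process is an OTLP collector.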

doc/opentelemetry/config-jaeger.yaml

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
service:
  extensions: [jaeger_storage, jaeger_query, remote_sampling, healthcheckv2]
  pipelines:
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [batch, adaptive_sampling]
      exporters: [jaeger_storage_exporter]
  telemetry:
    resource:
      service.name: jaeger
    metrics:
      level: detailed
      address: 0.0.0.0:8888
    logs:
      level: debug
    # TODO Initialize telemetry tracer once OTEL released new feature.
    # https://github.com/open-telemetry/opentelemetry-collector/issues/10663

extensions:
  healthcheckv2:
    use_v2: true
    http:

  # pprof:
  #   endpoint: 0.0.0.0:1777
  # zpages:
  #   endpoint: 0.0.0.0:55679

  jaeger_query:
    storage:
      traces: some_store
      traces_archive: another_store
    # The maximum duration that is considered for clock skew adjustments.
    # Defaults to 0 seconds, which means it's disabled.
    max_clock_skew_adjust: 0s

  jaeger_storage:
    backends:
      some_store:
        memory:
          max_traces: 100000
      another_store:
        memory:
          max_traces: 100000

  remote_sampling:
    # You can either use file or adaptive sampling strategy in remote_sampling
    # file:
    #   path: ./cmd/jaeger/sampling-strategies.json
    adaptive:
      sampling_store: some_store
      initial_sampling_probability: 0.1
    http:
    grpc:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 127.0.0.1:4417

  jaeger:
    protocols:
      grpc:

  zipkin:

processors:
  batch:
  # Adaptive Sampling Processor is required to support adaptive sampling.
  # It expects remote_sampling extension with `adaptive:` config to be enabled.
  adaptive_sampling:

exporters:
  jaeger_storage_exporter:
    trace_storage: some_store

doc/opentelemetry/config-otel-collector.yaml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  debug:
    # 'basic' or 'detailed'
    verbosity: basic
  # Data sources: metrics
  prometheusremotewrite:
    endpoint: http://localhost:9090/api/v1/write
    tls:
      insecure: true
  # Actually jaeger
  otlp:
    endpoint: localhost:4417
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug,otlp]
    metrics:
      receivers: [otlp]
      exporters: [debug,prometheusremotewrite]
    logs:
      receivers: [otlp]
      exporters: [debug]

doc/opentelemetry/prometheus.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
scrape_configs:
  - job_name: "otel"
    static_configs:
      - targets: ['localhost:8888']
