|
| 1 | +# Databroker Tracing with OpenTelemetry |
| 2 | + |
| 3 | +OpenTelemetry is an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs. |
| 4 | + |
| 5 | +Building the databroker with the `otel` feature enables OpenTelemetry Traces in the binary. When enabled, trace information is sent to an OTLP endpoint, which allows call traces to be analyzed in frontend tools like Jaeger or Zipkin. |
| 6 | + |
| 7 | +_Note: OpenTelemetry Logs and Metrics are not available._ |
| 8 | + |
| 9 | +# Manual infrastructure setup |
| 10 | + |
| 11 | +To collect trace information and analyze it, some infrastructure services are needed. For development and debugging purposes, the Databroker, the OpenTelemetry Collector and the frontend UI (e.g. Jaeger) can be started locally. In a remote scenario, the databroker and OpenTelemetry Collector would run in the target environment (e.g. in a virtual device or in a high-performance vehicle computer), whereas the backend collectors, their storage service and the frontend UI components for analysis would be deployed on a cloud backend. |
| 12 | + |
| 13 | +## Prometheus |
| 14 | + |
| 15 | +_Note: Prometheus will only be needed once Metrics become available in the future._ |
| 16 | + |
| 17 | +``` |
| 18 | +curl --proto '=https' --tlsv1.2 -fOL https://github.com/prometheus/prometheus/releases/download/v3.1.0/prometheus-3.1.0.linux-amd64.tar.gz |
| 19 | +tar xvfz prometheus-*.tar.gz |
| 20 | +cd prometheus-* |
| 21 | +./prometheus |
| 22 | +``` |
| 23 | + |
| 24 | +## Jaeger |
| 25 | + |
| 26 | +Jaeger is a frontend user interface to visualize call traces. |
| 27 | + |
| 28 | +``` |
| 29 | +curl --proto '=https' --tlsv1.2 -fOL https://github.com/jaegertracing/jaeger/releases/download/v1.65.0/jaeger-2.2.0-linux-amd64.tar.gz |
| 30 | +tar xzf jaeger-2.2.0-linux-amd64.tar.gz |
| 31 | +cd jaeger-2.2.0-linux-amd64 |
| 32 | +./jaeger --config=config-jaeger.yaml |
| 33 | +``` |
| 34 | + |
| 35 | +## OpenTelemetry Collector |
| 36 | + |
| 37 | +The collector is the OTLP endpoint to which databroker sends its OpenTelemetry data. |
| 38 | + |
| 39 | +``` |
| 40 | +cd doc/opentelemetry |
| 41 | +curl --proto '=https' --tlsv1.2 -fOL https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.118.0/otelcol_0.118.0_linux_amd64.tar.gz |
| 42 | +tar -xvf otelcol_0.118.0_linux_amd64.tar.gz |
| 43 | +./otelcol --config=config-otel-collector.yaml |
| 44 | +``` |
| 45 | + |
| 46 | +## Kuksa Databroker |
| 47 | + |
| 48 | +Enable the `otel` feature and start the databroker binary with an increased buffer size for OTEL messages, as the trace information from databroker is extensive. |
| 49 | + |
| 50 | +``` |
| 51 | +# in $workspace |
| 52 | +cargo build --features=otel |
| 53 | +OTEL_BSP_MAX_QUEUE_SIZE=8192 target/debug/databroker --vss data/vss-core/vss_release_4.0.json --enable-databroker-v1 --insecure |
| 54 | +``` |
| 55 | + |
| 56 | +Open the Jaeger UI at http://localhost:16686 |
| 57 | + |
| 58 | +# Testing |
| 59 | + |
| 60 | +To test the OpenTelemetry Trace feature, invoke Kuksa API operations. |
| 61 | +The simplest way to do this is to use the databroker-cli to subscribe to a vehicle signal, list metadata and publish/actuate new data. |
| 62 | + |
| 63 | +## Use databroker-cli to invoke some methods |
| 64 | + |
| 65 | +``` |
| 66 | +databroker-cli |
| 67 | +``` |
| 68 | + |
| 69 | +# Troubleshooting |
| 70 | + |
| 71 | +## Channel is full |
| 72 | +Error Message: |
| 73 | +``` |
| 74 | +OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full |
| 75 | +``` |
| 76 | +Solution: |
| 77 | +- Increase `OTEL_BSP_MAX_QUEUE_SIZE` to 8192 or more, depending on the situation. The default is 2048, which is not enough for the amount of data being recorded during tracing. |
| 78 | + |
| 79 | + |
| 80 | +## Connection refused |
| 81 | + |
| 82 | +Repeated messages when the OTLP server is down: |
| 83 | +``` |
| 84 | +OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (The service is currently unavailable): , detailed error message: error trying to connect: tcp connect error: Connection refused (os error 111) |
| 85 | +``` |
| 86 | +Solution: |
| 87 | +- (Re)start the OpenTelemetry Collector. |
| 88 | +- Ensure hostname and port number are properly configured. The default is `localhost:4317`, the standard OTLP port for gRPC-based communication. Set the environment variable `OTEL_ENDPOINT` to override the default. |
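
To tell a stopped collector from a misconfigured endpoint, it can help to check whether anything is listening on the OTLP port before restarting services. The following is a minimal sketch, not part of the databroker tooling: `check_otlp_endpoint` is a hypothetical helper name, and it assumes bash (for `/dev/tcp`), the coreutils `timeout` command, and an endpoint of the form `host:port` with an optional scheme prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with databroker): probes a TCP port to see
# whether an OTLP endpoint is accepting connections.
# Assumption: the endpoint looks like "host:port" or "scheme://host:port".
check_otlp_endpoint() {
  local endpoint="${1:-localhost:4317}"
  endpoint="${endpoint#*://}"        # drop an optional scheme such as http://
  local host="${endpoint%:*}"        # everything before the last colon
  local port="${endpoint##*:}"       # everything after the last colon
  # Open (and immediately drop) a TCP connection via bash's /dev/tcp.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}
```

Calling `check_otlp_endpoint localhost:4317` on the machine running databroker reports whether the collector port accepts TCP connections. It only verifies that the port is open, not that the listener actually speaks OTLP.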