diff --git a/daprdocs/content/en/concepts/observability-concept.md b/daprdocs/content/en/concepts/observability-concept.md index 270be27cbf6..9de8295b931 100644 --- a/daprdocs/content/en/concepts/observability-concept.md +++ b/daprdocs/content/en/concepts/observability-concept.md @@ -7,42 +7,68 @@ description: > Observe applications through tracing, metrics, logs and health --- -When building an application, understanding how the system is behaving is an important part of operating it - this includes having the ability to observe the internal calls of an application, gauging its performance and becoming aware of problems as soon as they occur. This is challenging for any system, but even more so for a distributed system comprised of multiple microservices where a flow, made of several calls, may start in one microservice but continue in another. Observability is critical in production environments, but also useful during development to understand bottlenecks, improve performance and perform basic debugging across the span of microservices. +When building an application, understanding the system behavior is an important, yet challenging part of operating it, such as: +- Observing the internal calls of an application +- Gauging its performance +- Becoming aware of problems as soon as they occur -While some data points about an application can be gathered from the underlying infrastructure (for example memory consumption, CPU usage), other meaningful information must be collected from an "application-aware" layer–one that can show how an important series of calls is executed across microservices. This usually means a developer must add some code to instrument an application for this purpose. Often, instrumentation code is simply meant to send collected data such as traces and metrics to observability tools or services that can help store, visualize and analyze all this information. +This can be particularly challenging for a distributed system comprised of multiple microservices, where a flow made of several calls may start in one microservice and continue in another. -Having to maintain this code, which is not part of the core logic of the application, is a burden on the developer, sometimes requiring understanding the observability tools' APIs, using additional SDKs etc. This instrumentation may also add to the portability challenges of an application, which may require different instrumentation depending on where the application is deployed. For example, different cloud providers offer different observability tools and an on-premises deployment might require a self-hosted solution. +Observability into your application is critical in production environments, and can be useful during development to: +- Understand bottlenecks +- Improve performance +- Perform basic debugging across the span of microservices + +While some data points about an application can be gathered from the underlying infrastructure (memory consumption, CPU usage), other meaningful information must be collected from an "application-aware" layer – one that can show how an important series of calls is executed across microservices. Typically, you'd add some code to instrument an application, which simply sends collected data (such as traces and metrics) to observability tools or services that can help store, visualize, and analyze all this information. 
+ +Maintaining this instrumentation code, which is not part of the core logic of the application, requires understanding the observability tools' APIs, using additional SDKs, etc. This instrumentation may also present portability challenges for your application, requiring different instrumentation depending on where the application is deployed. For example: +- Different cloud providers offer different observability tools +- An on-premises deployment might require a self-hosted solution ## Observability for your application with Dapr -When building an application which leverages Dapr API building blocks to perform service-to-service calls and pub/sub messaging, Dapr offers an advantage with respect to [distributed tracing]({{}}). Because this inter-service communication flows through the Dapr runtime (or "sidecar"), Dapr is in a unique position to offload the burden of application-level instrumentation. +When you leverage Dapr API building blocks to perform service-to-service calls and pub/sub messaging, Dapr offers an advantage with respect to [distributed tracing]({{< ref develop-tracing >}}). Since this inter-service communication flows through the Dapr runtime (or "sidecar"), Dapr is in a unique position to offload the burden of application-level instrumentation. ### Distributed tracing -Dapr can be [configured to emit tracing data]({{}}), and because Dapr does so using the widely adopted protocols of [Open Telemetry (OTEL)](https://opentelemetry.io/) and [Zipkin](https://zipkin.io), it can be easily integrated with multiple observability tools. +Dapr can be [configured to emit tracing data]({{< ref setup-tracing.md >}}) using the widely adopted protocols of [Open Telemetry (OTEL)](https://opentelemetry.io/) and [Zipkin](https://zipkin.io). This makes it easy to integrate with multiple observability tools. Distributed tracing with Dapr ### Automatic tracing context generation -Dapr uses [W3C tracing]({{}}) specification for tracing context, included as part Open Telemetry (OTEL), to generate and propagate the context header for the application or propagate user-provided context headers. This means that you get tracing by default with Dapr. +Dapr uses the [W3C tracing]({{< ref w3c-tracing-overview >}}) specification for tracing context, included as part of Open Telemetry (OTEL), to generate and propagate the context header for the application or propagate user-provided context headers. This means that you get tracing by default with Dapr. ## Observability for the Dapr sidecar and control plane -You also want to be able to observe Dapr itself, by collecting metrics on performance, throughput and latency and logs emitted by the Dapr sidecar, as well as the Dapr control plane services. Dapr sidecars have a health endpoint that can be probed to indicate their health status. +You can also observe Dapr itself, by: +- Collecting logs emitted by the Dapr sidecar and the Dapr control plane services +- Collecting metrics on performance, throughput, and latency +- Using health endpoint probes to indicate the Dapr sidecar health status Dapr sidecar metrics, logs and health checks ### Logging -Dapr generates [logs]({{}}) to provide visibility into sidecar operation and to help users identify issues and perform debugging.
Dapr can also be configured to send logs to collectors such as [Fluentd]({{< ref fluentd.md >}}), [Azure Monitor]({{< ref azure-monitor.md >}}), and other observability tools, so that logs can be searched and analyzed to provide insights. +Dapr generates [logs]({{< ref logs.md >}}) to: +- Provide visibility into sidecar operation +- Help users identify issues and perform debugging + +Log events contain warning, error, info, and debug messages produced by Dapr system services. You can also configure Dapr to send logs to collectors, such as Open Telemetry Collector, [Fluentd]({{< ref fluentd.md >}}), [New Relic]({{< ref "operations/monitoring/logging/newrelic.md" >}}), [Azure Monitor]({{< ref azure-monitor.md >}}), and other observability tools, so that logs can be searched and analyzed to provide insights. ### Metrics -Metrics are the series of measured values and counts that are collected and stored over time. [Dapr metrics]({{}}) provide monitoring capabilities to understand the behavior of the Dapr sidecar and control plane. For example, the metrics between a Dapr sidecar and the user application show call latency, traffic failures, error rates of requests, etc. Dapr [control plane metrics](https://github.com/dapr/dapr/blob/master/docs/development/dapr-metrics.md) show sidecar injection failures and the health of control plane services, including CPU usage, number of actor placements made, etc. +Metrics are a series of measured values and counts collected and stored over time. [Dapr metrics]({{< ref metrics >}}) provide monitoring capabilities to understand the behavior of the Dapr sidecar and control plane. For example, the metrics between a Dapr sidecar and the user application show call latency, traffic failures, error rates of requests, etc. + +Dapr [control plane metrics](https://github.com/dapr/dapr/blob/master/docs/development/dapr-metrics.md) show sidecar injection failures and the health of control plane services, including CPU usage, number of actor placements made, etc. ### Health checks -The Dapr sidecar exposes an HTTP endpoint for [health checks]({{}}). With this API, user code or hosting environments can probe the Dapr sidecar to determine its status and identify issues with sidecar readiness. +The Dapr sidecar exposes an HTTP endpoint for [health checks]({{< ref sidecar-health.md >}}). With this API, user code or hosting environments can probe the Dapr sidecar to determine its status and identify issues with sidecar readiness. + +Conversely, Dapr can be configured to probe for the [health of your application]({{< ref app-health.md >}}), and react to changes in the app's health, including stopping pub/sub subscriptions and short-circuiting service invocation calls. + +## Next steps -Conversely, Dapr can be configured to probe for the [health of your application]({{}}), and react to changes in the app's health, including stopping pub/sub subscriptions and short-circuiting service invocation calls. 
+- [Learn more about observability in developing with Dapr]({{< ref develop-tracing >}}) +- [Learn more about observability in operating with Dapr]({{< ref tracing >}}) \ No newline at end of file diff --git a/daprdocs/content/en/developing-applications/building-blocks/observability/app-health.md b/daprdocs/content/en/developing-applications/building-blocks/observability/app-health.md index 4834bb5432b..97c7189293c 100644 --- a/daprdocs/content/en/developing-applications/building-blocks/observability/app-health.md +++ b/daprdocs/content/en/developing-applications/building-blocks/observability/app-health.md @@ -2,17 +2,22 @@ type: docs title: "App health checks" linkTitle: "App health checks" -weight: 300 +weight: 100 description: Reacting to apps' health status changes --- -App health checks is a feature that allows probing for the health of your application and reacting to status changes. +The app health checks feature allows probing for the health of your application and reacting to status changes. -Applications can become unresponsive for a variety of reasons: for example, they could be too busy to accept new work, could have crashed, or be in a deadlock state. Sometimes the condition can be transitory, for example if the app is just busy (and will eventually be able to resume accepting new work), or if the application is being restarted for whatever reason and is in its initialization phase. +Applications can become unresponsive for a variety of reasons. For example, your application: +- Could be too busy to accept new work; +- Could have crashed; or +- Could be in a deadlock state. -When app health checks are enabled, the Dapr *runtime* (sidecar) periodically polls your application via HTTP or gRPC calls. +Sometimes the condition can be transitory, for example: +- If the app is just busy and will resume accepting new work eventually +- If the application is being restarted for whatever reason and is in its initialization phase -When it detects a failure in the app's health, Dapr stops accepting new work on behalf of the application by: +App health checks are disabled by default. Once you enable app health checks, the Dapr runtime (sidecar) periodically polls your application via HTTP or gRPC calls. When it detects a failure in the app's health, Dapr stops accepting new work on behalf of the application by: - Unsubscribing from all pub/sub subscriptions - Stopping all input bindings @@ -20,15 +25,14 @@ When it detects a failure in the app's health, Dapr stops accepting new work on These changes are meant to be temporary, and Dapr resumes normal operations once it detects that the application is responsive again. -App health checks are disabled by default. - Diagram showing the app health feature. Running Dapr with app health enabled causes Dapr to periodically probe the app for its health. -### App health checks vs platform-level health checks +## App health checks vs platform-level health checks App health checks in Dapr are meant to be complementary to, and not replace, any platform-level health checks, like [liveness probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) when running on Kubernetes. Platform-level health checks (or liveness probes) generally ensure that the application is running, and cause the platform to restart the application in case of failures. 
+ Unlike platform-level health checks, Dapr's app health checks focus on pausing work to an application that is currently unable to accept it, but is expected to be able to resume accepting work *eventually*. Goals include: - Not bringing more load to an application that is already overloaded. @@ -36,7 +40,9 @@ Unlike platform-level health checks, Dapr's app health checks focus on pausing w In this regard, Dapr's app health checks are "softer", waiting for an application to be able to process work, rather than terminating the running process in a "hard" way. -> Note that for Kubernetes, a failing App Health check won't remove a pod from service discovery: this remains the responsibility of the Kubernetes liveness probe, _not_ Dapr. +{{% alert title="Note" color="primary" %}} +For Kubernetes, a failing app health check won't remove a pod from service discovery: this remains the responsibility of the Kubernetes liveness probe, _not_ Dapr. +{{% /alert %}} ## Configuring app health checks @@ -52,34 +58,46 @@ The full list of options are listed in this table: | CLI flags | Kubernetes deployment annotation | Description | Default value | | ----------------------------- | ----------------------------------- | ----------- | ------------- | | `--enable-app-health-check` | `dapr.io/enable-app-health-check` | Boolean that enables the health checks | Disabled | -| `--app-health-check-path` | `dapr.io/app-health-check-path` | Path that Dapr invokes for health probes when the app channel is HTTP (this value is ignored if the app channel is using gRPC) | `/healthz` | -| `--app-health-probe-interval` | `dapr.io/app-health-probe-interval` | Number of *seconds* between each health probe | `5` | -| `--app-health-probe-timeout` | `dapr.io/app-health-probe-timeout` | Timeout in *milliseconds* for health probe requests | `500` | -| `--app-health-threshold` | `dapr.io/app-health-threshold` | Max number of consecutive failures before the app is considered unhealthy | `3` | +| [`--app-health-check-path`]({{< ref "app-health.md#health-check-paths" >}}) | `dapr.io/app-health-check-path` | Path that Dapr invokes for health probes when the app channel is HTTP (this value is ignored if the app channel is using gRPC) | `/healthz` | +| [`--app-health-probe-interval`]({{< ref "app-health.md#intervals-timeouts-and-thresholds" >}}) | `dapr.io/app-health-probe-interval` | Number of *seconds* between each health probe | `5` | +| [`--app-health-probe-timeout`]({{< ref "app-health.md#intervals-timeouts-and-thresholds" >}}) | `dapr.io/app-health-probe-timeout` | Timeout in *milliseconds* for health probe requests | `500` | +| [`--app-health-threshold`]({{< ref "app-health.md#intervals-timeouts-and-thresholds" >}}) | `dapr.io/app-health-threshold` | Max number of consecutive failures before the app is considered unhealthy | `3` | -> See the [full Dapr arguments and annotations reference]({{}}) for all options and how to enable them. +> See the [full Dapr arguments and annotations reference]({{< ref arguments-annotations-overview >}}) for all options and how to enable them. -Additionally, app health checks are impacted by the protocol used for the app channel, which is configured with the `--app-protocol` flag (self-hosted) or the `dapr.io/app-protocol` annotation (Kubernetes); supported values are `http` (default), `grpc`, `https`, `grpcs`, and `h2c` (HTTP/2 Cleartext). 
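To make these options concrete, the following is a minimal sketch of a Kubernetes deployment that enables app health checks through the annotations from the table above, using the documented default values. The app name, port, labels, and container image are hypothetical placeholders, and the surrounding sidecar annotations (`dapr.io/enabled`, `dapr.io/app-id`, `dapr.io/app-port`) are shown only as assumed boilerplate for a Dapr-enabled deployment:

```yaml
# Minimal sketch: enabling app health checks via deployment annotations.
# The dapr.io/*health* annotations and their values come from the table above;
# the app name, port, and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "myapp"
        dapr.io/app-port: "8080"
        dapr.io/enable-app-health-check: "true"
        dapr.io/app-health-check-path: "/healthz"
        dapr.io/app-health-probe-interval: "5"
        dapr.io/app-health-probe-timeout: "500"
        dapr.io/app-health-threshold: "3"
    spec:
      containers:
        - name: myapp
          image: myregistry/myapp:latest
          ports:
            - containerPort: 8080
```

In self-hosted mode, the equivalent CLI flags from the table (for example, `--enable-app-health-check` together with the probe interval, timeout, and threshold flags) can be passed when running the application with Dapr.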
+Additionally, app health checks are impacted by the protocol used for the app channel, which is configured with the following flag or annotation: + +| CLI flag | Kubernetes deployment annotation | Description | Default value | +| ----------------------------- | ----------------------------------- | ----------- | ------------- | +| [`--app-protocol`]({{< ref "app-health.md#health-check-paths" >}}) | `dapr.io/app-protocol` | Protocol used for the app channel. Supported values are `http`, `grpc`, `https`, `grpcs`, and `h2c` (HTTP/2 Cleartext). | `http` | ### Health check paths +#### HTTP When using HTTP (including `http`, `https`, and `h2c`) for `app-protocol`, Dapr performs health probes by making an HTTP call to the path specified in `app-health-check-path`, which is `/health` by default. + For your app to be considered healthy, the response must have an HTTP status code in the 200-299 range. Any other status code is considered a failure. Dapr is only concerned with the status code of the response, and ignores any response header or body. +#### gRPC When using gRPC for the app channel (`app-protocol` set to `grpc` or `grpcs`), Dapr invokes the method `/dapr.proto.runtime.v1.AppCallbackHealthCheck/HealthCheck` in your application. Most likely, you will use a Dapr SDK to implement the handler for this method. While responding to a health probe request, your app *may* decide to perform additional internal health checks to determine if it's ready to process work from the Dapr runtime. However, this is not required; it's a choice that depends on your application's needs. ### Intervals, timeouts, and thresholds -When app health checks are enabled, by default Dapr probes your application every 5 seconds. You can configure the interval, in seconds, with `app-health-probe-interval`. These probes happen regularly, regardless of whether your application is healthy or not. +#### Intervals +By default, when app health checks are enabled, Dapr probes your application every 5 seconds. You can configure the interval, in seconds, with `app-health-probe-interval`. These probes happen regularly, regardless of whether your application is healthy or not. +#### Timeouts When the Dapr runtime (sidecar) is initially started, Dapr waits for a successful health probe before considering the app healthy. This means that pub/sub subscriptions, input bindings, and service invocation requests won't be enabled for your application until this first health check is complete and successful. -Health probe requests are considered successful if the application sends a successful response (as explained above) within the timeout configured in `app-health-probe-timeout`. The default value is 500, corresponding to 500 milliseconds (i.e. half a second). +Health probe requests are considered successful if the application sends a successful response (as explained above) within the timeout configured in `app-health-probe-timeout`. The default value is 500, corresponding to 500 milliseconds (half a second). +#### Thresholds Before Dapr considers an app to have entered an unhealthy state, it will wait for `app-health-threshold` consecutive failures, whose default value is 3. This default value means that your application must fail health probes 3 times *in a row* to be considered unhealthy. + If you set the threshold to 1, any failure causes Dapr to assume your app is unhealthy and will stop delivering work to it. + A threshold greater than 1 can help exclude transient failures due to external circumstances.
The right value for your application depends on your requirements. Thresholds only apply to failures. A single successful response is enough for Dapr to consider your app to be healthy and resume normal operations. diff --git a/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/_index.md b/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/_index.md new file mode 100644 index 00000000000..bc0df410947 --- /dev/null +++ b/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/_index.md @@ -0,0 +1,7 @@ +--- +type: docs +title: "Tracing" +linkTitle: "Tracing" +weight: 300 +description: Learn more about tracing scenarios and how to use tracing for visibility in your application +--- \ No newline at end of file diff --git a/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/tracing-overview.md b/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/tracing-overview.md new file mode 100644 index 00000000000..9331a75f906 --- /dev/null +++ b/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/tracing-overview.md @@ -0,0 +1,113 @@ +--- +type: docs +title: "Distributed tracing" +linkTitle: "Distributed tracing" +weight: 300 +description: "Use tracing to get visibility into your application" +--- + +Dapr uses the Open Telemetry (OTEL) and Zipkin protocols for distributed traces. OTEL is the industry standard and is the recommended trace protocol to use. + +Most observability tools support OTEL, including: +- [Google Cloud Operations](https://cloud.google.com/products/operations) +- [New Relic](https://newrelic.com) +- [Azure Monitor](https://azure.microsoft.com/services/monitor/) +- [Datadog](https://www.datadoghq.com) +- Instana +- [Jaeger](https://www.jaegertracing.io/) +- [SignalFX](https://www.signalfx.com/) + +## Scenarios + +Tracing is used with the service invocation and pub/sub APIs. You can flow trace context between services that use these APIs. There are two scenarios for how tracing is used: + + 1. Dapr generates the trace context and you propagate the trace context to another service. + 1. You generate the trace context and Dapr propagates the trace context to a service. + +### Scenario 1: Dapr generates trace context headers + +#### Propagating sequential service calls + +Dapr takes care of creating the trace headers. However, when there are more than two services, you're responsible for propagating the trace headers between them. Let's go through the scenarios with examples: + +##### Single service invocation call + +For example, `service A -> service B`. + +Dapr generates the trace headers in `service A`, which are then propagated from `service A` to `service B`. No further propagation is needed. + +##### Multiple sequential service invocation calls + +For example, `service A -> service B -> propagate trace headers to -> service C` and so on to further Dapr-enabled services. + +Dapr generates the trace headers at the beginning of the request in `service A`, which are then propagated to `service B`. You are now responsible for taking the headers and propagating them to `service C`, since this is specific to your application. + +In other words, if the app is calling Dapr and wants to trace with an existing trace header (span), it must always propagate to Dapr (from `service B` to `service C`, in this example). Dapr always propagates trace spans to an application.
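As an illustration of this propagation step, here is a minimal sketch of what `service B` might do: copy the incoming W3C trace headers and attach them to its outgoing service invocation call to `service C` made through the Dapr sidecar. It assumes Python with Flask and requests, the sidecar's default HTTP port (3500), and hypothetical app and method names (`service-c`, `do-work`):

```python
# Minimal sketch: "service B" forwards the trace headers it received to "service C".
# Assumes Flask and requests are installed, the Dapr sidecar listens on its
# default HTTP port (3500), and "service-c"/"do-work" are hypothetical names.
import requests
from flask import Flask, request

app = Flask(__name__)
DAPR_HTTP_PORT = 3500

@app.route("/process", methods=["POST"])
def process():
    # Copy the W3C trace headers that Dapr generated for the incoming call.
    trace_headers = {
        name: request.headers[name]
        for name in ("traceparent", "tracestate")
        if name in request.headers
    }

    # Attach them to the outgoing call so the trace continues into service C.
    resp = requests.post(
        f"http://localhost:{DAPR_HTTP_PORT}/v1.0/invoke/service-c/method/do-work",
        json=request.get_json(silent=True),
        headers=trace_headers,
    )
    return resp.content, resp.status_code

if __name__ == "__main__":
    app.run(port=6001)
```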
+ +{{% alert title="Note" color="primary" %}} +No helper methods are exposed in Dapr SDKs to propagate and retrieve trace context. You need to use HTTP/gRPC clients to propagate and retrieve trace headers through HTTP headers and gRPC metadata. +{{% /alert %}} + +##### Request is from external endpoint + +For example, `from a gateway service to a Dapr-enabled service A`. + +An external gateway ingress calls Dapr, which generates the trace headers and calls `service A`. `Service A` then calls `service B` and further Dapr-enabled services. + +You must propagate the headers from `service A` to `service B`. For example: `Ingress -> service A -> propagate trace headers -> service B`. This is similar to the [multiple sequential service invocation calls]({{< ref "tracing-overview.md#multiple-sequential-service-invocation-calls" >}}) case above. + +##### Pub/sub messages + +Dapr generates the trace headers in the published message topic. These trace headers are propagated to any services listening on that topic. + +#### Propagating multiple different service calls + +In the following scenario, Dapr does some of the work for you, and you then create or propagate trace headers. + +##### Multiple service calls to different services from single service + +When you are calling multiple services from a single service, you need to propagate the trace headers. For example: + +``` +service A -> service B +[ .. some code logic ..] +service A -> service C +[ .. some code logic ..] +service A -> service D +[ .. some code logic ..] ``` + +In this case: +1. When `service A` first calls `service B`, Dapr generates the trace headers in `service A`. +1. The trace headers in `service A` are propagated to `service B`. +1. These trace headers are returned in the response from `service B` as part of response headers. +1. You then need to propagate the returned trace context to the next services, like `service C` and `service D`, as Dapr does not know you want to reuse the same header. + +### Scenario 2: You generate your own trace context headers from non-Daprized applications + +Generating your own trace context headers is more unusual and typically not required when calling Dapr. + +However, there are scenarios where you could specifically choose to add W3C trace headers into a service call. For example, you have an existing application that does not use Dapr. In this case, Dapr still propagates the trace context headers for you. + +If you decide to generate trace headers yourself, there are three ways this can be done: + +1. Standard OpenTelemetry SDK + + You can use the industry standard [OpenTelemetry SDKs](https://opentelemetry.io/docs/instrumentation/) to generate trace headers and pass these trace headers to a Dapr-enabled service. _This is the preferred method_; a sketch of this approach follows the list below. + +1. Vendor SDK + + You can use a vendor SDK that provides a way to generate W3C trace headers and pass them to a Dapr-enabled service. + +1. W3C trace context + + You can handcraft a trace context following [W3C trace context specifications](https://www.w3.org/TR/trace-context/) and pass it to a Dapr-enabled service. + + Read [the trace context overview]({{< ref w3c-tracing-overview >}}) for more background and examples on W3C trace context and headers.
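The sketch referenced above for the standard OpenTelemetry SDK option might look like the following in Python. It assumes the `opentelemetry-sdk` and `requests` packages, the sidecar's default HTTP port (3500), and hypothetical app and method names (`service-a`, `neworder`); the SDK's default propagator emits the W3C `traceparent` header described in the trace context overview:

```python
# Minimal sketch: generate W3C trace headers with the OpenTelemetry SDK and
# pass them on a call to a Dapr-enabled service.
# Assumes opentelemetry-sdk and requests are installed; "service-a" and
# "neworder" are hypothetical app/method names; port 3500 is the sidecar default.
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("place-order"):
    headers = {}
    # Injects the current span context into the carrier dict as W3C
    # `traceparent` (and, if present, `tracestate`) headers.
    inject(headers)

    requests.post(
        "http://localhost:3500/v1.0/invoke/service-a/method/neworder",
        json={"orderId": 42},
        headers=headers,
    )
```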
+ +## Related Links + +- [Observability concepts]({{< ref observability-concept.md >}}) +- [W3C Trace Context for distributed tracing]({{< ref w3c-tracing-overview >}}) +- [W3C Trace Context specification](https://www.w3.org/TR/trace-context/) +- [Observability quickstart](https://github.com/dapr/quickstarts/tree/master/tutorials/observability) diff --git a/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/w3c-tracing-overview.md b/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/w3c-tracing-overview.md new file mode 100644 index 00000000000..53d315d6032 --- /dev/null +++ b/daprdocs/content/en/developing-applications/building-blocks/observability/develop-tracing/w3c-tracing-overview.md @@ -0,0 +1,91 @@ +--- +type: docs +title: "W3C trace context" +linkTitle: "W3C trace context" +weight: 2000 +description: Background and scenarios for using W3C tracing with Dapr +type: docs +--- + +Dapr uses the [Open Telemetry protocol](https://opentelemetry.io/), which in turn uses the [W3C trace context](https://www.w3.org/TR/trace-context/) for distributed tracing for both service invocation and pub/sub messaging. Dapr generates and propagates the trace context information, which can be sent to observability tools for visualization and querying. + +## Background + +Distributed tracing is a methodology implemented by tracing tools to follow, analyze, and debug a transaction across multiple software components. + +Typically, a distributed trace traverses more than one service, which requires it to be uniquely identifiable. **Trace context propagation** passes along this unique identification. + +In the past, trace context propagation was implemented individually by each different tracing vendor. In multi-vendor environments, this causes interoperability problems, such as: + +- Traces collected by different tracing vendors can't be correlated, as there is no shared unique identifier. +- Traces crossing boundaries between different tracing vendors can't be propagated, as there is no forwarded, uniformly agreed set of identification. +- Vendor-specific metadata might be dropped by intermediaries. +- Cloud platform vendors, intermediaries, and service providers cannot guarantee to support trace context propagation, as there is no standard to follow. + +Previously, most applications were monitored by a single tracing vendor and stayed within the boundaries of a single platform provider, so these problems didn't have a significant impact. + +Today, an increasing number of applications are distributed and leverage multiple middleware services and cloud platforms. This transformation of modern applications requires a distributed tracing context propagation standard. + +The [W3C trace context specification](https://www.w3.org/TR/trace-context/) defines a universally agreed-upon format for the exchange of trace context propagation data (referred to as trace context). Trace context solves the above problems by providing: + +- A unique identifier for individual traces and requests, allowing trace data of multiple providers to be linked together. +- An agreed-upon mechanism to forward vendor-specific trace data and avoid broken traces when multiple tracing tools participate in a single transaction. +- An industry standard that intermediaries, platforms, and hardware providers can support. 
+ +This unified approach for propagating trace data improves visibility into the behavior of distributed applications, facilitating problem and performance analysis. + +## W3C trace context and headers format + +### W3C trace context + +Dapr uses the standard W3C trace context headers. + +- For HTTP requests, Dapr uses the `traceparent` header. +- For gRPC requests, Dapr uses the `grpc-trace-bin` header. + +When a request arrives without a trace ID, Dapr creates a new one. Otherwise, it passes the trace ID along the call chain. + +### W3C trace headers +These are the specific trace context headers that are generated and propagated by Dapr for HTTP and gRPC. + +{{< tabs "HTTP" "gRPC" >}} + +{{% codetab %}} + +Copy these headers when propagating a trace context header from an HTTP response to an HTTP request: + +**Traceparent header** + +The traceparent header represents the incoming request in a tracing system in a common format, understood by all vendors: + +``` +traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01 +``` + +[Learn more about the traceparent fields](https://www.w3.org/TR/trace-context/#traceparent-header). + +**Tracestate header** + +The tracestate header includes the parent in a potentially vendor-specific format: + +``` +tracestate: congo=t61rcWkgMzE +``` + +[Learn more about the tracestate fields](https://www.w3.org/TR/trace-context/#tracestate-header). + +{{% /codetab %}} + + + +{{% codetab %}} + +In gRPC API calls, trace context is passed through the `grpc-trace-bin` header. + +{{% /codetab %}} + +{{< /tabs >}} + +## Related Links +- [Learn more about distributed tracing in Dapr]({{< ref tracing-overview.md >}}) +- [W3C Trace Context specification](https://www.w3.org/TR/trace-context/) diff --git a/daprdocs/content/en/developing-applications/building-blocks/observability/sidecar-health.md b/daprdocs/content/en/developing-applications/building-blocks/observability/sidecar-health.md index 9385473dd9f..b81efeef9b9 100644 --- a/daprdocs/content/en/developing-applications/building-blocks/observability/sidecar-health.md +++ b/daprdocs/content/en/developing-applications/building-blocks/observability/sidecar-health.md @@ -11,7 +11,7 @@ Dapr provides a way to determine its health using an [HTTP `/healthz` endpoint]( - Probed for its health - Determined for readiness and liveness -The Dapr `/healthz` endpoint can be used by health probes from the application hosting platform (for example Kubernetes). This topic describes how Dapr integrates with probes from different hosting platforms. +In this guide, you learn how the Dapr `/healthz` endpoint integrates with health probes from the application hosting platform (for example, Kubernetes). When deploying Dapr to a hosting platform like Kubernetes, the Dapr health endpoint is automatically configured for you. @@ -23,20 +23,10 @@ Dapr actors also have a health API endpoint where Dapr probes the application fo Kubernetes uses *readiness* and *liveness* probes to determines the health of the container. -The kubelet uses liveness probes to know when to restart a container. -For example, liveness probes could catch a deadlock, where an application is running but is unable to make progress. Restarting a container in such a state can help to make the application more available despite having bugs. +### Liveness +The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock (a running application that is unable to make progress).
Restarting a container in such a state can help to make the application more available despite having bugs. -The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A pod is considered ready when all of its containers are ready. One use of this readiness signal is to control which pods are used as backends for Kubernetes services. When a pod is not ready, it is removed from Kubernetes service load balancers. - -{{% alert title="Note" color="primary" %}} -The Dapr sidecar will be in ready state once the application is accessible on its configured port. The application cannot access the Dapr components during application start up/initialization. -{{% /alert %}} - -When integrating with Kubernetes, the Dapr sidecar is injected with a Kubernetes probe configuration telling it to use the Dapr healthz endpoint. This is done by the "Sidecar Injector" system service. The integration with the kubelet is shown in the diagram below. - -Diagram of Dapr services interacting - -### How to configure a liveness probe in Kubernetes +#### How to configure a liveness probe in Kubernetes In the pod configuration file, the liveness probe is added in the containers spec section as shown below: @@ -53,7 +43,14 @@ In the above example, the `periodSeconds` field specifies that the kubelet shoul Any HTTP status code between 200 and 399 indicates success; any other status code indicates failure. -### How to configure a readiness probe in Kubernetes +### Readiness +The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A pod is considered ready when all of its containers are ready. One use of this readiness signal is to control which pods are used as backends for Kubernetes services. When a pod is not ready, it is removed from Kubernetes service load balancers. + +{{% alert title="Note" color="primary" %}} +The Dapr sidecar will be in ready state once the application is accessible on its configured port. The application cannot access the Dapr components during application start up/initialization. +{{% /alert %}} + +#### How to configure a readiness probe in Kubernetes Readiness probes are configured similarly to liveness probes. The only difference is that you use the `readinessProbe` field instead of the `livenessProbe` field: @@ -66,7 +63,13 @@ Readiness probes are configured similarly to liveness probes. The only differenc periodSeconds: 3 ``` -### How the Dapr sidecar health endpoint is configured with Kubernetes +### Sidecar Injector + +When integrating with Kubernetes, the Dapr sidecar is injected with a Kubernetes probe configuration telling it to use the Dapr `healthz` endpoint. This is done by the "Sidecar Injector" system service. The integration with the kubelet is shown in the diagram below. + +Diagram of Dapr services interacting + +#### How the Dapr sidecar health endpoint is configured with Kubernetes As mentioned above, this configuration is done automatically by the Sidecar Injector service. This section describes the specific values that are set on the liveness and readiness probes. @@ -91,7 +94,7 @@ Dapr has its HTTP health endpoint `/v1.0/healthz` on port 3500. 
This can be used failureThreshold: 3 ``` -For more information refer to: +## Related links - [Endpoint health API]({{< ref health_api.md >}}) - [Actor health API]({{< ref "actors_api.md#health-check" >}}) diff --git a/daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md b/daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md deleted file mode 100644 index 38ac85d258e..00000000000 --- a/daprdocs/content/en/developing-applications/building-blocks/observability/tracing-overview.md +++ /dev/null @@ -1,118 +0,0 @@ ---- -type: docs -title: "Distributed tracing" -linkTitle: "Distributed tracing" -weight: 100 -description: "Use tracing to get visibility into your application" ---- - -Dapr uses the Open Telemetry (OTEL) and Zipkin protocols for distributed traces. OTEL is the industry standard and is the recommended trace protocol to use. - - Most observability tools support OTEL. For example [Google Cloud Operations](https://cloud.google.com/products/operations), [New Relic](https://newrelic.com), [Azure Monitor](https://azure.microsoft.com/services/monitor/), [Datadog](https://www.datadoghq.com), Instana, [Jaeger](https://www.jaegertracing.io/), and [SignalFX](https://www.signalfx.com/). - -## Scenarios -Tracing is used with service invocaton and pub/sub APIs. You can flow trace context between services that uses these APIs. - -There are two scenarios for how tracing is used: - - 1. Dapr generates the trace context and you propagate the trace context to another service. - 2. You generate the trace context and Dapr propagates the trace context to a service. - -### Propagating sequential service calls - -Dapr takes care of creating the trace headers. However, when there are more than two services, you're responsible for propagating the trace headers between them. Let's go through the scenarios with examples: - -1. Single service invocation call (`service A -> service B`) - - Dapr generates the trace headers in service A, which are then propagated from service A to service B. No further propagation is needed. - -2. Multiple sequential service invocation calls ( `service A -> service B -> service C`) - - Dapr generates the trace headers at the beginning of the request in service A, which are then propagated to service B. You are now responsible for taking the headers and propagating them to service C, since this is specific to your application. - - `service A -> service B -> propagate trace headers to -> service C` and so on to further Dapr-enabled services. - - In other words, if the app is calling to Dapr and wants to trace with an existing span (trace header), it must always propagate to Dapr (from service B to service C in this case). Dapr always propagates trace spans to an application. - -{{% alert title="Note" color="primary" %}} -There are no helper methods exposed in Dapr SDKs to propagate and retrieve trace context. You need to use HTTP/gRPC clients to propagate and retrieve trace headers through HTTP headers and gRPC metadata. -{{% /alert %}} - -3. Request is from external endpoint (for example, `from a gateway service to a Dapr-enabled service A`) - - An external gateway ingress calls Dapr, which generates the trace headers and calls service A. Service A then calls service B and further Dapr-enabled services. You must propagate the headers from service A to service B: `Ingress -> service A -> propagate trace headers -> service B`. This is similar to case 2 above. - -4. 
Pub/sub messages - Dapr generates the trace headers in the published message topic. These trace headers are propagated to any services listening on that topic. - -### Propagating multiple different service calls - -In the following scenarios, Dapr does some of the work for you and you need to either create or propagate trace headers. - -1. Multiple service calls to different services from single service - - When you are calling multiple services from a single service (see example below), you need to propagate the trace headers: - - ``` - service A -> service B - [ .. some code logic ..] - service A -> service C - [ .. some code logic ..] - service A -> service D - [ .. some code logic ..] - ``` - - In this case, when service A first calls service B, Dapr generates the trace headers in service A, which are then propagated to service B. These trace headers are returned in the response from service B as part of response headers. You then need to propagate the returned trace context to the next services, service C and service D, as Dapr does not know you want to reuse the same header. - -### Generating your own trace context headers from non-Daprized applications - -You may have chosen to generate your own trace context headers. -Generating your own trace context headers is more unusual and typically not required when calling Dapr. However, there are scenarios where you could specifically choose to add W3C trace headers into a service call; for example, you have an existing application that does not use Dapr. In this case, Dapr still propagates the trace context headers for you. If you decide to generate trace headers yourself, there are three ways this can be done: - -1. You can use the industry standard [OpenTelemetry SDKs](https://opentelemetry.io/docs/instrumentation/) to generate trace headers and pass these trace headers to a Dapr-enabled service. This is the preferred method. - -2. You can use a vendor SDK that provides a way to generate W3C trace headers and pass them to a Dapr-enabled service. - -3. You can handcraft a trace context following [W3C trace context specifications](https://www.w3.org/TR/trace-context/) and pass them to a Dapr-enabled service. - -## W3C trace context - -Dapr uses the standard W3C trace context headers. - -- For HTTP requests, Dapr uses `traceparent` header. -- For gRPC requests, Dapr uses `grpc-trace-bin` header. - -When a request arrives without a trace ID, Dapr creates a new one. Otherwise, it passes the trace ID along the call chain. - -Read [trace context overview]({{< ref w3c-tracing-overview >}}) for more background on W3C trace context. - -## W3C trace headers -These are the specific trace context headers that are generated and propagated by Dapr for HTTP and gRPC. - -### Trace context HTTP headers format -When propagating a trace context header from an HTTP response to an HTTP request, you copy these headers. - -#### Traceparent header -The traceparent header represents the incoming request in a tracing system in a common format, understood by all vendors. -Here’s an example of a traceparent header. - -`traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01` - - Find the traceparent fields detailed [here](https://www.w3.org/TR/trace-context/#traceparent-header). - -#### Tracestate header -The tracestate header includes the parent in a potentially vendor-specific format: - -`tracestate: congo=t61rcWkgMzE` - -Find the tracestate fields detailed [here](https://www.w3.org/TR/trace-context/#tracestate-header). 
- -### Trace context gRPC headers format -In the gRPC API calls, trace context is passed through `grpc-trace-bin` header. - -## Related Links - -- [Observability concepts]({{< ref observability-concept.md >}}) -- [W3C Trace Context for distributed tracing]({{< ref w3c-tracing-overview >}}) -- [W3C Trace Context specification](https://www.w3.org/TR/trace-context/) -- [Observability quickstart](https://github.com/dapr/quickstarts/tree/master/tutorials/observability) diff --git a/daprdocs/content/en/developing-applications/building-blocks/observability/w3c-tracing-overview.md b/daprdocs/content/en/developing-applications/building-blocks/observability/w3c-tracing-overview.md deleted file mode 100644 index fe168c75301..00000000000 --- a/daprdocs/content/en/developing-applications/building-blocks/observability/w3c-tracing-overview.md +++ /dev/null @@ -1,33 +0,0 @@ ---- -type: docs -title: "Trace context" -linkTitle: "Trace context" -weight: 4000 -description: Background and scenarios for using W3C tracing with Dapr -type: docs ---- - -Dapr uses the [Open Telemetry protocol](https://opentelemetry.io/), which in turn uses the [W3C trace context](https://www.w3.org/TR/trace-context/) for distributed tracing for both service invocation and pub/sub messaging. Dapr generates and propagates the trace context information, which can be sent to observability tools for visualization and querying. - -## Background -Distributed tracing is a methodology implemented by tracing tools to follow, analyze, and debug a transaction across multiple software components. Typically, a distributed trace traverses more than one service which requires it to be uniquely identifiable. Trace context propagation passes along this unique identification. - -In the past, trace context propagation has typically been implemented individually by each different tracing vendor. In multi-vendor environments, this causes interoperability problems, such as: - -- Traces that are collected by different tracing vendors cannot be correlated as there is no shared unique identifier. -- Traces that cross boundaries between different tracing vendors can not be propagated as there is no forwarded, uniformly agreed set of identification. -- Vendor-specific metadata might be dropped by intermediaries. -- Cloud platform vendors, intermediaries, and service providers cannot guarantee to support trace context propagation as there is no standard to follow. - -In the past, these problems did not have a significant impact, as most applications were monitored by a single tracing vendor and stayed within the boundaries of a single platform provider. Today, an increasing number of applications are distributed and leverage multiple middleware services and cloud platforms. - -This transformation of modern applications called for a distributed tracing context propagation standard. The [W3C trace context specification](https://www.w3.org/TR/trace-context/) defines a universally agreed-upon format for the exchange of trace context propagation data - referred to as trace context. Trace context solves the problems described above by: - -* Providing a unique identifier for individual traces and requests, allowing trace data of multiple providers to be linked together. -* Providing an agreed-upon mechanism to forward vendor-specific trace data and avoid broken traces when multiple tracing tools participate in a single transaction. -* Providing an industry standard that intermediaries, platforms, and hardware providers can support. 
- -A unified approach for propagating trace data improves visibility into the behavior of distributed applications, facilitating problem and performance analysis. - -## Related Links -- [W3C Trace Context specification](https://www.w3.org/TR/trace-context/)