Feature: Sampling delegation (#59)
* core implementation + simple test

* add tests

* wip: delegate example

* self code review

* Rework docker-compose

* code review

* fix compilation

* Apply suggestions from code review

Co-authored-by: David Goffredo <david.goffredo@datadoghq.com>

* Revising implementation

* documentation minutiae

* proxy-example  ->  http-proxy-example

* revise comments in tracingutil.hpp

* .hpp  ->  .h

* remove enum base type

* struct InjectionOptions in its own header

* document TraceSegment::{read,write}_sampling_delegation_response

* add explicit SIGTERM handler to examples/http-server/proxy

* revise sampling delegation

* make it easier to always use the two-argument overload of Span::inject

* don't interpret the delegation request header

* prevent delegation from overriding sampling

* address review comments:

- Protect `struct SamplingDelegation` with a mutex.
- Check the result of `finalize_config(config3)` in `test_tracer.cpp`.

* enable_sampling_delegation -> delegate_trace_sampling

* it IS implemented!

* add developer documentation for sampling delegation

* restore a comment in the example proxy

* mention the Span sampling delegation methods in the docs

* remove some unnecessary includes from span.h

* undo unnecessary inline

* assert what you assume

* DD_SAMPLING_DELEGATION_HEADER -> sampling_delegation_request_header

* revise the description of TracerConfig::delegate_trace_sampling

* compromise between compilers

* initialize POD member of struct ExtractedData

* remove TODO in CMakeLists.txt

* it's C++, not NodeJS

* Apply suggestions from code review

---------

Co-authored-by: David Goffredo <david.goffredo@datadoghq.com>
dmehala and dgoffredo authored Jan 12, 2024
1 parent 89a8e36 commit bcee9a4
Showing 41 changed files with 3,223 additions and 1,307 deletions.
1 change: 1 addition & 0 deletions BUILD.bazel
@@ -71,6 +71,7 @@ cc_library(
"src/datadog/hex.h",
"src/datadog/http_client.h",
"src/datadog/id_generator.h",
"src/datadog/injection_options.h",
"src/datadog/json.hpp",
"src/datadog/json_fwd.hpp",
"src/datadog/limiter.h",
11 changes: 6 additions & 5 deletions CMakeLists.txt
@@ -5,7 +5,7 @@ cmake_minimum_required(VERSION 3.24)
project(dd-trace-cpp)

option(BUILD_COVERAGE "Build code with code coverage profiling instrumentation" OFF)
option(BUILD_HASHER_EXAMPLE "Build the example program examples/hasher" OFF)
option(BUILD_EXAMPLES "Build example programs" OFF)
option(BUILD_TESTING "Build the unit tests (test/)" OFF)
option(BUILD_FUZZERS "Build fuzzers" OFF)
option(BUILD_BENCHMARK "Build benchmark binaries" OFF)
@@ -120,8 +120,8 @@ target_sources(dd_trace_cpp-objects PRIVATE
src/datadog/span_matcher.cpp
src/datadog/span_sampler_config.cpp
src/datadog/span_sampler.cpp
src/datadog/tag_propagation.cpp
src/datadog/tags.cpp
src/datadog/tag_propagation.cpp
src/datadog/threaded_event_scheduler.cpp
src/datadog/tracer_config.cpp
src/datadog/tracer_telemetry.cpp
@@ -163,6 +163,7 @@ target_sources(dd_trace_cpp-objects PUBLIC
src/datadog/hex.h
src/datadog/http_client.h
src/datadog/id_generator.h
src/datadog/injection_options.h
src/datadog/json_fwd.hpp
src/datadog/json.hpp
src/datadog/limiter.h
@@ -212,7 +213,6 @@ find_package(Threads REQUIRED)
target_link_libraries(dd_trace_cpp-objects
PUBLIC
libcurl
PUBLIC
Threads::Threads
${COVERAGE_LIBRARIES}
${COREFOUNDATION_LIBRARY}
@@ -239,8 +239,9 @@ if(BUILD_TESTING)
add_subdirectory(test)
endif()

# Each example has its own build flag.
add_subdirectory(examples)
if(BUILD_EXAMPLES)
add_subdirectory(examples)
endif()

if(BUILD_BENCHMARK)
add_subdirectory(benchmark)
189 changes: 189 additions & 0 deletions doc/sampling-delegation.md
@@ -0,0 +1,189 @@
# Sampling Delegation
This document is a technical description of how sampling delegation works in
this library. The intended audience is maintainers of the library.

Sampling delegation allows a tracer to use the trace sampling decision of a
service that it calls. Its purpose is to allow reverse proxies at the ingress
of a system (gateways) to use the trace sampling decisions made by the services
behind them, instead of having to make the decision at the proxy. The idea is
that putting a reverse proxy in front of your service(s) should not change how
you configure sampling.

See the `sampling-delegation` directory in Datadog's internal architecture
repository for the specification of sampling delegation.

## Roles
In sampling delegation, a tracer plays one or both of two roles:

- The _delegator_ is the tracer that is configured to delegate its trace
sampling decision. The delegator will request a sampling decision from one of
the services it calls.
- It will send the `X-Datadog-Delegate-Trace-Sampling` request header.
- If it is the root service, and if delegation succeeded, then it will set the
`_dd.is_sampling_decider:0` tag to indicate that some other service made the
sampling decision.
- The _delegatee_ is the tracer that has received a request whose headers
indicate that the client is delegating the sampling decision. The delegatee
will make a trace sampling decision using its own configuration, and then
convey that decision back to the client.
- It will send the `X-Datadog-Trace-Sampling-Decision` response header.
- If its sampling decision was made locally, as opposed to delegated to yet
another service, then it will set the `_dd.is_sampling_decider:1` tag to
indicate that it is the service that made the sampling decision.

For a given trace, the tracer might act as the delegator, the delegatee, both,
or neither.

## Tracer Configuration
Whether a tracer should act as a delegator is determined by its configuration.

`bool TracerConfig::delegate_trace_sampling` is defined in [tracer_config.h][1]
and defaults to `false`. Its value is overridden by the
`DD_TRACE_DELEGATE_SAMPLING` environment variable. If `delegate_trace_sampling`
is `true`, then the tracer will act as delegator.

## Runtime State
Whether a tracer should act as a delegatee is determined by whether the
extracted trace context includes the `X-Datadog-Delegate-Trace-Sampling` request
header. If trace context is extracted in the Datadog style, and if the
extracted context includes the `X-Datadog-Delegate-Trace-Sampling` header, then
the tracer will act as delegatee.

All logic relevant to sampling delegation happens in `TraceSegment`, defined in
[trace_segment.h][2]. The `Tracer` that creates the `TraceSegment` passes
two booleans into `TraceSegment`'s constructor:

- `bool sampling_delegation_enabled` indicates whether the `TraceSegment` will
act as delegator.
- `bool sampling_decision_was_delegated_to_me` indicates whether the
`TraceSegment` will act as delegatee.

`TraceSegment` then keeps track of its state related to sampling delegation in
a private data structure, `struct SamplingDelegation` (also defined in
[trace_segment.h][2]). `struct SamplingDelegation` contains the two booleans
passed into `TraceSegment`'s constructor, and additional booleans used
throughout the trace segment's lifetime.

### `bool TraceSegment::SamplingDelegation::sent_request_header`
`sent_request_header` indicates that, as delegator, the trace segment included
the `X-Datadog-Delegate-Trace-Sampling` request header as part of trace context
sent to another service.

`sent_request_header` is used to prevent sampling delegation from being
requested of two or more services. Once a trace segment has requested sampling
delegation once, it will not request sampling delegation again, even if it never
receives the delegated decision in response.
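
The "at most once" rule can be sketched as a guarded flag. This is a minimal, self-contained model and not the library's code: `DelegationStateModel` and `try_claim_request_header` are hypothetical names, and the mutex is included only because the commit notes mention protecting `struct SamplingDelegation` with one.

```cpp
#include <mutex>

// Hypothetical stand-in for the delegator-side state of a trace segment.
class DelegationStateModel {
 public:
  explicit DelegationStateModel(bool enabled) : enabled_(enabled) {}

  // Returns true if this injection should carry the delegation request
  // header. At most one call per object ever returns true, even if no
  // response is ever received.
  bool try_claim_request_header() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!enabled_ || sent_request_header_) {
      return false;
    }
    sent_request_header_ = true;
    return true;
  }

 private:
  std::mutex mutex_;
  const bool enabled_;
  bool sent_request_header_ = false;
};
```

With this model, the first injection from a delegating segment claims the header, and every later injection proceeds without it.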

### `bool TraceSegment::SamplingDelegation::received_matching_response_header`
`received_matching_response_header` indicates that, as delegator, the trace
segment received a valid `X-Datadog-Trace-Sampling-Decision` response header
from a service to which the trace segment had previously sent the
`X-Datadog-Delegate-Trace-Sampling` request header.

The `X-Datadog-Trace-Sampling-Decision` response header is valid if it is valid
JSON of the form `{"priority": int, "mechanism": int}`. See
`parse_sampling_delegation_response`, defined in [trace_segment.cpp][3].
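
To make the header's shape concrete, here is a rough sketch that formats and validates a value of the form `{"priority": int, "mechanism": int}`. It is not `parse_sampling_delegation_response`: it assumes a fixed key order, uses `std::sscanf` instead of a JSON library, and the names `DelegatedDecision`, `format_decision_header`, and `parse_decision_header` are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical holder for the two integers carried by the response header.
struct DelegatedDecision {
  int priority;
  int mechanism;
};

// Produce a header value such as {"priority": 2, "mechanism": 4}.
std::string format_decision_header(const DelegatedDecision& decision) {
  return "{\"priority\": " + std::to_string(decision.priority) +
         ", \"mechanism\": " + std::to_string(decision.mechanism) + "}";
}

// Simplified validation: accepts only the two keys, in this order, with
// optional whitespace between tokens. A real JSON parser would also accept
// other key orders. Integers too large for `int` are not handled here.
std::optional<DelegatedDecision> parse_decision_header(
    const std::string& value) {
  DelegatedDecision decision{};
  int consumed = -1;  // stays -1 unless the closing brace was matched
  const int assigned = std::sscanf(
      value.c_str(), " { \"priority\" : %d , \"mechanism\" : %d } %n",
      &decision.priority, &decision.mechanism, &consumed);
  if (assigned != 2 || consumed < 0 ||
      static_cast<std::size_t>(consumed) != value.size()) {
    return std::nullopt;  // malformed, incomplete, or trailing garbage
  }
  return decision;
}
```

A value that fails this kind of check would be treated as if no matching response header had been received.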

`received_matching_response_header` is used as part of determining whether to
set the `_dd.is_sampling_decider:1` tag as delegatee. If a trace segment is
acting as delegatee, and if it made the sampling decision, then it sets the tag
`_dd.is_sampling_decider:1` on its local root span. However, the trace segment
might also be acting as delegator. `received_matching_response_header` allows
the trace segment to determine whether it delegated its decision to another
service, and thus is not the "sampling decider."

An alternative way to determine whether a trace segment delegated its sampling
decision is to see whether its `SamplingDecision::origin` has the value
`SamplingDecision::Origin::DELEGATED` (see [sampling_decision.h][4]). However,
a trace segment's sampling decision might be overridden at any time by
`TraceSegment::override_sampling_priority(int)`. So, to answer the question
"did we delegate to another service?" it is better to keep track of whether the
trace segment received a valid and expected `X-Datadog-Trace-Sampling-Decision`
response header, which is what `received_matching_response_header` does.

### `bool TraceSegment::SamplingDelegation::sent_response_header`
`sent_response_header` indicates that, as delegatee, the trace segment sent its
trace sampling decision back to the client in the
`X-Datadog-Trace-Sampling-Decision` response header.

`sent_response_header` is used as part of determining whether to set the
`_dd.is_sampling_decider:1` tag as delegatee. The trace segment would not claim
to be the "sampling decider" if the service that delegated to it does not know
about the decision. If `sent_response_header` is true, then the trace segment
can be fairly confident that the client will receive the sampling decision.
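
One plausible way to combine these flags into the "sampling decider" check is sketched below. The struct mirrors the booleans described in this document, but `SamplingDelegationFlags` and `should_tag_as_sampling_decider` are hypothetical names, and the exact condition used in [trace_segment.cpp][3] may differ.

```cpp
// Hypothetical mirror of the booleans described above.
struct SamplingDelegationFlags {
  bool enabled = false;                       // acting as delegator
  bool decision_was_delegated_to_me = false;  // acting as delegatee
  bool sent_request_header = false;
  bool received_matching_response_header = false;
  bool sent_response_header = false;
};

// Should the local root span carry `_dd.is_sampling_decider:1`?
// Assumed rule: the segment is a delegatee, it told its client about the
// decision, and it did not itself adopt a decision from a further service.
bool should_tag_as_sampling_decider(const SamplingDelegationFlags& flags) {
  return flags.decision_was_delegated_to_me && flags.sent_response_header &&
         !flags.received_matching_response_header;
}
```

Under this rule, a service in the middle of a delegation chain that received a valid decision from its own callee does not claim to be the decider.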

### `bool Span::expecting_delegated_sampling_decision_`
In addition to the state maintained in `TraceSegment`, `Span` also has a
`bool` related to sampling delegation. See [span.h][5].

When sampling delegation is requested for an injected `Span`, that span
remembers that it injected the `X-Datadog-Delegate-Trace-Sampling` header.

Later, when the corresponding response is examined, the `Span` knows whether to
expect the `X-Datadog-Trace-Sampling-Decision` response header to be present.

`bool Span::expecting_delegated_sampling_decision_` prevents a `Span` from
interpreting an `X-Datadog-Trace-Sampling-Decision` response header when none
was requested.
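
The gating behavior can be sketched with a small model. `SpanModel`, `note_injection`, and `read_response` are hypothetical names, and clearing the flag after one response is an assumption of this sketch rather than something stated above.

```cpp
#include <optional>
#include <string>

// Hypothetical model of a span that remembers whether it asked for a
// delegated sampling decision when it was injected.
class SpanModel {
 public:
  // Called at injection time. `requested_delegation` says whether the
  // delegation request header was written for this injection.
  void note_injection(bool requested_delegation) {
    expecting_delegated_sampling_decision_ = requested_delegation;
  }

  // Returns the header value that would be handed on to the trace segment,
  // or nullopt if the span ignores the response.
  std::optional<std::string> read_response(
      const std::optional<std::string>& decision_header) {
    if (!expecting_delegated_sampling_decision_) {
      return std::nullopt;  // no decision was requested: do not interpret
    }
    // Assumption: one response per injection, so stop expecting afterwards.
    expecting_delegated_sampling_decision_ = false;
    return decision_header;
  }

 private:
  bool expecting_delegated_sampling_decision_ = false;
};
```

A span that never requested delegation therefore drops a decision header even when the response contains one.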

## Reading and Writing Responses
Distributed tracing typically does not involve RPC _responses_. When a service
X makes an HTTP/gRPC/etc. request to another service Y, X injects information
about the trace in request metadata (e.g. HTTP request headers). Y then
extracts that information from the request.

Responses aren't involved.

Now, with sampling delegation, responses _are_ involved.

Trace context injection and extraction are about _requests_ (sending and
receiving, respectively). For _responses_ the tracing library needs a new notion.

`TraceSegment` has two member functions for producing and consuming
response-related metadata (see [trace_segment.h][2]):

- `void TraceSegment::write_sampling_delegation_response(DictWriter&)` writes
the `X-Datadog-Trace-Sampling-Decision` response header, if appropriate. This
is something that a _delegatee_ does.
- `void TraceSegment::read_sampling_delegation_response(const DictReader&)`
reads the `X-Datadog-Trace-Sampling-Decision` response header, if present.
This is something that a _delegator_ does.

`TraceSegment::read_sampling_delegation_response` is not called directly by an
instrumented application.
Instead, an instrumented application calls
`Span::read_sampling_delegation_response` on the `Span` that performed the
injection whose response is being examined.
`Span::read_sampling_delegation_response` then might call
`TraceSegment::read_sampling_delegation_response`.

`TraceSegment::write_sampling_delegation_response` is called directly by an
instrumented application.

Just as `Tracer::extract_span` and `Span::inject` must be called by an
instrumented application in order for trace context propagation to work,
`Span::read_sampling_delegation_response` and
`TraceSegment::write_sampling_delegation_response` must be called by an
instrumented application in order for sampling delegation to work.
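
The request/response round trip can be modeled without the library, using plain maps as header containers. Everything here is a sketch under stated assumptions: the free functions are hypothetical stand-ins for the `Span` and `TraceSegment` member functions named above, header names are lowercased for map lookup, and the request header's value is a placeholder because only its presence is described in this document.

```cpp
#include <map>
#include <optional>
#include <string>

using Headers = std::map<std::string, std::string>;

const char* const kRequestHeader = "x-datadog-delegate-trace-sampling";
const char* const kResponseHeader = "x-datadog-trace-sampling-decision";

// Delegator, on the outgoing request (stands in for `Span::inject`).
void write_delegation_request(Headers& request) {
  request[kRequestHeader] = "delegate";  // placeholder value (assumption)
}

// Delegatee, on the outgoing response (stands in for
// `TraceSegment::write_sampling_delegation_response`).
void write_delegation_response(const Headers& request, int priority,
                               int mechanism, Headers& response) {
  if (request.count(kRequestHeader) == 0) {
    return;  // the client did not delegate: write nothing
  }
  response[kResponseHeader] = "{\"priority\": " + std::to_string(priority) +
                              ", \"mechanism\": " +
                              std::to_string(mechanism) + "}";
}

// Delegator, on the incoming response (stands in for
// `Span::read_sampling_delegation_response`).
std::optional<std::string> read_delegation_response(const Headers& response) {
  const auto found = response.find(kResponseHeader);
  if (found == response.end()) {
    return std::nullopt;
  }
  return found->second;
}
```

If any of the three steps is skipped by the instrumented application, the delegator never sees a decision and falls back to deciding for itself.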

## Per-Trace Configuration
In addition to the `Tracer`-wide configuration option `bool
TracerConfig::delegate_trace_sampling`, there is also a per-injection option
`Optional<bool> InjectionOptions::delegate_sampling_decision`.

`Span::inject` has an overload
`void inject(DictWriter&, const InjectionOptions&) const`. The
`InjectionOptions` can be used to specify sampling delegation (or its absence)
for this particular injection site. If
`InjectionOptions::delegate_sampling_decision` is null, which is the default,
then the tracer-wide configuration option is used instead.

This granularity of control is useful in NGINX, where one `location` (i.e.
upstream or backend) might be configured for sampling delegation, while another
`location` might not.
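
The fallback from the per-injection option to the tracer-wide option amounts to a `value_or`. The sketch below uses `std::optional` in place of the library's `Optional`, and `TracerConfigModel`, `InjectionOptionsModel`, and `should_delegate` are hypothetical names that only mirror the fields described above.

```cpp
#include <optional>

// Hypothetical mirrors of the two configuration levels.
struct TracerConfigModel {
  bool delegate_trace_sampling = false;  // tracer-wide default
};

struct InjectionOptionsModel {
  std::optional<bool> delegate_sampling_decision;  // null means "not set"
};

// The per-injection option wins when set; otherwise the tracer-wide
// configuration applies.
bool should_delegate(const TracerConfigModel& config,
                     const InjectionOptionsModel& options) {
  return options.delegate_sampling_decision.value_or(
      config.delegate_trace_sampling);
}
```

An explicit `false` at an injection site therefore disables delegation there even when the tracer as a whole is configured to delegate.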

[1]: ../src/datadog/tracer_config.h
[2]: ../src/datadog/trace_segment.h
[3]: ../src/datadog/trace_segment.cpp
[4]: ../src/datadog/sampling_decision.h
[5]: ../src/datadog/span.h
5 changes: 2 additions & 3 deletions examples/CMakeLists.txt
@@ -1,3 +1,2 @@
if (BUILD_HASHER_EXAMPLE)
add_subdirectory(hasher)
endif()
add_subdirectory(hasher)
add_subdirectory(http-server)
4 changes: 2 additions & 2 deletions examples/README.md
@@ -5,6 +5,6 @@ can be used to add Datadog tracing to a C++ application.

- [hasher](hasher) is a command-line tool that creates a complete trace
involving only one service.
- [http-server](http-server) is an ensemble of services, including one C++
service traced using this library. The traces generated are distributed
- [http-server](http-server) is an ensemble of services, including two C++
services traced using this library. The traces generated are distributed
across all of the services in the example.
2 changes: 2 additions & 0 deletions examples/http-server/CMakeLists.txt
@@ -0,0 +1,2 @@
add_subdirectory(proxy)
add_subdirectory(server)
15 changes: 15 additions & 0 deletions examples/http-server/Dockerfile
@@ -0,0 +1,15 @@
from ubuntu:22.04

WORKDIR /dd-trace-cpp

ARG DEBIAN_FRONTEND=noninteractive
ARG BRANCH=v0.1.12

run apt update -y \
&& apt install -y g++ make git wget sed \
&& git clone --branch "${BRANCH}" 'https://github.com/datadog/dd-trace-cpp' . \
&& bin/install-cmake \
&& mkdir dist \
&& cmake -B .build -DBUILD_EXAMPLES=1 . \
&& cmake --build .build -j \
&& cmake --install .build --prefix=dist
2 changes: 1 addition & 1 deletion examples/http-server/README.md
@@ -58,7 +58,7 @@ Click one of the results to display a flame graph of the associated trace.

![screenshot of flame graph](diagrams/flame-graph.png)

At the top is the Node.js proxy that we called using `curl`. Below that is the
At the top is the C++ proxy that we called using `curl`. Below that is the
C++ server to which the proxy forwarded our request. Below that is the
Python database service, including a span indicating its use of SQLite.
