-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
heterogeneous trace context extraction #72
heterogeneous trace context extraction #72
Conversation
BenchmarksBenchmark execution time: 2023-11-17 20:49:17 Comparing candidate commit e69b0e8 in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 0 metrics, 0 unstable metrics. scenario:BM_TraceTinyCCSource
|
This is used by bin/check to do a "dry run" format.
33c2926
to
89a565c
Compare
I'll run these against Zach's parametric system tests tomorrow. |
89a565c
to
fa83395
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be an issue with the merge
algorithm. Otherwise, LGTM . Thank you for extraction_util
:
NOTES
The structure of the code managing extraction according to propagation style reflects too much the transition we have internally. Let me explain: I haven't read the RFCs, but it seems that we wanted to process one type of propagation at a time, then now we decide to process several at the same time to the point they can be interchangeable. IMO it would be easier and more efficient to process and validate all propagation style in one pass.
Another thing, I noticed that the code to extract traceparent uses std::regex
. Although very useful, this library is notoriously known to be very slow. Since traceparent format is easy, we can achieve the same result without it. Here is a quick benchmark.
} | ||
extracted_contexts.push_back(std::move(*data)); | ||
extracted_contexts.back().headers_examined = audited_reader.entries_found; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible extract
to add those examined headers instead? It's a bit weird to have an extract
function and the caller need to enrich the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be possible, yes. That was my initial design.
I talked myself into the current design because:
- I didn't want to add code to the extractors saying "I looked at this header" when the
DictReader
could just do that for them. - Tracking the headers examined is for error messages. The extractors are responsible for creating their own error messages, but I did not want them to contribute information to the error messages of their caller.
As I read this, it doesn't sound very convincing. Suffice it to say, I looked into my c++ crystal ball and didn't like the code that notes the headers within the extractors, and so did this two-step instead.
Maybe adding style
and headers_examined
as members to ExtractedData
was a mistake. Unnecessary, for sure. It doesn't bother me yet, though.
Maybe I just wanted an excuse to make further use of the DictReader
interface... as it stands, the implementation of visit
is yet more dead code.
You are free to change my mind about this. But again, I'll leave this as is for this PR.
@@ -33,6 +34,13 @@ struct ExtractedData { | |||
// `additional_datadog_w3c_tracestate` is null. | |||
// `additional_datadog_w3c_tracestate` is used for the `W3C` injection style. | |||
Optional<std::string> additional_datadog_w3c_tracestate; | |||
// `style` is the extraction style used to obtain this `ExtractedData`. It's | |||
// for diagnostics. | |||
Optional<PropagationStyle> style; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional<PropagationStyle> style; | |
PropagationStyle style; |
The propagation style is not inferred from the input, it can't be missing. That's why I do think it semantically makes sense to use PropagationStyle::NONE
instead. Plus, it's easier to manipulate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I had this non-Optional
. Then I noticed the case where all of the ExtractedData
had neither trace ID nor parent ID. What then is the style? You suggest NONE
, but that denotes an explicit style that can be configured; it is not a placeholder for the absence of a style.
NONE
's existence is questionable, but it is specified both in the current RFC that I am following, as well as in its earlier incarnation that I implemented previously (1, 2).
Maybe what you're noticing is the awkwardness of ExtractedData::style
and ExtractedData::headers_examined
. After all, they are not the extracted data.
Or maybe what you're noticing is the now dual purpose of ExtractedData
: It used to be the result of examining headers for one propagation style. Now it is either that, or the result of merge
ing zero or more styles. Perhaps ExtractedData
is the wrong type for this latter notion.
This also speaks to your summarizing comment that this hybrid "one at a time, but not really" design is questionable. I see your point there, but I think that the "one at a time, with special merging logic" is precisely what is needed to satisfy the RFC. For what it's worth, it's also what Ruby does. As we continue to discover heterogeneous propagation scenarios "in the wild," we will likely modify this design again.
I'm going to leave this as is for this PR, but you make a good point. Let's improve this in a future PR.
{{"traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}, | ||
{"tracestate", "dd=foo:bar,lol=wut"}}, | ||
{{"traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-000000000000002a-01"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why the parent-id
changed from 00f067aa0ba902b7
to 000000000000002a
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parent ID that was extracted is the ID of some remote span (in the calling service). Then we create a new span, with its own ID, and inject its information. So, the injected span ID is the ID that we generated for our one local span.
It's 0x2a
because of MockIDGenerator
.
Tracer tracer{*finalized_config, std::make_shared<MockIDGenerator>()}; | ||
|
||
MockDictReader reader{test_case.extracted_headers}; | ||
auto span = tracer.extract_span(reader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do know how Catch2 will handle the exception thrown by MockIDGenerator
(a more suggestive name could beForbiddenIDGenerator
?) if any, I guess it will be better to handle it with REQUIRE_NOTHROW
.
Expected<Span> span;
REQUIRE_NOTHROW(span = tracer.extract_span(reader))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption "this code won't generate a trace ID" is not what is under test here.
Yours is a good idea, but that code won't compile due to the definitions of Span
and Expected
.
Here's the best that I could come up with:
Tracer tracer{*finalized_config, std::make_shared<MockIDGenerator>()};
MockDictReader reader{test_case.extracted_headers};
MockDictWriter writer;
REQUIRE_NOTHROW([&]() {
auto span = tracer.extract_span(reader);
REQUIRE(span);
span->inject(writer);
}());
REQUIRE(writer.items == test_case.expected_injected_headers);
It bothers me that now we're asserting the non-throwness of three statements instead of the one expression tracer.extract_span(reader)
.
Another way around this is to use a level of indirection:
Optional<Expected<Span>> maybe_span;
REQUIRE_NOTHROW(maybe_span = tracer.extract_span(reader));
REQUIRE(maybe_span.has_value());
REQUIRE(*maybe_span);
Span& span = **maybe_span;
There are probably other ways to put a REQUIRE_NOTHROW
in there, but since that behavior is not what is under test, I'll leave the code as is.
src/datadog/tracer.cpp
Outdated
@@ -2,11 +2,13 @@ | |||
|
|||
#include <algorithm> | |||
#include <cassert> | |||
#include <sstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
if (result.style == PropagationStyle::W3C) { | ||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there is an implicit assumption on contexts order in the vector. From my understanding with std::vector<ExtractedData> context{w3c_extracted, datadog_extracted}
with w3c_extracted
and datatdog_extracted
respectively being ExtractedData
from extract_w3c()
and extract_datadog()
then datadog headers well not be merged. Here is a failing test case:
{__LINE__, "tracestate from subsequent style",
{PropagationStyle::W3C, PropagationStyle::DATADOG}, ///< I switched W3C & DATADOG order
{PropagationStyle::W3C},
{{"x-datadog-trace-id", "48"}, {"x-datadog-parent-id", "64"},
{"x-datadog-origin", "Kansas"}, {"x-datadog-sampling-priority", "2"},
{"traceparent", "00-00000000000000000000000000000030-0000000000000040-01"},
{"tracestate", "competitor=stuff,dd=o:Nebraska;s:1;ah:choo"}}, // origin is different
{{"traceparent", "00-00000000000000000000000000000030-000000000000002a-01"},
{"tracestate", "dd=s:2;o:Kansas;ah:choo,competitor=stuff"}}},
I does work most of the time because extract_datadog
is called before extract_w3c
. Is it the intended behavior am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that you illustrate, the traceparent
and tracestate
headers will be accepted by the W3C style, which is listed first, and then all x-datadog-*
headers will be ignored. This is indeed the intended behavior. The merge operation is not "symmetric" with respect to the relative ordering of W3C and non-W3C styles.
The sole initial goal of these changes, as I understand them, is to make sure that information from the tracestate
header is always propagated when it is known to be related to whichever trace we decide to be part of.
In your example, the actual and expected injected headers would be traceparent
unchanged, and then tracestate
mostly unchanged (perhaps the dd
items could be reordered and others could appear, and the dd=
would be first).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thoughtful review, Damien.
I haven't read the RFCs, but it seems that we wanted to process one type of propagation at a time, then now we decide to process several at the same time to the point they can be interchangeable. IMO it would be easier and more efficient to process and validate all propagation style in one pass.
The logic specified in the RFC, as I understand it, is that still we are trying styles in order, but there's a new caveat with regard to the tracestate
header, under certain conditions.
DictReader
has both lookup
and visit
because I anticipated a "look at all of the headers in one pass as they're parsed" use case, which is how nginx does things under the hood. However, Envoy exposes a map-like interface. The resulting compromise is that dd-trace-cpp uses a map-like interface, and the map is constructed upfront if necessary (e.g. in nginx).
So, there is little benefit to processing all styles in one pass, and doing so would obscure the fact that the relative priority of configured styles is still important.
Another thing, I noticed that the code to extract traceparent uses std::regex. Although very useful, this library is notoriously known to be very slow. Since traceparent format is easy, we can achieve the same result without it. Here is a quick benchmark.
On the contrary -- std::regex
is not slow, it's fast. So that solves that problem!
I considered writing the traceparent
parser without regex. tracestate
, x-datadog-tags
, and most other parsing done by the library is by-hand, without regex. I chose to use regex for traceparent
for the following reasons:
- The nine lines of commented regex pattern at the top of
extract_traceparent
describe the grammar of what is being parsed, and the parser has no choice but to comply with the grammar. - The numeric parsing done after the regex match does not need to be checked for errors, because the correctness of the hex encoding is enforced by the regular expression.
- Every time I see a hand-written parser, I curse under my breath and reverse engineer the regular expression that it is implementing. There are usually bugs.
Most other parsing done by the library is of the kind "X-separated list of zero or more Y." This does not lend itself particularly well to regex, and is straightforward enough to parse with some careful looping. traceparent
, on the other hand, has a known length prefix with a known structure, and so lends itself well to parsing via regex.
I appreciate your working benchmark and alternative non-regex implementation. The alternative implementation (and the adapted regex-based one) does not parse the hex integers, and so the notational burden of that error handling is omitted from your example. Also, your code (which is impressive) requires that I learn the simple machine model of the parser, explain to myself the numeric offsets, and simulate the parse for a variety of examples until I am confident that it does what I like.
Compare this to the regex version, which is pretty much just a single regex pattern.
If you can prove that Envoy or nginx are burdened by this use of std::regex
, then I will happily accept your alternative parser. For now, though, let's use std::regex
.
I have some immediate changes based on your review that I will push shortly.
Then I'll mull over your review a little longer to see what else I might change.
Then I'll check these changes against Datadog's internal system tests.
Then I'll merge.
@@ -33,6 +34,13 @@ struct ExtractedData { | |||
// `additional_datadog_w3c_tracestate` is null. | |||
// `additional_datadog_w3c_tracestate` is used for the `W3C` injection style. | |||
Optional<std::string> additional_datadog_w3c_tracestate; | |||
// `style` is the extraction style used to obtain this `ExtractedData`. It's | |||
// for diagnostics. | |||
Optional<PropagationStyle> style; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I had this non-Optional
. Then I noticed the case where all of the ExtractedData
had neither trace ID nor parent ID. What then is the style? You suggest NONE
, but that denotes an explicit style that can be configured; it is not a placeholder for the absence of a style.
NONE
's existence is questionable, but it is specified both in the current RFC that I am following, as well as in its earlier incarnation that I implemented previously (1, 2).
Maybe what you're noticing is the awkwardness of ExtractedData::style
and ExtractedData::headers_examined
. After all, they are not the extracted data.
Or maybe what you're noticing is the now dual purpose of ExtractedData
: It used to be the result of examining headers for one propagation style. Now it is either that, or the result of merge
ing zero or more styles. Perhaps ExtractedData
is the wrong type for this latter notion.
This also speaks to your summarizing comment that this hybrid "one at a time, but not really" design is questionable. I see your point there, but I think that the "one at a time, with special merging logic" is precisely what is needed to satisfy the RFC. For what it's worth, it's also what Ruby does. As we continue to discover heterogeneous propagation scenarios "in the wild," we will likely modify this design again.
I'm going to leave this as is for this PR, but you make a good point. Let's improve this in a future PR.
if (result.style == PropagationStyle::W3C) { | ||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that you illustrate, the traceparent
and tracestate
headers will be accepted by the W3C style, which is listed first, and then all x-datadog-*
headers will be ignored. This is indeed the intended behavior. The merge operation is not "symmetric" with respect to the relative ordering of W3C and non-W3C styles.
The sole initial goal of these changes, as I understand them, is to make sure that information from the tracestate
header is always propagated when it is known to be related to whichever trace we decide to be part of.
In your example, the actual and expected injected headers would be traceparent
unchanged, and then tracestate
mostly unchanged (perhaps the dd
items could be reordered and others could appear, and the dd=
would be first).
// `ExtractedData`. The order of the elements of `contexts` must correspond to | ||
// the order of the configured extraction propagation styles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this last sentence in the function contract.
src/datadog/tracer.cpp
Outdated
@@ -2,11 +2,13 @@ | |||
|
|||
#include <algorithm> | |||
#include <cassert> | |||
#include <sstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
} | ||
extracted_contexts.push_back(std::move(*data)); | ||
extracted_contexts.back().headers_examined = audited_reader.entries_found; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be possible, yes. That was my initial design.
I talked myself into the current design because:
- I didn't want to add code to the extractors saying "I looked at this header" when the
DictReader
could just do that for them. - Tracking the headers examined is for error messages. The extractors are responsible for creating their own error messages, but I did not want them to contribute information to the error messages of their caller.
As I read this, it doesn't sound very convincing. Suffice it to say, I looked into my c++ crystal ball and didn't like the code that notes the headers within the extractors, and so did this two-step instead.
Maybe adding style
and headers_examined
as members to ExtractedData
was a mistake. Unnecessary, for sure. It doesn't bother me yet, though.
Maybe I just wanted an excuse to make further use of the DictReader
interface... as it stands, the implementation of visit
is yet more dead code.
You are free to change my mind about this. But again, I'll leave this as is for this PR.
{{"traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}, | ||
{"tracestate", "dd=foo:bar,lol=wut"}}, | ||
{{"traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-000000000000002a-01"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parent ID that was extracted is the ID of some remote span (in the calling service). Then we create a new span, with its own ID, and inject its information. So, the injected span ID is the ID that we generated for our one local span.
It's 0x2a
because of MockIDGenerator
.
Tracer tracer{*finalized_config, std::make_shared<MockIDGenerator>()}; | ||
|
||
MockDictReader reader{test_case.extracted_headers}; | ||
auto span = tracer.extract_span(reader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption "this code won't generate a trace ID" is not what is under test here.
Yours is a good idea, but that code won't compile due to the definitions of Span
and Expected
.
Here's the best that I could come up with:
Tracer tracer{*finalized_config, std::make_shared<MockIDGenerator>()};
MockDictReader reader{test_case.extracted_headers};
MockDictWriter writer;
REQUIRE_NOTHROW([&]() {
auto span = tracer.extract_span(reader);
REQUIRE(span);
span->inject(writer);
}());
REQUIRE(writer.items == test_case.expected_injected_headers);
It bothers me that now we're asserting the non-throwness of three statements instead of the one expression tracer.extract_span(reader)
.
Another way around this is to use a level of indirection:
Optional<Expected<Span>> maybe_span;
REQUIRE_NOTHROW(maybe_span = tracer.extract_span(reader));
REQUIRE(maybe_span.has_value());
REQUIRE(*maybe_span);
Span& span = **maybe_span;
There are probably other ways to put a REQUIRE_NOTHROW
in there, but since that behavior is not what is under test, I'll leave the code as is.
When I run the parametric system tests locally, I get 97 errors of the following type:
The referenced log file ( I'm going to proceed with a release of this code. |
These changes remedy the following scenario:
A tracer is configured to inject in the W3C (tracecontext, i.e. OpenTelemetry) style, perhaps among other styles. The same tracer is configured to extract first in one or more non-W3C styles (DATADOG and/or B3), and then in the W3C style.
When an incoming request contains the desired non-W3C headers, such as
X-Datadog-Trace-Id
, then trace context is extracted from them, and any W3C headers in the request are ignored.This is problematic when the
tracestate
W3C header contains non-Datadog information. We have ways of putting Datadog information intotracestate
, but there is not a way to put arbitrarytracestate
into Datadog headers.So, this pull request adds the following logic: If we extract due to non-W3C headers, but we are configured also to extract W3C headers, then look for the
tracestate
header and keep some of its data if the trace ID from the correspondingtraceparent
header matches the trace ID extracted from the non-W3C header.See the included unit test for clarification.
Note to the reviewer: There are some unnecessary changes included in this pull request, per discussions we've had previously about where the extraction code should live. I didn't go all-in on the reorganization, but
extraction_util
is a start. See the individual commits if you want to subtract that noise.