Skip to content

feat: ensure tracecontext headers take precedence over datadog (AIT-10281) #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 2, 2024

Conversation

zacharycmontoya
Copy link
Contributor

Description

Adds support for "W3C Phase 3" so that when multiple trace contexts are extracted (with the same trace-id) and the tracecontext format is used, the resulting trace context will have:

  • span.parent_id set to the tracecontext parent-id value
  • span.tags["_dd.parent_id"] set to either the value of tracestate[dd][p] or hex(x-datadog-parent-id), in that order of precedence

…n")->SECTION("W3C Phase 3 support - Preferring tracecontext") with the following test inputs and outputs:

- INPUT: traceparent
- INPUT: tracestate
- INPUT: x-datadog-trace-id
- INPUT: x-datadog-parent-id
- INPUT: x-datadog-tags
- OUTPUT: expected_parent_id
- OUTPUT: expected_datadog_w3c_parent_id

This tests the system-test case test_tracestate_w3c_p_extract_datadog_w3c for W3C Phase 3.
@zacharycmontoya zacharycmontoya requested a review from a team as a code owner July 19, 2024 21:46
@zacharycmontoya zacharycmontoya requested review from Anilm3 and removed request for a team July 19, 2024 21:46
@dmehala dmehala self-requested a review July 20, 2024 10:25
Copy link
Collaborator

@dmehala dmehala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my understanding of the RFC is correct, the sole addition of phase 3 is:

MUST follow the rules below if _dd.parent_id tracestate if the p value is not set

  • Fallback 1: _dd.parent_id SHOULD be set using the parent_id extracted from datadog (x-datadog-parent-id). The value must be encoded according to W3C Phase 2 Plan
  • Fallback 2: Do not set _dd.parent_id

In that case, your change is unnecessary and only this section of the code needs to be updated.

EDIT: I forgot that the parent id must come from x-datadog-parent-id. Your change is still perfectible, when tracecontext is the first extraction style, and p is missing from the tracestate, then _dd.parent_id will not be set.

Searching for attribute X for propagation style Y seems inevitable, may I suggest a bigger change? Instead of pushing all extracted data to a std::vector<ExtractedData> here, use an std::unordered_map<PropagationStyle, ExtractedData>.

Accessing an attributes for a given propagation style will be constant and improve the code readability. What do you think? :)

@zacharycmontoya
Copy link
Contributor Author

zacharycmontoya commented Jul 24, 2024

EDIT: I forgot that the parent id must come from x-datadog-parent-id. Your change is still perfectible, when tracecontext is the first extraction style, and p is missing from the tracestate, then _dd.parent_id will not be set.

In the case where tracestate[dd][p] is not present, the expected behavior is that _dd.parent_id will not be set, so my current implementation matches that. @mabdinur can you double-check that my interpretation of the requirement is correct? This implementation passes the system-tests so it seems like this is functioning as-expected.

Searching for attribute X for propagation style Y seems inevitable, may I suggest a bigger change? Instead of pushing all extracted data to a std::vector<ExtractedData> here, use an std::unordered_map<PropagationStyle, ExtractedData>.

Accessing an attributes for a given propagation style will be constant and improve the code readability. What do you think? :)

Yes this seems like a good change, I will follow up with a change to do that 👍🏼

EDIT: updated in 1742ee9

…d the first_style to be used as the main context. This improves the lookup and logic of the merge at the cost of moving up the error cases up a level to Tracer::extract_span
@pr-commenter
Copy link

pr-commenter bot commented Jul 24, 2024

Benchmarks

Benchmark execution time: 2024-08-02 18:29:11

Comparing candidate commit 5a17ca1 in PR branch zach.montoya/w3c-phase3 with baseline commit f6d6836 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

@codecov-commenter
Copy link

codecov-commenter commented Jul 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.46%. Comparing base (f6d6836) to head (5a17ca1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
- Coverage   94.47%   94.46%   -0.02%     
==========================================
  Files          73       73              
  Lines        3800     3811      +11     
==========================================
+ Hits         3590     3600      +10     
- Misses        210      211       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dmehala
Copy link
Collaborator

dmehala commented Jul 25, 2024

In the case where tracestate[dd][p] is not present, the expected behavior is that _dd.parent_id will not be set, so my current implementation matches that. @mabdinur can you double-check that my interpretation of the requirement is correct? This implementation passes the system-tests so it seems like this is functioning as-expected.

That's partially true. There's two contradictory fallbacks. I'll let @mabdinur decide but my understanding is: set _dd.parent_id using the datadog parent header if available or do nothing.

If we agree on the same understanding, then:

  • tracestate[dd][p] will not be set while it should and that may be an indication of a missing sys-test.
  • That section of the RFC must be reworded.

@zacharycmontoya
Copy link
Contributor Author

Oh I think I understand where the confusion is, there's two different scenarios:

  1. tracecontext is the first context
  2. tracecontext is not the first context

If tracestate[dd][p] is not set, do we use the x-datadog-parent-id as _dd.parent_id in both scenarios? Before my latest changes I only did this for Scenario 2. But now my changes should do this for Scenario 1 as well.

@mabdinur
Copy link
Contributor

mabdinur commented Jul 25, 2024

Oh I think I understand where the confusion is, there's two different scenarios:

  1. tracecontext is the first context
  2. tracecontext is not the first context

If tracestate[dd][p] is not set, do we use the x-datadog-parent-id as _dd.parent_id in both scenarios? Before my latest changes I only did this for Scenario 2. But now my changes should do this for Scenario 1 as well.

I updated the design over to state:

This design adds the following logic for trace context header extraction: When tracecontext and other supported distributed tracing headers contain identical trace_ids but conflicting parent_ids tracer libraries:

  • MUST attempt to extract the last Datadog parent id from the tracestate headers (p value in dd=) regardless of the primary propagation style. The extracted value must be stored in a trace level tag with the key _dd.parent_id.
  • If the tracestate header does not contain the last Datadog parent id AND the x-datadog-parent-id header is received, the value of the x-datadog-parent-id header must be converted to a 16 character hexidecimal and this value MUST be used to set the _dd.parent_id trace level tag.
  • MUST use the span id from the transparent header to continue the distributed trace. The span id extracted from traceparent headers should take precedence over ANY headers containing span ids, regardless of the primary propagation style.

The W3C Phase 3 plan was supposed to be additive and suppliment the W3C Phase 2 doc. When writing the phase 3 plan I tried to omit details stated in Phase 2. I think this is the source of confusion. I updated the Phase 3 design overview to clearly state our end goals.

@zacharycmontoya
Copy link
Contributor Author

zacharycmontoya commented Jul 25, 2024

  • If the tracestate header does not contain the last Datadog parent id AND the x-datadog-parent-id header is received, the value of the x-datadog-parent-id header must be converted to a 16 character hexidecimal and this value MUST be used to set the _dd.parent_id trace level tag.

I am still confused on the following case:

  1. Configured propagators are tracecontext,Datadog
  2. The result is two contexts with the same trace-id but different span-ids
  3. The context from tracecontext does not have the Last Datadog Parent Id in the tracestate

In this case, the RFC text says that we should use the x-datadog-parent-id value as the Last Datadog Parent Id (meta[_dd.parent_id]). Is this the desired outcome? The reason I ask is because other implementations and the pseudocode do not do this behavior

@mabdinur
Copy link
Contributor

  • If the tracestate header does not contain the last Datadog parent id AND the x-datadog-parent-id header is received, the value of the x-datadog-parent-id header must be converted to a 16 character hexidecimal and this value MUST be used to set the _dd.parent_id trace level tag.

I am still confused on the following case:

  1. Configured propagators are tracecontext,Datadog
  2. The result is two contexts with the same trace-id but different span-ids
  3. The context from tracecontext does not have the Last Datadog Parent Id in the tracestate

In this case, the RFC text says that we should use the x-datadog-parent-id value as the Last Datadog Parent Id (meta[_dd.parent_id]). Is this the desired outcome? The reason I ask is because other implementations and the pseudocode do not do this behavior

We don't need to handle tracecontext,datadog. If the primary/first propagation style is tracecontext then we don't need to do any of the reparenting logic.

In that scenario the span_id and trace_id from the tracecontext headers are used and datadog headers are ignored. I can reaword the spec to make this clear

Copy link
Collaborator

@dmehala dmehala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Once you've replace (*data). for data->, it's good to merge. Thanks for contributing.

zacharycmontoya and others added 2 commits August 2, 2024 11:22
Fixes field accesses from pointers

Co-authored-by: Damien Mehala <damien.mehala@datadoghq.com>
@zacharycmontoya zacharycmontoya merged commit dc5e762 into main Aug 2, 2024
22 checks passed
@zacharycmontoya zacharycmontoya deleted the zach.montoya/w3c-phase3 branch August 2, 2024 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants