Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for APM Remote Configuration #74

Merged
merged 9 commits into from
Jan 9, 2024
Merged

Conversation

dmehala
Copy link
Collaborator

@dmehala dmehala commented Dec 4, 2023

Description

Remote Configuration allows to remotely configure the library. For now, it is only possible to update the trace sampling rate remotely.

How to test

An API Key with Remote Configuration enabled is required.

  1. Run the hasher example.
  2. Generate some traces so the service can appear in the Service Catalogue.
  3. In Datadog's UI go to the Service Catalogue and select the dd-trace-cpp-example service.
  4. An APM Instrumentation banner should be visible.

output_trimmed_enhanced

@dmehala dmehala requested review from cgilmour and dgoffredo December 4, 2023 09:56
@dgoffredo
Copy link
Contributor

I'm playing with my own branch, making suggestions for this PR.

In looking at TracerId and the related parts of the remote config API, I noticed the following paragraph from our internal "[RFC] Integrating with Remote Config in a Tracer":

The Client.ClientState field in ClientGetConfigsRequest MUST contain the global state of RC in the tracer. There MUST NOT exist separate instances of RC clients in a tracer making separate ClientGetConfigsRequest with their own separated Client.ClientState. We use this state to track individual tracers in the agent and the backend. Having multiple components in the same tracer each call this API with separated state will duplicate a lot of data and cause issues in the attribution of configurations.

What is meant by "separate" here? My guess is that each Client.ClientState is specific to a unique tracer. What then determines the identify of a "tracer"? Is it just client.id? Is it client.id combined with client.client_tracer.*, as you have largely done in defining struct TracerId?

The notion of "tracer ID" is suspect in my mind. How does it all fit together?

@dmehala
Copy link
Collaborator Author

dmehala commented Dec 8, 2023

What is meant by "separate" here? My guess is that each Client.ClientState is specific to a unique tracer. What then determines the identify of a "tracer"? Is it just client.id? Is it client.id combined with client.client_tracer.*, as you have largely done in defining struct TracerId?

I asked RC people, here are there answer:

client.id is used to identify individual tracers
There is also assumptions being made such as "it's impossible to have two different client.id for a single client.runtimeID". Breaking those assumptions will likely break some UIs and potentially confuse systems that distribute configurations to specific tracers

From my understanding it is a combination client.id and client.client_tracer.runtime_id. If I remember they should not be the same but one client.id is associated to one client.client_tracer.runtime_id. Did I missed something?

The notion of "tracer ID" is suspect in my mind. How does it all fit together?

My main objective in incorporating the TracerID data structure was to enhance code simplicity and improve expressiveness.

@dgoffredo
Copy link
Contributor

There is also assumptions being made such as "it's impossible to have two different client.id for a single client.runtimeID".

Currently, each class Tracer creates its own RuntimeID. When envoy or nginx creates worker threads/processes, each gets its own Tracer, and so each gets its own RuntimeID.

@cgilmour and I decided that it would be better if these workers shared the same RuntimeID. They are, after all, part of the same runtime. The first bit of code that would enforce this are the proposed fixes to our envoy code that enable newer versions of dd-trace-cpp. See DatadogTracerFactory::makeConfig in that change set.

I'm concerned that now each worker will have the same "tracer ID," (same runtime, env, service) but that the workers' client.id will all be different. client.id is generated once per Tracer.

What I'm trying to get at, asking these questions, is what is a tracer? Worst case scenario, there is no clear definition.

Here's my opinion: Workers within a proxy are the "same tracer." They're part of the same runtime and share the same configuration. So, I figure each of these workers should appear identical to the remote config backend. Will that be a problem? I think it will be a problem.

Alternatively, workers can share a runtime ID, env, and service, but have different client IDs. What is meant by "client"? What is the relationship between "client" and "tracer"?

Let's iron out these definitions before committing the code. At the very least, it will allow us to write good documentation for the TracerID component.

dgoffredo and others added 2 commits December 11, 2023 10:44
- update fuzz documentation
- revise the base64 decoder
- move ConfigManager into its own .cpp
- remove namespace httputil::header
- more specific error code
- RuntimeID rc_id_  -->  std::string client_id_
- TracerId  -->  TracerID
- remove disambiguating "template" keyword
- document k_apm_capabilities
- missed a spot when removing namespace httputil::header
- no need to optimize target file lookup
- slightly safer product parsing
- Update base64 implementation to not rely on little endian.
- Move ConfigUpdate in its own heade file
@dmehala dmehala force-pushed the dmehala/remote-config-impl branch from 58166f5 to f105c8f Compare December 11, 2023 10:58
@pr-commenter
Copy link

pr-commenter bot commented Dec 11, 2023

Benchmarks

Benchmark execution time: 2024-01-09 15:58:19

Comparing candidate commit 99c1c65 in PR branch dmehala/remote-config-impl with baseline commit cc703b2 in branch main.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 0 metrics, 0 unstable metrics.

scenario:BM_TraceTinyCCSource

  • 🟩 execution_time [-12.009ms; -11.938ms] or [-14.495%; -14.409%]

Copy link
Contributor

@dgoffredo dgoffredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Please see my suggestions in #83.

Once we've discussed that, I'll mark this approved.

@dmehala dmehala requested a review from dgoffredo January 9, 2024 11:17
@dmehala
Copy link
Collaborator Author

dmehala commented Jan 9, 2024

@dgoffredo I added your changes in 8a9d59b and 3444245

Copy link
Contributor

@dgoffredo dgoffredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! This feature came out well.

Once this is merged, I'll see whether it breaks Envoy's unit tests. We can keep the two concerns decoupled.

src/datadog/tracer_signature.h Show resolved Hide resolved
@dmehala dmehala merged commit 17098c1 into main Jan 9, 2024
19 checks passed
@dmehala dmehala deleted the dmehala/remote-config-impl branch January 9, 2024 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants