DEBUG-2334 dynamic instrumentation probe notification builder #4011

p-datadog · 2024-10-17T16:31:31Z

Change log entry

None, DI is not customer-visible at this time.

What does this PR do?

This PR adds a component to create status and snapshot payloads for probes.

Motivation:

Initial DI implementation in Ruby.

Additional Notes:

Most of the test coverage is in integration tests for DI overall. This PR has minimal test coverage.

Correctness of generated payloads is meant to ultimately be verified by system tests.

How to test the change?

Unsure? Have a question? Request a review!

lib/datadog/di/probe_notification_builder.rb

codecov-commenter · 2024-10-17T17:08:59Z

Codecov Report

Attention: Patch coverage is 96.13260% with 7 lines in your changes missing coverage. Please review.

Project coverage is 97.84%. Comparing base (321f513) to head (0080237).
Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
lib/datadog/di/probe_notification_builder.rb	92.10%	6 Missing ⚠️
.../di/integration/probe_notification_builder_spec.rb	98.07%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4011      +/-   ##
==========================================
- Coverage   97.86%   97.84%   -0.02%     
==========================================
  Files        1321     1324       +3     
  Lines       79330    79511     +181     
  Branches     3934     3956      +22     
==========================================
+ Hits        77635    77797     +162     
- Misses       1695     1714      +19

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pr-commenter · 2024-10-17T17:23:53Z

Benchmarks

Benchmark execution time: 2024-10-25 18:02:49

Comparing candidate commit 0080237 in PR branch di-probe-notification-builder with baseline commit 321f513 in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

scenario:profiler - sample timeline=false

🟩 throughput [+0.525op/s; +0.537op/s] or [+8.413%; +8.599%]

spec/datadog/di/integration/probe_notification_builder_spec.rb

This component creates status and snapshot payloads for probes.

ivoanjo · 2024-10-18T03:25:22Z

lib/datadog/di/probe_notification_builder.rb

+      def thread_id
+        thread = Thread.current
+        if thread.respond_to?(:native_thread_id)
+          # Ruby 3.1+
+          thread.native_thread_id


Be careful with using native_thread_id here -- in Ruby 3.3+ ruby threads can move across OS threads (see M:N announcement). This behavior change is off by default in the main ractor but that may change...

I don't use it myself for anything and I made a note to inquire with my team what it is supposed to be.

I removed thread_id sending because currently DI does not need it.

ivoanjo · 2024-10-18T03:27:02Z

lib/datadog/di/probe_notification_builder.rb

+          "dd.trace_id": 136035165280417366521542318182735500431,
+          "dd.span_id": 17576285113343575026,


These look like placeholders -- should there be a FIXME or something here?

Good catch, I forgot about these completely - they probably need to be filled out somehow.

But, I can do that in a separate PR.

ivoanjo · 2024-10-22T14:51:20Z

lib/datadog/di/probe_notification_builder.rb

+        callers.map do |caller|
+          if caller =~ /\A([^:]+):(\d+):in `([^']+)'\z/
+            {
+              fileName: $1, function: $3, lineNumber: Integer($2),
+            }
+          else


It's possible to avoid having to parse stack trace strings -- Ruby has two sets of APIs for stacktraces (at the Ruby level...): You can use backtrace and backtrace_locations (or caller/caller_locations). The _locations version of it gives you an object that you can query for the individual parts, no need for parsing strings to split it back out ;)

Done, thank you for the pointer.

ivoanjo

Did a quick pass and left a few notes :)

ivoanjo · 2024-10-24T09:05:09Z

lib/datadog/di/probe_notification_builder.rb

+            # Normally caller_locations should always be filled for a line probe
+            # but in the test suite we don't always provide all arguments.
+            actual_file_basename = File.basename(probe.file)
+            caller_locations&.detect do |loc|
+              # TODO record actual path that probe was installed into,
+              # perform exact match here against that path.
+              File.basename(loc.path) == actual_file_basename
+            end&.path || probe.file


This seems like a slightly expensive thing to do for just the tests 🤔. Or can there be situations in user code where this gets triggered?

We need the path to the file in the payload, but the path should be determined during instrumentation as per the comment just above. I've hacked path matching and processing by doing basename comparisons but this isn't sufficient, for v1 I will have the actual path stored in the probe after instrumentation and nothing should be calculated here.

ivoanjo · 2024-10-24T09:07:48Z

lib/datadog/di/probe_notification_builder.rb

+          duration: duration ? (duration * 10**9).to_i : nil,
+          host: nil,
+          logger: {
+            name: probe.file,
+            method: probe.method_name || 'no_method',
+            thread_name: Thread.current.name,
+            thread_id: thread_id,
+            version: 2,
+          },
+          # TODO add tests that the trace/span id is correctly propagated
+          "dd.trace_id": Datadog::Tracing.active_trace&.id,
+          "dd.span_id": Datadog::Tracing.active_span&.id,
+          ddsource: 'dd_debugger',
+          message: probe.template && evaluate_template(probe.template,
+            duration: duration ? duration * 1000 : nil),


Minor: It may be worth encoding the units of duration somewhere in the variable names (duration_ms?) or using an explicit type, since units are getting mixed and converted a bit and it seems easy to make a mistake...

The convention I use is when units are not specified, time is measured in seconds. In this particular case, DI specifies that duration in the template should be measured in milliseconds, but the variable is called duration. Thus I am not sure how to add a suffix here.

Even if the key itself in the data is duration, maybe our variable can be duration_s or duration_secs?

I added a comment stating the units of duration for the method parameter. I personally have the convention that time units without an explicitly stated unit are in seconds, which I believe is how everything in Ruby standard library works, and to me adding a suffix for seconds is redundant. But if you want a suffix let's go with whichever one you prefer.

I totally agree with @ivoanjo regarding naming with postfix values with different scale. Comments are nice, but code is better. Proper name will prevent from storing a wrong value scale in it, comment - not.

As mentioned in #4028 I would like to leave the time units alone until there is team consensus on how they should be indicated and all of the other pending DI code is merged.

ivoanjo · 2024-10-24T09:10:39Z

lib/datadog/di/probe_notification_builder.rb

+        message = template.dup
+        vars.each do |key, value|
+          message.gsub!("{@#{key}}", value.to_s)
+        end


Minor: Not sure if worth it, but occurred to me that we could turn the template into an input for sprintf or String#% and thus we'd be able to template all vars at one.

For the message, there is actually an AST that is supposed to be evaluated which contains complex expressions. This evaluation will not be implemented for initial DI implementation. I have the above substitution as a "quick demo" of variable evaluation, but this entire implementation is temporary.

I made a note to revisit this part of the code but I think there won't be a large gain realized here with % because it'll still require some parsing or massaging and a straightforward string replacement should be very fast (and users should only have a couple of variables at most in their log message, normally).

ivoanjo · 2024-10-24T09:12:02Z

sig/datadog/di/probe_notification_builder.rbs

+
+      def build_executed: (Probe probe, ?trace_point: untyped?, ?rv: untyped?, ?duration: untyped?, ?callers: untyped?, ?args: untyped?, ?kwargs: untyped?, ?serialized_entry_args: untyped?) -> untyped
+
+      def build_snapshot: (Probe probe, ?rv: untyped?, ?snapshot: untyped?, ?duration: untyped?, ?callers: untyped?, ?args: untyped?, ?kwargs: untyped?, ?serialized_entry_args: untyped?) -> { service: untyped, :"debugger.snapshot" => { id: untyped, timestamp: untyped, evaluationErrors: ::Array[untyped], probe: { id: untyped, version: 0, location: untyped }, language: "ruby", stack: untyped, captures: untyped }, duration: untyped, host: nil, logger: { name: untyped, method: untyped, thread_name: untyped, thread_id: untyped, version: 2 }, :"dd.trace_id" => 136035165280417366521542318182735500431, :"dd.span_id" => 17576285113343575026, ddsource: "dd_debugger", message: untyped, timestamp: untyped }


Minor: This return looks quite hard to maintain as-is. Perhaps worth either generalizing (into just Hash) or introducing a class for this?

* master: DEBUG-2334 Probe Notifier Worker component (DataDog#4028) DEBUG-2334 dynamic instrumentation probe notification builder (DataDog#4011) Handle low-level libddwaf exception in Context [NO-TICKET] Minor: Fix typos in safe_dup_spec.rb Remove libdatadog musl Remove ffi's after installation Remove cached gems

p-datadog requested a review from a team as a code owner October 17, 2024 16:31

datadog-datadog-prod-us1 bot reviewed Oct 17, 2024

View reviewed changes

lib/datadog/di/probe_notification_builder.rb Outdated Show resolved Hide resolved

github-advanced-security bot found potential problems Oct 17, 2024

View reviewed changes

lib/datadog/di/probe_notification_builder.rb Fixed Show fixed Hide fixed

p-datadog force-pushed the di-probe-notification-builder branch from 3d5055d to 6f847de Compare October 17, 2024 16:47

p-datadog force-pushed the di-probe-notification-builder branch from 6f847de to 1173c5f Compare October 17, 2024 17:13

p-datadog force-pushed the di-probe-notification-builder branch from 1173c5f to 659bec1 Compare October 17, 2024 17:44

datadog-datadog-prod-us1 bot reviewed Oct 17, 2024

View reviewed changes

spec/datadog/di/integration/probe_notification_builder_spec.rb Show resolved Hide resolved

p-datadog force-pushed the di-probe-notification-builder branch 2 times, most recently from 7e9ced8 to 382ad93 Compare October 17, 2024 18:06

DEBUG-2334 dynamic instrumentation probe notification builder

b33be73

This component creates status and snapshot payloads for probes.

p-datadog force-pushed the di-probe-notification-builder branch from 382ad93 to b33be73 Compare October 17, 2024 18:08

annotate test with di_test

dd2d8f1

ivoanjo reviewed Oct 18, 2024

View reviewed changes

p added 2 commits October 18, 2024 11:30

take trace_id & span_id from active span

3090e16

fix types

5828dc6

p-datadog requested a review from a team as a code owner October 18, 2024 16:14

p-datadog added the dev/internal Other internal work that does not need to be included in the changelog label Oct 18, 2024

ivoanjo reviewed Oct 22, 2024

View reviewed changes

use caller_locations instead of caller, stop parsing stack traces

67bbb7e

ivoanjo reviewed Oct 24, 2024

View reviewed changes

p and others added 6 commits October 24, 2024 17:51

stop sending thread id

8df14ff

note duration is in seconds

a4342c5

types

9637f2b

Merge branch 'master' into di-probe-notification-builder

3cff768

add missing spec helper require

872d09d

Merge branch 'master' into di-probe-notification-builder

0080237

Strech approved these changes Oct 28, 2024

View reviewed changes

p-datadog merged commit 5ed1f0f into master Oct 29, 2024
270 checks passed

p-datadog deleted the di-probe-notification-builder branch October 29, 2024 12:52

github-actions bot added this to the 2.5.0 milestone Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEBUG-2334 dynamic instrumentation probe notification builder #4011

DEBUG-2334 dynamic instrumentation probe notification builder #4011

p-datadog commented Oct 17, 2024 •

edited

Loading

codecov-commenter commented Oct 17, 2024 •

edited

Loading

pr-commenter bot commented Oct 17, 2024 •

edited

Loading

ivoanjo Oct 18, 2024

p-datadog Oct 18, 2024

p-datadog Oct 24, 2024

ivoanjo Oct 18, 2024

p-datadog Oct 18, 2024

p-datadog Oct 18, 2024

p-datadog Oct 24, 2024

ivoanjo Oct 22, 2024

p-datadog Oct 24, 2024

ivoanjo left a comment

ivoanjo Oct 24, 2024

p-datadog Oct 24, 2024

ivoanjo Oct 24, 2024

p-datadog Oct 24, 2024

ivoanjo Oct 24, 2024

p-datadog Oct 24, 2024

Strech Oct 25, 2024

p-datadog Oct 25, 2024

ivoanjo Oct 24, 2024

p-datadog Oct 24, 2024

ivoanjo Oct 24, 2024

p-datadog Oct 24, 2024

		"dd.trace_id": 136035165280417366521542318182735500431,
		"dd.span_id": 17576285113343575026,


		def build_executed: (Probe probe, ?trace_point: untyped?, ?rv: untyped?, ?duration: untyped?, ?callers: untyped?, ?args: untyped?, ?kwargs: untyped?, ?serialized_entry_args: untyped?) -> untyped

		def build_snapshot: (Probe probe, ?rv: untyped?, ?snapshot: untyped?, ?duration: untyped?, ?callers: untyped?, ?args: untyped?, ?kwargs: untyped?, ?serialized_entry_args: untyped?) -> { service: untyped, :"debugger.snapshot" => { id: untyped, timestamp: untyped, evaluationErrors: ::Array[untyped], probe: { id: untyped, version: 0, location: untyped }, language: "ruby", stack: untyped, captures: untyped }, duration: untyped, host: nil, logger: { name: untyped, method: untyped, thread_name: untyped, thread_id: untyped, version: 2 }, :"dd.trace_id" => 136035165280417366521542318182735500431, :"dd.span_id" => 17576285113343575026, ddsource: "dd_debugger", message: untyped, timestamp: untyped }

DEBUG-2334 dynamic instrumentation probe notification builder #4011

DEBUG-2334 dynamic instrumentation probe notification builder #4011

Conversation

p-datadog commented Oct 17, 2024 • edited Loading

codecov-commenter commented Oct 17, 2024 • edited Loading

Codecov Report

pr-commenter bot commented Oct 17, 2024 • edited Loading

Benchmarks

scenario:profiler - sample timeline=false

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivoanjo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

p-datadog commented Oct 17, 2024 •

edited

Loading

codecov-commenter commented Oct 17, 2024 •

edited

Loading

pr-commenter bot commented Oct 17, 2024 •

edited

Loading