Reduce test flakiness for telemetry request spec #3962

p-datadog · 2024-09-30T17:26:28Z

What does this PR do?

This PR inserts a brief wait (up to 0.25 seconds) for telemetry request spec to avoid the code under test and the test code happening in different whole seconds, which would then fail a test assertion.

Motivation:

Repair the following observed CI failure:


  1) Datadog::Core::Telemetry::Request.build_payload is expected to eq {:api_version=>"v2", :application=>{:env=>"env", :language_name=>"ruby", :language_version=>"2.5.9", ...e_id=>"40935bf2-7e96-4797-af75-b687b45ad719", :seq_id=>#<Double "seq_id">, :tracer_time=>1727716350}
     Failure/Error:
       is_expected.to eq(
         api_version: api_version,
         application: application,
         debug: debug,
         host: host,
         payload: payload,
         request_type: request_type,
         runtime_id: runtime_id,
         seq_id: seq_id,
         tracer_time: tracer_time,

       expected: {:api_version=>"v2", :application=>{:env=>"env", :language_name=>"ruby", :language_version=>"2.5.9", ...e_id=>"40935bf2-7e96-4797-af75-b687b45ad719", :seq_id=>#<Double "seq_id">, :tracer_time=>1727716350}
            got: {:api_version=>"v2", :application=>{:env=>"env", :language_name=>"ruby", :language_version=>"2.5.9", ...e_id=>"40935bf2-7e96-4797-af75-b687b45ad719", :seq_id=>#<Double "seq_id">, :tracer_time=>1727716349}

       (compared using ==)

       Diff:
       @@ -6,5 +6,5 @@
        :request_type => #<Double "request_type">,
        :runtime_id => "40935bf2-7e96-4797-af75-b687b45ad719",
        :seq_id => #<Double "seq_id">,
       -:tracer_time => 1727716350,
       +:tracer_time => 1727716349,
     # ./spec/datadog/core/telemetry/request_spec.rb:65:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:231:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:116:in `block (2 levels) in <top (required)>'

Additional Notes:

How to test the change?

On my machine the failure can be reliably reproduced by changing the sleep into

    sleep 0 until Time.now.usec > 995700

The required time is the approximate time of the second test's execution subtracted from 1,000,000.

Unsure? Have a question? Request a review!

codecov-commenter · 2024-09-30T17:39:42Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (74363cc) to head (8427d6f).
Report is 2 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #3962   +/-   ##
=======================================
  Coverage   97.85%   97.86%           
=======================================
  Files        1305     1305           
  Lines       78224    78222    -2     
  Branches     3887     3887           
=======================================
+ Hits        76549    76553    +4     
+ Misses       1675     1669    -6

Flag	Coverage Δ
	`97.86% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pr-commenter · 2024-09-30T18:02:21Z

Benchmarks

Benchmark execution time: 2024-09-30 18:02:10

Comparing candidate commit 8427d6f in PR branch reduce-test-flakiness with baseline commit 69ac3b8 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

ivoanjo · 2024-10-01T07:47:08Z

spec/datadog/core/telemetry/request_spec.rb

+  before do
+    # The tests here assert on time equality after rounding to a second;
+    # these assertions will fail if the code executes just before
+    # a second boundary and the assertions run just after the second boundary.
+    # Align the runs closer to the beginning of a second to reduce flakiness.
+    sleep 0 until Time.now.usec < 750000
+  end


In the profiler tests I have a few similar situations where I don't know exactly the expected timestamp.

My suggestion here, rather than the sleep/loop (which doesn't exactly guarantee the timings we need -- just makes it more likely to happen) would be to do something like:

let!(:before_time) { Time.now.to_i } # ... it do is_expected.to eq( api_version: api_version, application: application, debug: debug, host: host, payload: payload, request_type: request_type, runtime_id: runtime_id, seq_id: seq_id, tracer_time: be_between(before_time, Time.now.to_i), ) end

This way we validate the timestamp is correct without the loop or any flakiness.

I implemented this in #3966.

Use range match for tracer time instead of rounding and requiring exact match to rounded time, which does not always work (see #3962).

* Remove flakiness from telemetry request spec Use range match for tracer time instead of rounding and requiring exact match to rounded time, which does not always work (see #3962). * use match instead of eq --------- Co-authored-by: Oleg Pudeyev <code@olegp.name>

Reduce test flakiness for telemetry request spec

8427d6f

p-datadog requested a review from a team as a code owner September 30, 2024 17:26

ivoanjo reviewed Oct 1, 2024

View reviewed changes

p-datadog pushed a commit that referenced this pull request Oct 1, 2024

Remove flakiness from telemetry request spec

4951037

Use range match for tracer time instead of rounding and requiring exact match to rounded time, which does not always work (see #3962).

p-datadog mentioned this pull request Oct 1, 2024

Remove flakiness from telemetry request spec #3966

Merged

p-datadog closed this Oct 1, 2024

p-datadog deleted the reduce-test-flakiness branch October 1, 2024 15:38

p-datadog pushed a commit that referenced this pull request Oct 1, 2024

Remove flakiness from telemetry request spec

d8e8be0

Use range match for tracer time instead of rounding and requiring exact match to rounded time, which does not always work (see #3962).

p-datadog pushed a commit that referenced this pull request Oct 1, 2024

Remove flakiness from telemetry request spec

9e5a629

Use range match for tracer time instead of rounding and requiring exact match to rounded time, which does not always work (see #3962).

p-datadog pushed a commit that referenced this pull request Oct 1, 2024

Remove flakiness from telemetry request spec

8234ac0

Use range match for tracer time instead of rounding and requiring exact match to rounded time, which does not always work (see #3962).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce test flakiness for telemetry request spec #3962

Reduce test flakiness for telemetry request spec #3962

p-datadog commented Sep 30, 2024 •

edited

Loading

codecov-commenter commented Sep 30, 2024

pr-commenter bot commented Sep 30, 2024

ivoanjo Oct 1, 2024

p-datadog Oct 1, 2024

Reduce test flakiness for telemetry request spec #3962

Reduce test flakiness for telemetry request spec #3962

Conversation

p-datadog commented Sep 30, 2024 • edited Loading

codecov-commenter commented Sep 30, 2024

Codecov Report

pr-commenter bot commented Sep 30, 2024

Benchmarks

ivoanjo Oct 1, 2024

Choose a reason for hiding this comment

p-datadog Oct 1, 2024

Choose a reason for hiding this comment

p-datadog commented Sep 30, 2024 •

edited

Loading