Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-7440] Add approximate thread state categorization for timeline #3162

Merged
merged 7 commits into from
Oct 3, 2023

Commits on Sep 27, 2023

  1. Rename cpu_samples field to cpu_or_wall_samples

    This field is set when there is a cpu+wall sample (e.g. we take both
    together), so I decided to rename it internally to make that clearer.
    
    I decided not to change what we emit to pprof.
    
    (This value is not currently shown in the profiling UX)
    ivoanjo committed Sep 27, 2023
    Configuration menu
    Copy the full SHA
    9356692 View commit details
    Browse the repository at this point in the history
  2. Introduce sample_labels struct to carry labels passed to libdatadog

    This is a refactor in preparation for introducing the new `state` label.
    
    The `state` label is different from other labels in that it will be set
    after looking at the thread stack. Thus, I needed a way to get more
    information about labels from the `ThreadContext` collector to the
    `Stack` collector, and thus the `sample_labels` struct was born.
    ivoanjo committed Sep 27, 2023
    Configuration menu
    Copy the full SHA
    4f2ebc7 View commit details
    Browse the repository at this point in the history
  3. [PROF-7440] Add approximate thread state categorization for timeline

    **What does this PR do?**
    
    This PR adds a new feature to the Ruby profiler: thread state
    categorization.
    
    When taking a cpu/wall-time sample, the profiler will peek at the
    top of the stack, and then add a `state` label to the sample that
    categorizes what the thread was doing.
    
    **Motivation:**
    
    By categorizing what threads are doing when they're not on the cpu,
    we can synthesize nice events for timeline, e.g. paint a big
    rectangle when a thread was blocked on some mutex, or was waiting for
    the network.
    
    **Additional Notes:**
    
    Because we're matching only on "name of method at the top of the stack"
    + "is this method a native method" this approach is approximate, and
    we hope to replace it with a more accurate mechanism soon.
    
    For now, we will only expose this information as part of the timeline
    feature.
    
    A few tests needed to be refactored, because they assumed aggregation
    on the samples that they took, and the new label makes samples not
    aggregate some times.
    
    I considered if the tests should be changed, or the feature, but
    concluded that the tests were, in a sense, wrong for assuming
    aggregation, because in the future, we'll have timeline on by default
    and in such a future aggregation never happens.
    
    **How to test the change?**
    
    This change includes code coverage. You can also test it by enabling
    the timeline feature, and observing that the new states show up
    on Ruby timeline profiles (you'll need a frontend feature flag to
    see timeline profiles).
    ivoanjo committed Sep 27, 2023
    Configuration menu
    Copy the full SHA
    393b67d View commit details
    Browse the repository at this point in the history

Commits on Sep 28, 2023

  1. Avoid using Array#sum, it didn't exist on Ruby 2.3

    Sigh old Rubies :(
    ivoanjo committed Sep 28, 2023
    Configuration menu
    Copy the full SHA
    86e8f5d View commit details
    Browse the repository at this point in the history
  2. Use tcp socket instead of pipe for select unit test

    At least on Ruby 2.3 the `select` returned immediately, so let's use
    a tcp socket to get the same effect.
    ivoanjo committed Sep 28, 2023
    Configuration menu
    Copy the full SHA
    c728fa4 View commit details
    Browse the repository at this point in the history
  3. Workaround buggy warning in tracer-2.7 docker image

    Compilation was failing with:
    
    ```
    ../../../../ext/ddtrace_profiling_native_extension/collectors_thread_context.c: In function 'trigger_sample_for_thread':
    ../../../../ext/ddtrace_profiling_native_extension/collectors_thread_context.c:759:5: error: missing initializer for field 'num' of 'ddog_prof_Label' [-Werror=missing-field-initializers]
      759 |     };
          |     ^
    In file included from /usr/local/bundle/gems/libdatadog-4.0.0.1.0-x86_64-linux/vendor/libdatadog-4.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/pkgconfig/../../include/datadog/profiling.h:12,
                     from ../../../../ext/ddtrace_profiling_native_extension/collectors_stack.h:3,
                     from ../../../../ext/ddtrace_profiling_native_extension/collectors_thread_context.c:5:
    /usr/local/bundle/gems/libdatadog-4.0.0.1.0-x86_64-linux/vendor/libdatadog-4.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/pkgconfig/../../include/datadog/common.h:369:11: note: 'num' declared here
      369 |   int64_t num;
          |           ^~~
    cc1: all warnings being treated as errors
    ```
    
    which doesn't make a lot of sense since the same pattern works for
    other label initializations and doesn't generate a warning/error.
    
    But oh well, it's easier to just make gcc happy.
    ivoanjo committed Sep 28, 2023
    Configuration menu
    Copy the full SHA
    21aa188 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    f508e71 View commit details
    Browse the repository at this point in the history