chore(ray): add metadata and entrypoint to ray job root span #14715

imran-at-datadog · 2025-09-26T06:19:47Z

Description

MLOB-3969 Add user metadata to root span tags
MLOB-3980 Tag the entry point on the root span
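
To illustrate the first item, here is a minimal sketch of how nested job metadata could be flattened into root span tags. It is not the code from this PR: the function name `flatten_metadata_sketch` and the `ray.job.metadata` tag prefix are assumptions chosen for the example.

```python
from typing import Any, Dict


def flatten_metadata_sketch(
    metadata: Dict[str, Any], prefix: str = "ray.job.metadata"
) -> Dict[str, str]:
    """Flatten a possibly nested metadata dict into dotted tag keys.

    Hypothetical helper: the prefix and the dotted key layout are assumptions,
    not the tag names produced by the integration.
    """
    tags: Dict[str, str] = {}
    for key, value in metadata.items():
        tag_key = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Recurse so nested keys become "prefix.outer.inner".
            tags.update(flatten_metadata_sketch(value, prefix=tag_key))
        else:
            # Span tags are stored as strings in this sketch.
            tags[tag_key] = str(value)
    return tags


# Metadata shaped like the --metadata-json value used in the Testing section,
# plus one nested entry to show the recursion.
example_tags = flatten_metadata_sketch(
    {"job_name": "train_my_model", "test": "1", "owner": {"team": "ml"}}
)
```

Each resulting key/value pair could then be set on the root span as an ordinary string tag.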

Testing

pip install .
export DD_SERVICE="imran-ray-metadata-test-014"
RAY_LOGGING_CONFIG_ENCODING=JSON DD_ENV=dev ray start --head --dashboard-host=127.0.0.1 --tracing-startup-hook=ddtrace.contrib.ray:setup_tracing
ray job submit --metadata-json='{"job_name": "train_my_model", "test":"1"}' --submission-id="imran-ray-metadata-test-014" -- python /Users/imran.hendley/go/src/github.com/DataDog/dd-trace-py/tests/contrib/ray/jobs/simple_task.py arg1 --arg2=value2

Running the same commands again with DD_RAY_REDACT_ENTRYPOINT_PATHS set to false results in an unredacted path in the entrypoint.
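
As a rough illustration of entrypoint path redaction, the sketch below keeps only the last component of anything that looks like a path. It is an assumption about the general idea, not the logic in this PR: the function names, the "looks like a path" heuristic, and the default of redaction being enabled when the variable is unset are all hypothetical.

```python
import os
import posixpath
import shlex


def _basename_if_path(value: str) -> str:
    """Return the last path component if the value contains a slash."""
    if "/" not in value:
        return value
    return posixpath.basename(value.rstrip("/")) or value


def redact_entrypoint(entrypoint: str, redact: bool = True) -> str:
    """Replace path-like tokens in an entrypoint with their basenames.

    Hypothetical sketch: original shell quoting is not preserved.
    """
    if not redact:
        return entrypoint
    redacted = []
    for token in shlex.split(entrypoint):
        if token.startswith("-") and "=" in token:
            # Handle options of the form --flag=/some/path.
            flag, _, value = token.partition("=")
            redacted.append(f"{flag}={_basename_if_path(value)}")
        else:
            redacted.append(_basename_if_path(token))
    return " ".join(redacted)


def redaction_enabled() -> bool:
    """Read the flag from the environment (default of "true" is an assumption)."""
    raw = os.environ.get("DD_RAY_REDACT_ENTRYPOINT_PATHS", "true")
    return raw.strip().lower() in ("1", "true")


redacted = redact_entrypoint(
    "python /Users/me/project/jobs/simple_task.py arg1 --arg2=value2"
)
```

With redaction disabled, the same call returns the entrypoint string unchanged.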

Risks

None

Additional Notes

Do we need to support recreating these tags in RaySpanManager._recreate_job_span?

ANSWER: No, the span with tags is copied already.

This reverts commit 48731a9.

Right now [the DJM intake expects Ray spans](https://github.com/DataDog/logs-backend/blob/79793e12095e033e3998ff6318416c5db0507907/domains/apm/apps/apm-processing/src/main/java/com/dd/logs/processing/processors/track/spans/JobSpansProcessor.java#L28) to have span type `producer` or `consumer`. It used to be `ray.producer` or `ray.consumer`, but after discussing last week we agreed to remove the `ray.` prefix to more closely match the spans produced by Ray's OpenTelemetry instrumentation.

Our Ray integration [currently produces spans of three types](https://dd.datad0g.com/internal/events-ui/queries?group_by=type&index_name=djm-search&query_string=%40component%3Aray&query_type=aggregate&timerange=1755708134662-1756312934662l&track=trace): `serving`, `worker`, and `ml`. In this PR I am making it replace `serving` with `producer`, and `worker` and `ml` with `consumer` for now, just so the DJM intake recognizes that it needs to pick them up.

For testing, I [opened this file in my local dd-source](https://github.com/DataDog/dd-source/blob/d67d0dd42507de7ab369761afa1b15e4652bed20/domains/data_science/apps/ray-cluster/image/aip-practice/aip-tracing/Dockerfile#L17) and replaced `dubloom/ray-integration` with `yakov.shapiro/MLOB-3768/update-span-type`, the name of this branch. I then followed [the steps from this comment on MLOB-3676](https://datadoghq.atlassian.net/browse/MLOB-3676?focusedCommentId=2568529). I verified that the type on the resulting spans [is now set to ray](https://dd.datad0g.com/internal/events-ui/queries?group_by=job_name&index_name=djm-search&query_string=%40component%3Aray&query_type=list&timerange=1756404208851-1756418608851&track=trace).
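
The span type replacement described above (`serving` to `producer`, `worker` and `ml` to `consumer`) can be pictured as a simple lookup. This is only a sketch: the names `SPAN_TYPE_REMAP` and `remap_span_type` are hypothetical, and the integration may well set the types directly where spans are created instead of remapping them.

```python
# Mapping taken from the description above; the constant name is hypothetical.
SPAN_TYPE_REMAP = {
    "serving": "producer",
    "worker": "consumer",
    "ml": "consumer",
}


def remap_span_type(span_type: str) -> str:
    """Return the type the DJM intake expects, leaving unknown types untouched."""
    return SPAN_TYPE_REMAP.get(span_type, span_type)


remapped = [remap_span_type(t) for t in ("serving", "worker", "ml", "web")]
```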
## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

## Overview
This change captures the host name, which, in conjunction with the process ID, will provide GPU utilization information.

## Checklist
- [ ] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [ ] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
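
For the host name overview above, a minimal sketch of collecting the host name and process ID as a pair of tags might look like the following. The tag keys `hostname` and `process_id` and the function name are assumptions made for illustration, not the names used by the integration.

```python
import os
import socket
from typing import Dict


def process_identity_tags() -> Dict[str, str]:
    """Collect the host name and process ID as string tags.

    Hypothetical helper: the tag keys are placeholders. Together the two
    values identify one process on one machine, which is what allows
    correlating a span with per-process GPU utilization data.
    """
    return {
        "hostname": socket.gethostname(),
        "process_id": str(os.getpid()),
    }


identity = process_identity_tags()
```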

dubloom

Small nits, but we are almost good to go.

ddtrace/contrib/internal/ray/__init__.py

ddtrace/contrib/internal/ray/utils.py

dubloom

LGTM, thanks for addressing all my comments!

dubloom and others added 30 commits August 25, 2025 18:51

feat(ray): add task integrations and basic actors support

229d337

chore(ray): remove actor support

48731a9

chore(ray): formatting

f0225e0

Revert "chore(ray): remove actor support"

3594ca7

This reverts commit 48731a9.

feat(ray): add actor support

80e253b

feat(ray): refactoring and better support of context

f05a0b2

fix(ray): empty submission ID

285a48d

chore(ray): add ray tags and small refactoring

730bddb

tests(ray): add tests

faa18d4

Merge branch 'main' into dubloom/ray-v0

684bf70

Merge branch 'main' into dubloom/ray-v0

4977e41

fix(ray): better handling of aiohttp and grpc filtering

09a908e

feat(ray): enable ray with hook and fix missing spans

74cddba

fix: ci

78e1f6e

fix(ray): inject tracing at task submission instead of init

931598f

prepare PR

3fdc71c

Prepare PR: 2

0a9bfac

Merge branch 'main' into dubloom/ray-v0

1ca02d3

feat(ray): add submission id to root span

92838d6

fix(ray): delete tracer.py and add hostname in the new TraceProcessor

e207a54

feat(ray): add support for long running job/task/actor_method

ef6db1e

chore(ray): lint

ea4e761

chore: update riot-requirements

767ecf8

feat(ray): improve tags and name

68c3d41

Merge branch 'main' into dubloom/ray-v0

14f177f

finalize PR

8288ac6

Merge branch 'main' into dubloom/ray-v0

d28a112

fix(ray): fix tests

6b942e1

dubloom and others added 6 commits October 1, 2025 18:56

Merge branch 'main' into dubloom/ray-v0

d81a626

chore: change into forksafe lock

35f3817

redact paths and other cleanup for pr feedback

a046f03

Merge branch 'dubloom/ray-v0' into imran-hendley/ray-root-span-metadata

15e1fdf

fix merge

b48fd0e

add env config var docs

0585dd3

Base automatically changed from dubloom/ray-v0 to main October 2, 2025 07:29

dubloom requested review from a team as code owners October 2, 2025 07:29

dubloom requested review from brettlangdon, dubloom and juanjux October 2, 2025 07:29

Merge branch 'main' into imran-hendley/ray-root-span-metadata

d44468f

dubloom reviewed Oct 2, 2025

imran-at-datadog added 7 commits October 2, 2025 09:45

Merge branch 'main' into imran-hendley/ray-root-span-metadata

922d723

pr feedback cleanup

4fba5c4

merge and fix conflicts

3f30222

fix flatten_metadata_dict docstring

ac5fe3a

style

65f7274

Merge branch 'main' into imran-hendley/ray-root-span-metadata

3f902b3

Merge branch 'main' into imran-hendley/ray-root-span-metadata

f5385a5

dubloom approved these changes Oct 3, 2025

change doc

cdb67b8

dubloom enabled auto-merge (squash) October 3, 2025 09:27

fixing docs

8f712b4

dubloom merged commit d807ee8 into main Oct 3, 2025
437 checks passed

dubloom deleted the imran-hendley/ray-root-span-metadata branch October 3, 2025 10:50

5 participants
