@machichima
Contributor

@machichima machichima commented Jan 7, 2026

Description

Currently ray attach only allows opening an SSH session on the head node. It could be useful to allow attaching to worker nodes to check what state the execution environment and file system are in (e.g. running conda list, examining config files such as ~/.keras/keras.json).

Related issues

Closes #7064

Additional information

This PR adds a --node-ip argument to ray attach that specifies the IP of the node to attach to. Usage: ray attach cluster.yaml --node-ip <node ip>. If --node-ip is not provided, ray attach defaults to the head node.
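The fallback to the head node can be sketched as follows. This is a minimal illustration, not the code from this PR: it uses the standard-library argparse to stay self-contained rather than Ray's own CLI code, and build_parser, resolve_target_ip, head_ip, and worker_ips are hypothetical names introduced for the example. Checking the requested IP against the known cluster nodes is also an assumption about reasonable behavior, not something the PR description states.

```python
import argparse
import ipaddress
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the `ray attach` command line.
    parser = argparse.ArgumentParser(prog="attach-sketch")
    parser.add_argument("cluster_config_file")
    # --node-ip is optional; None means "attach to the head node".
    parser.add_argument("--node-ip", dest="node_ip", default=None)
    return parser


def resolve_target_ip(
    node_ip: Optional[str], head_ip: str, worker_ips: Sequence[str]
) -> str:
    """Return the IP to open the SSH session on."""
    if node_ip is None:
        # No --node-ip given: keep the previous behavior (head node).
        return head_ip
    # Reject malformed input early; raises ValueError if not an IP address.
    ipaddress.ip_address(node_ip)
    # Assumed check: the IP must belong to a node of this cluster.
    if node_ip != head_ip and node_ip not in worker_ips:
        raise ValueError(f"{node_ip} is not a node of this cluster")
    return node_ip
```

With this shape, existing invocations without --node-ip resolve to the head node exactly as before, so the new flag is purely additive.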

Added a unit test and tested on GCP (see #59931 (comment)).

Signed-off-by: machichima <nary12321@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a useful --ip option to ray attach, allowing users to connect to a specific node in the cluster by its IP address. The implementation is solid, and the changes are well-contained. I've provided a couple of suggestions to enhance code quality and efficiency by reusing an existing object and tidying up some redundant code. Overall, this is a great addition.

…rker-node

@machichima
Contributor Author

machichima commented Jan 11, 2026

Tested on GCP with the following simple cluster setup:

#cluster-gcp.yaml
cluster_name: my-gcp-cluster

max_workers: 1

provider:
  type: gcp
  region: YOUR_REGION  # e.g., us-central1, asia-east1
  availability_zone: YOUR_ZONE  # e.g., us-central1-a
  project_id: YOUR_GCP_PROJECT_ID

available_node_types:
  ray-head-default:
    resources: {}
    node_config:
      machineType: e2-medium
      disks:
        - boot: true
          autoDelete: true
          type: PERSISTENT
          initializeParams:
            diskSizeGb: 20
            # Using Ubuntu 22.04 LTS
            sourceImage: projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts
      # Ensure VM gets an external IP
      networkInterfaces:
        - subnetwork: regions/YOUR_REGION/subnetworks/default
          accessConfigs:
            - name: External NAT
              type: ONE_TO_ONE_NAT
      # Add SSH public key to VM metadata
      metadata:
        items:
          - key: ssh-keys
            value: "ubuntu:YOUR_SSH_PUBLIC_KEY_HERE"

  ray-worker-default:
    min_workers: 1
    max_workers: 1
    resources: {}
    node_config:
      machineType: e2-medium
      disks:
        - boot: true
          autoDelete: true
          type: PERSISTENT
          initializeParams:
            diskSizeGb: 20
            sourceImage: projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts
      networkInterfaces:
        - subnetwork: regions/YOUR_REGION/subnetworks/default
          accessConfigs:
            - name: External NAT
              type: ONE_TO_ONE_NAT
      # Add SSH public key to VM metadata
      metadata:
        items:
          - key: ssh-keys
            value: "ubuntu:YOUR_SSH_PUBLIC_KEY_HERE"

head_node_type: ray-head-default

auth:
  ssh_user: ubuntu
  ssh_private_key: ~/.ssh/google_compute_engine

file_mounts: {}
cluster_synced_files: []
file_mounts_sync_continuously: false
rsync_exclude: []
rsync_filter: []

# Setup commands - install Ray from PyPI
setup_commands:
  # Install git, pip, and venv
  - sudo apt-get update && sudo apt-get install -y git python3-pip python3-venv
  # Install ray
  - sudo pip3 install -U pip
  - sudo pip3 install -U "ray[default]"

head_start_ray_commands:
  - ray stop
  - ray start --head --port=6379 --ray-client-server-port=10001 --system-config='{"enable_ray_event":true}' --dashboard-host=0.0.0.0 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
  - ray stop
  - ray start --address=$RAY_HEAD_IP:6379

1. Run ray up cluster-gcp.yaml to start the head and worker nodes on GCP.
2. Get the worker IP with ray get-worker-ips cluster-gcp.yaml.
3. Run ray attach cluster-gcp.yaml --node-ip WORKER_IP to attach to the worker node.
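
The three steps above can also be scripted. The sketch below is only an illustration under stated assumptions: the helper names (attach_command, parse_worker_ips, main) are hypothetical, and it assumes ray get-worker-ips prints one IP address per line, skipping any line that does not parse as an IP address.

```python
import ipaddress
import subprocess
from typing import List, Optional


def attach_command(config: str, node_ip: Optional[str] = None) -> List[str]:
    """Build the `ray attach` argv; without node_ip it targets the head node."""
    cmd = ["ray", "attach", config]
    if node_ip is not None:
        cmd += ["--node-ip", node_ip]
    return cmd


def parse_worker_ips(output: str) -> List[str]:
    """Keep only lines that are valid IP addresses (assumed output format)."""
    ips = []
    for line in output.splitlines():
        candidate = line.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip blank lines and any log output
        ips.append(candidate)
    return ips


def main(config: str = "cluster-gcp.yaml") -> None:
    # Step 1: start the head and worker nodes.
    subprocess.run(["ray", "up", config], check=True)
    # Step 2: look up the worker IPs.
    result = subprocess.run(
        ["ray", "get-worker-ips", config],
        check=True,
        capture_output=True,
        text=True,
    )
    workers = parse_worker_ips(result.stdout)
    if not workers:
        raise SystemExit("no worker IPs found")
    # Step 3: attach to the first worker instead of the head node.
    subprocess.run(attach_command(config, workers[0]), check=True)
```

Calling main() runs all three steps against a real cluster; attach_command and parse_worker_ips have no side effects and can be exercised on their own.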

@machichima machichima marked this pull request as ready for review January 11, 2026 05:28
@machichima machichima requested a review from a team as a code owner January 11, 2026 05:29
@machichima machichima force-pushed the attach-to-worker-node branch from b90cd98 to a5bf5a5 Compare January 11, 2026 07:04
@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Jan 11, 2026
@machichima
Contributor Author

@edoakes PTAL. Thank you!

@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Jan 13, 2026
Collaborator

@edoakes edoakes left a comment

Roughly looks good to me. nit: let's call the arg --node-ip to make it a little more clear what it is specifying

…rker-node

@machichima
Contributor Author

machichima commented Jan 14, 2026

> Roughly looks good to me. nit: let's call the arg --node-ip to make it a little more clear what it is specifying

Thank you! Fixed in 475940f. I also renamed the variables to node_ip.

@machichima machichima force-pushed the attach-to-worker-node branch from 3f77818 to 475940f Compare January 14, 2026 15:24
@edoakes
Collaborator

edoakes commented Jan 15, 2026

test_cli is failing: https://buildkite.com/ray-project/premerge/builds/57711#019bc20c-cf6b-4639-9cad-58b70f24ba7b/L3934

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@machichima machichima requested a review from edoakes January 20, 2026 02:02
Collaborator

@edoakes edoakes left a comment

LGTM

@edoakes edoakes merged commit 6efc009 into ray-project:master Jan 20, 2026
6 checks passed
Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests

Development

Successfully merging this pull request may close these issues.

[autoscaler] Attach to a worker node

2 participants