[feat][kubectl-plugin] add scale command #2926

Open

wants to merge 1 commit into master

Conversation

davidxia
Contributor

@davidxia davidxia commented Feb 5, 2025

Adds a command to scale a RayCluster's worker group.

closes #110

Example Usage

$ kubectl ray scale cluster -h
Scale a Ray cluster's worker group.

Usage:
  ray scale cluster (WORKERGROUP) (-c/--ray-cluster RAYCLUSTER) (-r/--replicas N) [flags]

Examples:
  # Scale a Ray cluster's worker group to 3 replicas
  kubectl ray scale cluster my-workergroup --ray-cluster my-raycluster --replicas 3

$ kubectl ray scale default-group --ray-cluster NONEXISTENT --replicas 0
Error: failed to scale worker group default-group in Ray cluster NONEXISTENT in namespace default: rayclusters.ray.io "NONEXISTENT" not found

$ kubectl ray scale DEADBEEF --ray-cluster dxia-test --replicas 1
Error: worker group DEADBEEF not found in Ray cluster dxia-test in namespace default. Available worker groups: default-group, another-group, yet-another-group

$ kubectl ray scale default-group --ray-cluster dxia-test --replicas 3
Scaled worker group default-group in Ray cluster dxia-test in namespace default from 0 to 3 replicas

$ kubectl ray scale default-group --ray-cluster dxia-test --replicas 1
Scaled worker group default-group in Ray cluster dxia-test in namespace default from 3 to 1 replicas

$ kubectl ray scale default-group --ray-cluster dxia-test --replicas -1
Error: must specify -r/--replicas with a non-negative integer
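
For readers interested in what is behind these messages, here is a minimal sketch of the core scaling flow. This is an illustration only, not the PR's implementation; the import paths, package name, and helper name are assumptions based on the KubeRay operator module.

```go
// Sketch only: scale one worker group of a RayCluster to the desired replica count.
// Package, helper name, and import paths are assumptions, not the PR's actual code.
package scale

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rayclient "github.com/ray-project/kuberay/ray-operator/pkg/client/clientset/versioned"
)

// scaleWorkerGroup sets the replica count of one worker group in a RayCluster.
func scaleWorkerGroup(ctx context.Context, client rayclient.Interface, namespace, cluster, group string, replicas int32) error {
	if replicas < 0 {
		return fmt.Errorf("must specify -r/--replicas with a non-negative integer")
	}

	rc, err := client.RayV1().RayClusters(namespace).Get(ctx, cluster, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("failed to scale worker group %s in Ray cluster %s in namespace %s: %w", group, cluster, namespace, err)
	}

	for i := range rc.Spec.WorkerGroupSpecs {
		wg := &rc.Spec.WorkerGroupSpecs[i]
		if wg.GroupName != group {
			continue
		}
		previous := int32(0)
		if wg.Replicas != nil {
			previous = *wg.Replicas
		}
		wg.Replicas = &replicas
		if _, err := client.RayV1().RayClusters(namespace).Update(ctx, rc, metav1.UpdateOptions{}); err != nil {
			return err
		}
		fmt.Printf("Scaled worker group %s in Ray cluster %s in namespace %s from %d to %d replicas\n",
			group, cluster, namespace, previous, replicas)
		return nil
	}
	return fmt.Errorf("worker group %s not found in Ray cluster %s in namespace %s", group, cluster, namespace)
}
```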

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@MortalHappiness
Member

@davidxia let me know when this is ready for review.

@davidxia davidxia force-pushed the scale branch 2 times, most recently from e779b60 to aa8e35f Compare February 6, 2025 01:22
@davidxia davidxia marked this pull request as ready for review February 6, 2025 02:46
@davidxia
Contributor Author

davidxia commented Feb 6, 2025

ready for review 🙏

```go
cmdFactory := cmdutil.NewFactory(options.configFlags)

cmd := &cobra.Command{
	Use: "scale [WORKERGROUP] [-c/--raycluster CLUSTERNAME] [-r/--replicas N]",
```
Member

I think `kubectl ray scale cluster (CLUSTERNAME) (WORKERGROUP) [flags]` would be better.

That is, create a `cluster` sub-command under `scale`, and make both the cluster name and worker group name required arguments.

Also, note that required arguments should be wrapped with () and optional parameters with [] to be consistent with kubectl. You can check kubectl get --help or kubectl ray session --help for details.

cc @kevin85421 @andrewsykim What do you think?
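
As a minimal, hypothetical illustration of that convention (not code from this PR; the function name is made up):

```go
// Hypothetical illustration of kubectl's help convention:
// required arguments wrapped in (), optional parameters in [].
package scale

import "github.com/spf13/cobra"

func newScaleClusterCommand() *cobra.Command {
	return &cobra.Command{
		Use:   "cluster (CLUSTERNAME) (WORKERGROUP) [flags]",
		Short: "Scale a Ray cluster's worker group.",
	}
}
```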

Contributor Author

I like those semantics more. Are there other resources besides RayCluster that can be scaled?

Contributor Author

I updated it to be a sub-command, `kubectl ray scale cluster`, but I kept the `-c/--ray-cluster` flag because:

Prefer flags to args. It’s a bit more typing, but it makes it much clearer what is going on. It also makes it easier to make changes to how you accept input in the future. Sometimes when using args, it’s impossible to add new input without breaking existing behavior or creating ambiguity.

If you’ve got two or more arguments for different things, you’re probably doing something wrong.

— https://clig.dev/

Lmk if this works.
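
Concretely, the flag-based interface described above might be wired up roughly like this. This is a sketch only; the `options` fields and `options.Run` helper are assumptions, not the PR's exact code.

```go
// Sketch of the sub-command with the flags discussed above (names assumed).
cmd := &cobra.Command{
	Use:   "cluster (WORKERGROUP) (-c/--ray-cluster RAYCLUSTER) (-r/--replicas N)",
	Short: "Scale a Ray cluster's worker group.",
	Args:  cobra.ExactArgs(1), // the worker group name is the only positional argument
	RunE: func(cmd *cobra.Command, args []string) error {
		// options.Run is hypothetical; it would perform the scaling shown earlier.
		return options.Run(cmd.Context(), args[0])
	},
}
cmd.Flags().StringVarP(&options.rayCluster, "ray-cluster", "c", "", "name of the Ray cluster")
cmd.Flags().Int32VarP(&options.replicas, "replicas", "r", -1, "desired number of replicas for the worker group")
```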

Member

Then how about also making the worker group a flag, like `--worker-group`?

Contributor Author

@davidxia davidxia Feb 17, 2025

Hm, I kind of like how the current command follows the VERB NOUN NAME semantics of the other commands like kubectl ray create cluster NAME or kubectl ray get workergroups NAME. I think the help message makes it apparent the first positional arg has to be the worker group name.

But not a strong opinion. Happy to change especially if others like that more too.

davidxia added a commit to davidxia/kuberay that referenced this pull request Feb 10, 2025
We plan to add a [command to scale a worker group][1] in an existing
RayCluster, e.g. `kubectl ray scale cluster (CLUSTER_NAME) (WORKER_GROUP)`. This
requires the user to know the group name. There's currently no way to get the
worker group names in a RayCluster other than getting the resource with kubectl
and looking for or parsing out the names. A command to get worker group
details for a cluster might be helpful.

[1]: ray-project#2926

Signed-off-by: David Xia <david@davidxia.com>
andrewsykim pushed a commit that referenced this pull request Feb 13, 2025
We plan to add a [command to scale a worker group][1] in an existing
RayCluster, e.g. `kubectl ray scale cluster (CLUSTER_NAME) (WORKER_GROUP)`. This
requires the user to know the group name. There's currently no way to get the
worker group names in a RayCluster other than getting the resource with kubectl
and looking for or parsing out the names. A command to get worker group
details for a cluster might be helpful.

## Example Usage

```console
$ kubectl ray get workergroups -A
NAMESPACE                NAME            REPLICAS   CPUS      GPUS   TPUS   MEMORY         CLUSTER
default                  default-group   1/1        2         0      0      4Gi            dxia-test
explorer-one             gpuWorker       1/1        36        4      0      256Gi          yzhao
foundation-models        cpuWorker       1/1        20        0      0      200Gi          jacquelinew-v3
foundation-models        gpuWorker       1/1        200       8      0      1000Gi         jacquelinew-v3
foundation-models        redis           1/1        13        0      0      112Gi          jacquelinew-v3

$ kubectl ray get workergroups -n default
NAME            REPLICAS   CPUS   GPUS   TPUS   MEMORY   CLUSTER
default-group   1/1        2      0      0      4Gi      dxia-test

$ kubectl ray get workergroups -n foundation-models -c jacquelinew-v3
NAME        REPLICAS   CPUS   GPUS   TPUS   MEMORY   CLUSTER
cpuWorker   1/1        20     0      0      200Gi    jacquelinew-v3
gpuWorker   1/1        200    8      0      1000Gi   jacquelinew-v3
redis       1/1        13     0      0      112Gi    jacquelinew-v3

$ kubectl ray get workergroups gpuWorker -A
NAMESPACE           NAME        REPLICAS   CPUS   GPUS   TPUS   MEMORY   CLUSTER
explorer-one        gpuWorker   1/1        36     4      0      256Gi    yzhao
foundation-models   gpuWorker   1/1        200    8      0      1000Gi   jacquelinew-v3
```

[1]: #2926

Signed-off-by: David Xia <david@davidxia.com>
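
For reference, pulling the same worker-group details out of a RayCluster object programmatically might look roughly like the sketch below. The types come from the KubeRay operator module; the package and helper names are made up.

```go
// Sketch: summarize the worker groups of a RayCluster, the information the
// `kubectl ray get workergroups` command surfaces in the table above.
package workergroups

import (
	"fmt"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

func workerGroupSummaries(rc *rayv1.RayCluster) []string {
	summaries := make([]string, 0, len(rc.Spec.WorkerGroupSpecs))
	for _, wg := range rc.Spec.WorkerGroupSpecs {
		replicas := int32(0)
		if wg.Replicas != nil {
			replicas = *wg.Replicas
		}
		summaries = append(summaries, fmt.Sprintf("%s: %d replicas", wg.GroupName, replicas))
	}
	return summaries
}
```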
@davidxia davidxia marked this pull request as draft February 13, 2025 22:13
@davidxia davidxia force-pushed the scale branch 2 times, most recently from f966a7c to 413dba6 Compare February 14, 2025 14:43
@davidxia davidxia marked this pull request as ready for review February 14, 2025 14:44
@davidxia
Contributor Author

@MortalHappiness ready for another review. Do you think we need to address #110 (comment)?

I noticed that if I scale down a group from, for example, 3 to 1, sometimes all worker Pods are terminated and a new one is created. Is this expected controller behavior?

@MortalHappiness
Member

Do you think we need to address #110 (comment)?

cc @kevin85421 @andrewsykim WDYT?

I noticed that if I scale down a group from, for example, 3 to 1, sometimes all worker Pods are terminated and a new one is created. Is this expected controller behavior?

I don't think this is the expected behavior.

@davidxia
Contributor Author

I don't think this is the expected behavior.

The behavior is unrelated to this PR since it comes from the controller, but I'm curious whether others can repro. I can reproduce it just with `kubectl edit raycluster`, scaling a worker group from 3 replicas down to 1.

Development

Successfully merging this pull request may close these issues.

[Feature] Add kubectl plugin to help scale ray raycluster