[feat][kubectl-plugin] add scale command #2926
base: master
Conversation
@davidxia let me know when this is ready for review.
force-pushed from e779b60 to aa8e35f
ready for review 🙏
```go
// Build a factory for Kubernetes client access from the shared config flags.
cmdFactory := cmdutil.NewFactory(options.configFlags)

cmd := &cobra.Command{
	Use: "scale [WORKERGROUP] [-c/--raycluster CLUSTERNAME] [-r/--replicas N]",
```
I think `kubectl scale cluster (CLUSTERNAME) (WORKERGROUP) [flags]` would be better. That is, create a `cluster` sub-command under `scale`, and make both the cluster name and worker group name required arguments.

Also, note that required arguments should be wrapped with `()` and optional parameters with `[]` to be consistent with kubectl. You can check `kubectl get --help` or `kubectl ray session --help` for details.

cc @kevin85421 @andrewsykim What do you think?
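A minimal sketch of that suggested shape, assuming the standard cobra wiring the diff already uses (the package, function, and handler names here are illustrative, not from this PR):

```go
package kubectlraycmd // hypothetical package name, for illustration only

import "github.com/spf13/cobra"

// newScaleCommand sketches the suggested layout: a "cluster" noun under the
// "scale" verb, with both positional arguments required.
func newScaleCommand() *cobra.Command {
	scaleCmd := &cobra.Command{Use: "scale"}

	clusterCmd := &cobra.Command{
		Use:  "cluster (CLUSTERNAME) (WORKERGROUP)",
		Args: cobra.ExactArgs(2), // () marks required args, per the convention above
		RunE: func(cmd *cobra.Command, args []string) error {
			clusterName, workerGroup := args[0], args[1]
			// Scaling logic for workerGroup in clusterName would go here.
			_, _ = clusterName, workerGroup
			return nil
		},
	}

	scaleCmd.AddCommand(clusterCmd)
	return scaleCmd
}
```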
I like those semantics more. Are there other resources besides RayCluster that can be scaled?
I updated it to be a sub-command, `kubectl ray scale cluster`, but I kept the `-c/--ray-cluster` flag because:

> Prefer flags to args. It's a bit more typing, but it makes it much clearer what is going on. It also makes it easier to make changes to how you accept input in the future. Sometimes when using args, it's impossible to add new input without breaking existing behavior or creating ambiguity.
>
> If you've got two or more arguments for different things, you're probably doing something wrong.

Lmk if this works.
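For reference, the flag wiring for that design might look roughly like this. Only `-c/--ray-cluster` and `-r/--replicas` are taken from the PR itself; the options struct and its field names are guesses for illustration:

```go
package kubectlraycmd // hypothetical package name, for illustration only

import "github.com/spf13/cobra"

// scaleClusterOptions is an illustrative options struct; the field names are
// guesses, not copied from the PR.
type scaleClusterOptions struct {
	rayCluster string
	replicas   int32
}

func registerScaleFlags(cmd *cobra.Command, options *scaleClusterOptions) {
	cmd.Flags().StringVarP(&options.rayCluster, "ray-cluster", "c", "",
		"name of the RayCluster whose worker group should be scaled")
	cmd.Flags().Int32VarP(&options.replicas, "replicas", "r", -1,
		"desired number of replicas for the worker group")
	// Keep the cluster as a required flag rather than a positional argument.
	cobra.CheckErr(cmd.MarkFlagRequired("ray-cluster"))
}
```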
Then how about also making the worker group a flag, like `--worker-group`?
Hm, I kind of like how the current command follows the `VERB NOUN NAME` semantics of the other commands, like `kubectl ray create cluster NAME` or `kubectl ray get workergroups NAME`. I think the help message makes it apparent the first positional arg has to be the worker group name.

But not a strong opinion. Happy to change, especially if others like that more too.
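To make the two shapes under discussion concrete (these invocations are illustrative; `my-group` and `my-cluster` are placeholders):

```console
# Shape in this PR: worker group as a positional arg, cluster as a flag
kubectl ray scale cluster my-group --ray-cluster my-cluster --replicas 3

# Alternative being discussed: worker group as a flag as well
kubectl ray scale cluster --ray-cluster my-cluster --worker-group my-group --replicas 3
```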
We plan to add a [command to scale a worker group][1] in an existing RayCluster, e.g. `kubectl ray scale cluster (CLUSTER_NAME) (WORKER_GROUP)`. This requires the user to know the group name. There's currently no way to get the worker group names in a RayCluster other than getting the resource with kubectl and looking for or parsing out the names. A command to get worker group details for a cluster might be helpful.

## Example Usage

```console
$ kubectl ray get workergroups -A
NAMESPACE           NAME            REPLICAS   CPUS   GPUS   TPUS   MEMORY   CLUSTER
default             default-group   1/1        2      0      0      4Gi      dxia-test
explorer-one        gpuWorker       1/1        36     4      0      256Gi    yzhao
foundation-models   cpuWorker       1/1        20     0      0      200Gi    jacquelinew-v3
foundation-models   gpuWorker       1/1        200    8      0      1000Gi   jacquelinew-v3
foundation-models   redis           1/1        13     0      0      112Gi    jacquelinew-v3

$ kubectl ray get workergroups -n default
NAME            REPLICAS   CPUS   GPUS   TPUS   MEMORY   CLUSTER
default-group   1/1        2      0      0      4Gi      dxia-test

$ kubectl ray get workergroups -n foundation-models -c jacquelinew-v3
NAME        REPLICAS   CPUS   GPUS   TPUS   MEMORY   CLUSTER
cpuWorker   1/1        20     0      0      200Gi    jacquelinew-v3
gpuWorker   1/1        200    8      0      1000Gi   jacquelinew-v3
redis       1/1        13     0      0      112Gi    jacquelinew-v3

$ kubectl ray get workergroups gpuWorker -A
NAMESPACE           NAME        REPLICAS   CPUS   GPUS   TPUS   MEMORY   CLUSTER
explorer-one        gpuWorker   1/1        36     4      0      256Gi    yzhao
foundation-models   gpuWorker   1/1        200    8      0      1000Gi   jacquelinew-v3
```

[1]: #2926

Signed-off-by: David Xia <david@davidxia.com>
force-pushed from f966a7c to 413dba6
@MortalHappiness ready for another review. Do you think we need to address #110 (comment)? I noticed that if I scale down a group from, for example, 3 to 1, sometimes all worker Pods are terminated and a new one is created. Is this expected controller behavior?
cc @kevin85421 @andrewsykim WDYT?
I don't think this is the expected behavior.
The behavior is unrelated to this PR since it's the controller. But I'm curious if others can repro. I can repro it just with
to scale a RayCluster's worker group.

closes ray-project#110

## Example Usage

```console
$ kubectl ray scale cluster -h
Scale a Ray cluster's worker group.

Usage:
  ray scale cluster (WORKERGROUP) (-c/--ray-cluster RAYCLUSTER) (-r/--replicas N) [flags]

Examples:
  # Scale a Ray cluster's worker group to 3 replicas
  kubectl ray scale cluster my-workergroup --ray-cluster my-raycluster --replicas 3

$ kubectl ray scale default-group --ray-cluster NONEXISTENT --replicas 0
Error: failed to scale worker group default-group in Ray cluster NONEXISTENT in namespace default: rayclusters.ray.io "NONEXISTENT" not found

$ kubectl ray scale DEADBEEF --ray-cluster dxia-test --replicas 1
Error: worker group DEADBEEF not found in Ray cluster dxia-test in namespace default. Available worker groups: default-group, another-group, yet-another-group

$ kubectl ray scale default-group --ray-cluster dxia-test --replicas 3
Scaled worker group default-group in Ray cluster dxia-test in namespace default from 0 to 3 replicas

$ kubectl ray scale default-group --ray-cluster dxia-test --replicas 1
Scaled worker group default-group in Ray cluster dxia-test in namespace default from 3 to 1 replicas

$ kubectl ray scale default-group --ray-cluster dxia-test --replicas -1
Error: must specify -r/--replicas with a non-negative integer
```

Signed-off-by: David Xia <david@davidxia.com>