Limit number of Elastic Agents running per cluster profile #247
Comments
+1. There should be a means to constrain the maximum number of concurrent elastic-agent pods running. In its present state, the plugin can destabilize a cluster even with resource limits set.
A use case for this is when agent profiles are used to run jobs on different node types. In our case we have a number of node pools with different autoscaling settings, and elastic agent profiles target these different pools. It would be useful to limit the number of pods that can be launched for a given elastic profile, so that excess jobs queue waiting for an agent to launch, rather than producing an abundance of pending pods, which can trigger alerting.
Has anyone tried putting the elastic agents in their own namespace and imposing a quota there, e.g. a namespace-level cap on the pod count? (See the sketch below.)
Perhaps the plugin should have this, but Kubernetes has a lot of ways to limit resource consumption baked into its scheduler, which you generally have control over via the elastic profile; a raw pod count can be a bit of a blunt instrument. While my suggestion above doesn't solve the "multiple node pools" problem unless you split those into different namespaces, I do separately wonder whether it really helps to move a queue from Kubernetes-land (pending pods) back into GoCD (where alerting capability is likely more limited), rather than tuning K8S alerting to something appropriate for the usage. Besides, I thought "Maximum pending pods" was supposed to limit that... if it isn't, I wonder what that setting actually does.
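For illustration, a minimal sketch of the kind of namespace quota being suggested, assuming the elastic agents are scheduled into a dedicated namespace; the namespace name and pod count here are illustrative assumptions, not plugin defaults:

```yaml
# Illustrative ResourceQuota: caps the number of pods that can exist in the
# namespace the elastic agents run in. Once the cap is reached, further pod
# creation is rejected by the API server outright rather than left pending.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: elastic-agent-pod-cap
  namespace: gocd-agents   # assumed dedicated namespace for the elastic agents
spec:
  hard:
    pods: "15"             # illustrative upper bound on concurrent agent pods
```

Applied with `kubectl apply -f`, this rejects pod creation beyond the cap rather than queueing it, which is consistent with the behaviour reported in the reply below.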
@chadlwilson I did try this and, if I remember correctly, the pipelines just failed when the quota was reached.
@wojtek-viirtue interesting, OK, thanks!
I suppose for us there's a separation of concerns between queued tasks and infrastructure problems, and having both of those manifest as pending pods blurs the lines a bit.

On the GoCD side we want to set an upper limit for the overall resources that a particular set of jobs can use, and we can achieve this by having an agent profile with an affinity to a specific node pool and setting upper autoscaling limits on that node pool. If, for example, that pool is capped at 5 nodes that can each accommodate 3 agents, that gives 15 of these jobs that can run in parallel.

On the infrastructure side, pods stuck in a pending state for a prolonged period are usually indicative of a problem with the cluster itself; it could be an autoscaler problem, or something simple such as taints/tolerations being changed incorrectly.

In the first case we don't really need alerts: it's expected behaviour that the pipelines have a concurrency limit and beyond that they should queue. In the second case we do want alerts, as it indicates a problem that needs addressing. If instead the plugin exposed a max-pods option for a given agent profile, we could alert cleanly on pending pods, since we would never expect pods to be stuck pending for a long period unless there is an actual problem.
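To make the node-pool arrangement above concrete, here is a sketch of the kind of pod configuration an elastic agent profile might carry under this scheme, assuming a node pool labelled `pool=gocd-agents` whose nodes each fit three agents; the labels, image and resource sizes are illustrative assumptions, not the plugin's actual defaults:

```yaml
# Illustrative agent pod: pinned to a dedicated, autoscaling-capped node pool
# via nodeSelector, with requests sized so roughly three agents fit per node.
# With the pool capped at five nodes, at most ~15 agents run concurrently;
# anything beyond that surfaces as pending pods.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: gocd-elastic-agent
spec:
  nodeSelector:
    pool: gocd-agents          # assumed node pool label
  containers:
    - name: gocd-agent
      image: gocd-agent-image  # placeholder: whatever agent image the profile already uses
      resources:
        requests:
          cpu: "1"             # sized so ~3 agents fit on one node
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi
```

Under a setup like this, pending pods only appear when something is genuinely wrong (autoscaler, taints/tolerations), which is exactly the condition worth alerting on.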
I'm not sure whether this is an issue or a feature request, but the Elastic Cluster profile has a "Maximum pending pods" setting that is somewhat misleading, and there is little documentation about it.
Anyway, I think it would be very useful to have an additional field to limit the number of agents that can be running for one cluster profile when its agent profiles are requested.
Very often K8S resources are limited, and we cannot overload the cluster with 50 running agents if there is no room for them.
My question is: do we already have something similar?
If not, could this be added? The default could be unlimited, but when a limit is set and a pipeline run is scheduled, the plugin would check the number of agents already running; if the limit has been reached, the job would stay scheduled, waiting until there is capacity to run it.
This is similar to the behaviour we get with static agents.
Thanks for any answer.