Huge CPU usage spike in protokube after cluster scale up #7427
Comments
I was able to repro this behavior in a test cluster and I have submitted a fix to upstream (weaveworks/mesh#107) -- waiting on feedback there.
I was able to make a build of protokube with the fix I proposed, but I am continuing to run into issues. The next one I ran into is weaveworks/mesh#108 -- TLDR the gossip protocol is bad (its cost seemingly scales exponentially with cluster size). So I'm going to spend a bit more time looking for a mechanism to reduce the cost -- but to get past this we'll likely have to move off of mesh (#7436).
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
This has been taken care of in #7436.
1. What kops version are you running? The command `kops version` will display this information.
Version 1.12.2 with a few backports (done prior to the 1.12.3 release, which behaves the same).
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Seemingly not relevant, but just in case:
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
We changed some of the instancegroups to scale up somewhat significantly (added about 60 nodes)
5. What happened after the commands executed?
Protokube's CPU usage spiked to consume ~1.5 cores (it usually sits around 6% of a single core).
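For anyone trying to confirm the same symptom on a node, here is a minimal sketch of how a "cores used" figure can be derived from two samples of `/proc/<pid>/stat` on Linux. It is not how the graph in this report was produced (that came from our monitoring), and the function names are hypothetical; finding the protokube PID is left to the reader.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' before indexing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    utime, stime = int(fields[11]), int(fields[12])
    return utime + stime


def cores_used(ticks_before: int, ticks_after: int,
               interval_s: float, ticks_per_s: int) -> float:
    """CPU time consumed per wall-clock second, i.e. cores in use."""
    return (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample_cores_used(pid: int, interval_s: float = 5.0) -> float:
    """Sample a live process twice and report average cores used (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")

    def read_ticks() -> int:
        with open(f"/proc/{pid}/stat") as f:
            return parse_cpu_ticks(f.read())

    before = read_ticks()
    time.sleep(interval_s)
    return cores_used(before, read_ticks(), interval_s, ticks_per_s)
```

With these definitions, a result near 0.06 corresponds to the usual ~6% of a single core, and a result near 1.5 corresponds to the spike described above.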
A few things to note from the graph:
I've done some digging around upstream and have found a few issues that are maybe related, but don't exactly inspire confidence :/
6. What did you expect to happen?