This repository has been archived by the owner on Feb 18, 2021. It is now read-only.
Figuring out which serviceNames to enable with this "degraded load balancing" strategy is going to be involved.
One strategy is to:
1. Look at the top 10 workers by average CPU.
2. Flamegraph them to see whether choosePeer() dominates the CPU.
3. Identify which serviceNames dominate each such worker.
4. Turn on "dumb load balancing" for those serviceNames.
5. Repeat.
In theory, there should only be about 10 serviceNames that have both "high QPS" and a "high number of peers", the combination that causes choosePeer() to dominate the flamegraph.
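The "high QPS and high number of peers" filter can be sketched roughly as below. This is only an illustration: `ServiceStats`, `degraded_lb_candidates`, and both thresholds are hypothetical names, and the per-service numbers would have to come from whatever metrics are actually available.

```python
from typing import Dict, List, NamedTuple


class ServiceStats(NamedTuple):
    """Hypothetical per-serviceName metrics; not an existing type."""
    qps: float
    peer_count: int


def degraded_lb_candidates(
    stats: Dict[str, ServiceStats],
    min_qps: float,
    min_peers: int,
) -> List[str]:
    """Return serviceNames that are both high-QPS and high-peer-count.

    The thresholds are assumptions to be tuned against real flamegraphs.
    """
    return sorted(
        name
        for name, s in stats.items()
        if s.qps >= min_qps and s.peer_count >= min_peers
    )
```

A service that is only high-QPS or only has many peers is left out, since per the reasoning above it is the combination that makes choosePeer() expensive.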
From a flame graph I've observed that some workers / services are really struggling with peer selection.
We could implement a random peer selection strategy and add a flipr that lets us change the peer selection strategy per serviceName.
We already have boolean logic to enable / disable the peer heap per serviceName.
Round robin peer selection would reduce CPU utilization and slightly degrade load balancing by increasing variance.
If round robin turns out to be too involved, we can just implement random peer selection instead.
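A minimal sketch of the two cheap strategies and a per-serviceName switch, assuming peers are plain host:port strings. The class names, `build_selector`, and the `overrides` mapping (standing in for the flipr value) are all hypothetical and are not the project's actual API.

```python
import itertools
import random
from typing import Dict, Optional, Sequence


class RoundRobinSelector:
    """Cycles through peers in order; constant work per call, no load awareness."""

    def __init__(self, peers: Sequence[str]) -> None:
        if not peers:
            raise ValueError("peers must not be empty")
        self._cycle = itertools.cycle(list(peers))

    def choose_peer(self) -> str:
        return next(self._cycle)


class RandomSelector:
    """Picks a uniformly random peer; constant work per call, no load awareness."""

    def __init__(self, peers: Sequence[str],
                 rng: Optional[random.Random] = None) -> None:
        if not peers:
            raise ValueError("peers must not be empty")
        self._peers = list(peers)
        self._rng = rng or random.Random()

    def choose_peer(self) -> str:
        return self._rng.choice(self._peers)


# Hypothetical strategy names that a per-serviceName flag could carry.
STRATEGIES = {"round_robin": RoundRobinSelector, "random": RandomSelector}


def build_selector(service_name: str, peers: Sequence[str],
                   overrides: Dict[str, str]):
    """Return a cheap selector if the service is opted in, else None.

    None means "no override": the caller keeps using the existing peer heap.
    """
    strategy = overrides.get(service_name)
    if strategy is None:
        return None
    return STRATEGIES[strategy](peers)
```

Both selectors ignore peer load entirely, which is where the extra variance comes from; the per-service override keeps that trade-off limited to the serviceNames where choosePeer() is the bottleneck.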