RFC: tablet throttler multi-metrics #15624
Comments
Observability: we should be able to track why a certain client was throttled, i.e. which specific metric it was throttled on.
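A minimal sketch of what such tracking could look like, assuming each rejected check is recorded against the app name and the metric that caused it. The names `throttled_by`, `record_throttle` and `top_reason` are hypothetical and are not part of the throttler's code.

```python
from collections import Counter
from typing import Optional

# (app name, metric name) -> number of rejected checks.
# Hypothetical bookkeeping, not the throttler's actual data structure.
throttled_by: Counter = Counter()


def record_throttle(app: str, metric: str) -> None:
    """Record that `app` was throttled because of `metric`."""
    throttled_by[(app, metric)] += 1


def top_reason(app: str) -> Optional[str]:
    """Return the metric that most often throttled `app`, or None if it was never throttled."""
    counts = {metric: n for (a, metric), n in throttled_by.items() if a == app}
    return max(counts, key=counts.get) if counts else None


record_throttle("online-ddl", "loadavg")
record_throttle("online-ddl", "lag")
record_throttle("online-ddl", "loadavg")
print(top_reason("online-ddl"))  # loadavg
```

Keying the counter by both app and metric keeps the "why" queryable per client without storing every individual check.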
|
As mentioned above, we want to be able to change the list of considered metrics while an Online DDL operation is running (as an example). For instance, we may want Online DDL to start throttling based on both lag and load average, and later to stop throttling based on load average and remain with just lag. IMO the way to do that is to associate metrics with an app name. All Online DDL operations use the app name "online-ddl". So the way would be to associate That association will then either
|
Metrics can be collected from the single tablet being probed, or from the collective shard.
To that effect:
Moreover, consider the discussion in the previous comment re: associating metrics with apps. It will be possible to make the checks even more fine-grained by associating
|
Required additions to
|
Eventually (
|
$ vtctldclient UpdateThrottlerConfig --app-name "all" --app-metrics "lag,loadavg" commerce
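To illustrate the app-to-metrics association behind a command like the one above, here is a rough sketch. It assumes that "all" acts as a catch-all entry, that a more specific app name takes precedence over it, and that lag is used when nothing is configured. Those lookup rules and the helper names are assumptions for illustration, not confirmed throttler behavior.

```python
from typing import Dict, List

# Assumption: replication lag is the fallback when no association exists.
DEFAULT_METRICS = ["lag"]


def parse_app_metrics(flag_value: str) -> List[str]:
    """Split a comma-separated value such as "lag,loadavg" into metric names."""
    return [name.strip() for name in flag_value.split(",") if name.strip()]


def metrics_for_app(app: str, app_metrics: Dict[str, List[str]]) -> List[str]:
    """Resolve the metrics an app is checked against (hypothetical lookup order)."""
    if app in app_metrics:
        return app_metrics[app]
    if "all" in app_metrics:
        return app_metrics["all"]
    return DEFAULT_METRICS


app_metrics = {"all": parse_app_metrics("lag,loadavg")}
# Later, narrow Online DDL down to lag only while it is still running.
app_metrics["online-ddl"] = parse_app_metrics("lag")

print(metrics_for_app("online-ddl", app_metrics))      # ['lag']
print(metrics_for_app("some-other-app", app_metrics))  # ['lag', 'loadavg']
```

Because the association is looked up on every check, changing the entry for "online-ddl" takes effect for an operation that is already in progress.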
Addressed by #15988
Base branch PR for changes:
Beyond #15988:
|
More beyond #15988:
|
Reopening as there is a bit of followup.
@timvaillancourt circling back to connection pool usage, how do you choose reasonable values? Do you only throttle when the pool is completely exhausted (i.e.
@shlomi-noach our plan in
Today, the tablet throttler uses a single metric by which to throttle. This metric is dynamically configurable, but is just the one. The default metric is replication lag, and can be modified based on any query that returns a scalar value, e.g. to return Threads_running.
We want the throttler to measure multiple metrics at once, and we want to be able to throttle based on a selective list of metrics. Such metrics could be:
- Threads_running
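As a rough illustration of throttling on a selective list of metrics, the sketch below checks only the selected metrics against per-metric thresholds. The metric names, threshold values and function name are hypothetical, and it assumes every metric is "higher is worse".

```python
from typing import Dict, Iterable, List

# Hypothetical thresholds; real values would come from throttler configuration.
THRESHOLDS: Dict[str, float] = {"lag": 5.0, "threads_running": 100.0, "loadavg": 4.0}


def exceeded_metrics(
    values: Dict[str, float],
    selected: Iterable[str],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[str]:
    """Return the selected metrics whose collected value is above its threshold."""
    exceeded = []
    for name in selected:
        value = values.get(name)
        threshold = thresholds.get(name)
        # Metrics that were not collected or have no threshold are skipped.
        if value is not None and threshold is not None and value > threshold:
            exceeded.append(name)
    return exceeded


# All metrics are measured at once, but only the selected ones decide the outcome.
values = {"lag": 1.2, "threads_running": 250.0, "loadavg": 6.5}
print(exceeded_metrics(values, ["lag"]))                     # []
print(exceeded_metrics(values, ["lag", "threads_running"]))  # ['threads_running']
```

A client is throttled when the returned list is non-empty, and the list itself says which metrics were responsible.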
To that effect, we want:
- multiple metrics on self (on their own host or their designated MySQL server)
- PRIMARY tablet to always collect all available metrics from replica tablet
Introducing multi-metrics dimension explodes the complexity of the throttler code. However, we are thankfully also able to reduce the complexity by getting rid of dimensions that we don't really use or need, and which were inherited from freno:
- Clusters: today we use self and shard, but self isn't really a cluster, and the code largely handles it differently than shard. We can therefore remove the "cluster" or "store" dimension.
- Store types: we only use MySQL. We can remove the dimension.
- Probe settings: we always probe by tablet, and the probe layer is mostly redundant.
- Other.
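One way to picture the simplified model: with no cluster, store or probe layers, collected data reduces to a mapping from tablet to metric values, and self versus shard becomes a question of how that mapping is read. The sketch below assumes the shard-wide value is the worst (highest) value across tablets; that rule, the tablet aliases and the function names are assumptions made for illustration.

```python
from typing import Dict, Optional

# tablet alias -> metric name -> last collected value
TabletMetrics = Dict[str, Dict[str, float]]


def self_value(metrics: TabletMetrics, tablet: str, name: str) -> Optional[float]:
    """Value of a metric as seen by a single tablet."""
    return metrics.get(tablet, {}).get(name)


def shard_value(metrics: TabletMetrics, name: str) -> Optional[float]:
    """Worst value of a metric across all tablets that reported it (assumed: max)."""
    return max((m[name] for m in metrics.values() if name in m), default=None)


collected: TabletMetrics = {
    "zone1-0000000100": {"lag": 0.0, "loadavg": 1.5},
    "zone1-0000000101": {"lag": 2.5, "loadavg": 0.7},
    "zone1-0000000102": {"lag": 0.8},
}
print(shard_value(collected, "lag"))                     # 2.5
print(self_value(collected, "zone1-0000000101", "lag"))  # 2.5
```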
We will need to be backwards compatible: multi-metric PRIMARY should work with v19 replicas, and vice versa.
This will cause a major rewrite, with some temporary redundant code to support backwards compatibility. Hopefully we can simplify some existing complexities inherited from freno, or technical debt we've accumulated since.
Unit tests and endtoend tests will remain (and expand) to protect us against incompatibilities.
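The compatibility requirement could be handled along these lines on the PRIMARY side. This sketch assumes an older replica answers a check with a single scalar while a multi-metric replica also returns a per-metric map. The field names `value` and `metrics` and the function name are hypothetical and do not reflect the actual response format.

```python
from typing import Any, Dict


def metrics_from_response(response: Dict[str, Any], legacy_metric: str = "lag") -> Dict[str, float]:
    """Normalize a replica's check response into a metric-name -> value map.

    Hypothetical response shapes:
      multi-metric replica: {"value": 0.3, "metrics": {"lag": 0.3, "loadavg": 1.1}}
      older replica:        {"value": 0.3}
    """
    metrics = response.get("metrics")
    if metrics:
        return dict(metrics)
    # Older replica: only the single configured metric is known, so file the
    # scalar under that metric's name (assumed here to be lag by default).
    return {legacy_metric: float(response["value"])}


print(metrics_from_response({"value": 0.3}))  # {'lag': 0.3}
print(metrics_from_response({"value": 0.3, "metrics": {"lag": 0.3, "loadavg": 1.1}}))
```

The reverse direction would be covered by a multi-metric replica continuing to populate the single scalar, so that an older PRIMARY can ignore the map entirely.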