ring: Fix pathological case when an entire zone leaves#672
Merged
56quarters merged 1 commit intomainfrom Mar 27, 2025
Merged
Conversation
Contributor
Author
|
Comparison against a commit from early January which predates my changes in #632: |
pstibrany
reviewed
Mar 26, 2025
Contributor
pstibrany
left a comment
There was a problem hiding this comment.
I haven't seen ring code in a while, and originally I thought I've found a bug in the PR, but staring at it for some more time, I'm wrong and the change seems legit and makes sense to me.
This change improves performance in a the case where an entire zone is not ACTIVE and the replication set is meant to be extended. Previously, when an entire zone was unavailable, the ring kept searching for instances by looking at every single token trying to find an instance in the required zone that was ACTIVE. This meant thousands of iterations to find a host that would never work. This change keeps track of the number of hosts that we have examined in each zone. It returns early once we have either found the hosts in each zone we need _OR_ we have examined all hosts in the zone and so know that we won't find one. Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
67a03bf to
16780f3
Compare
julienduchesne
approved these changes
Mar 26, 2025
Contributor
julienduchesne
left a comment
There was a problem hiding this comment.
It makes sense to me, not super familiar with ring code though
56quarters
added a commit
to grafana/mimir
that referenced
this pull request
Mar 27, 2025
Specifically, this pulls in the following dskit PRs: * grafana/dskit#672 * grafana/dskit#669 * grafana/dskit#668 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters
added a commit
to grafana/mimir
that referenced
this pull request
Mar 27, 2025
Update to the latest dskit commmit to pull in grafana/dskit#672 which improves performance of ring operations in clients (queriers, distributors) when zone-awareness is enabled and an entire zone is not "ACTIVE". Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
4 tasks
56quarters
added a commit
to grafana/mimir
that referenced
this pull request
Mar 27, 2025
Update to the latest dskit commit to pull in grafana/dskit#672 which improves performance of ring operations in clients (queriers, distributors) when zone-awareness is enabled and an entire zone is not "ACTIVE". Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters
added a commit
to grafana/mimir
that referenced
this pull request
Mar 27, 2025
Update to the latest dskit commit to pull in grafana/dskit#672 which improves performance of ring operations in clients (queriers, distributors) when zone-awareness is enabled and an entire zone is not "ACTIVE". Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters
added a commit
to grafana/mimir
that referenced
this pull request
Mar 27, 2025
* chore: update to latest dskit for ring performance fix Update to the latest dskit commit to pull in grafana/dskit#672 which improves performance of ring operations in clients (queriers, distributors) when zone-awareness is enabled and an entire zone is not "ACTIVE". Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Lint Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> --------- Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
pracucci
added a commit
to grafana/mimir
that referenced
this pull request
Jan 27, 2026
…AVING state (#14157) #### What this PR does The `Ring.Get()` is extremely inefficient when there are are large number of instances in a state for which the Ring operation requests to extend the replication set. This is a known issue, that was attempted to be optimised in grafana/dskit#672 but unfortunately the CPU impact on queriers is still huge. I'm finalizing an optimization for the dskit's `Ring.Get()`, but – even with my upcoming optimization – there's still a significant penalty when the replication set is extended for a large number of instances. In this PR I'm taking a different approach: I'm questioning whether we really need to extend the replication set for store-gateways in `LEAVING` state. My take is that we don't need it (but we still need it for store-gateways in `JOINING` state). I'm the author of the original code (cortexproject/cortex#4263) and the original code was actually intended to cover the `JOINING` state (for which I still agree that we need to extend the replication set). In this PR I simply propose to not extend the replica set for `LEAVING` instances. In the code comment you find the full reasoning (that I would like to leave there as a future reference). The benchmark for the `LEAVING` state is quite impactful: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/querier cpu: Apple M3 Pro │ before.txt │ after.txt │ │ sec/op │ sec/op vs base │ BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11 9.924µ ± 11% 8.853µ ± 4% -10.80% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11 23.91m ± 3% 24.72m ± 13% +3.39% (p=0.026 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11 24007.353µ ± 2% 8.855µ ± 1% -99.96% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11 43.62µ ± 2% 43.14µ ± 2% ~ (p=0.093 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11 24.50m ± 3% 24.68m ± 11% ~ (p=0.699 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11 24033.15µ ± 2% 44.34µ ± 1% -99.82% (p=0.002 n=6) geomean 2.296m 212.3µ -90.75% │ before.txt │ after.txt │ │ B/op │ B/op vs base │ BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11 15.291Ki ± 0% 7.314Ki ± 0% -52.17% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11 1.583Mi ± 1% 1.231Mi ± 4% -22.26% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11 1618.160Ki ± 2% 7.315Ki ± 0% -99.55% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11 49.48Ki ± 0% 51.28Ki ± 0% +3.63% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11 1.674Mi ± 0% 1.228Mi ± 3% -26.64% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11 1714.05Ki ± 0% 53.16Ki ± 0% -96.90% (p=0.002 n=6) geomean 424.3Ki 78.34Ki -81.53% │ before.txt │ after.txt │ │ allocs/op │ allocs/op vs base │ BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11 81.00 ± 0% 55.00 ± 0% -32.10% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11 503.0 ± 6% 470.0 ± 13% ~ (p=0.117 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11 499.00 ± 7% 55.00 ± 0% -88.98% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11 141.0 ± 0% 115.0 ± 0% -18.44% (p=0.002 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11 533.0 ± 2% 485.0 ± 9% -9.01% (p=0.006 n=6) BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11 533.0 ± 1% 135.0 ± 0% -74.67% (p=0.002 n=6) geomean 305.6 148.5 -51.42% ``` #### Which issue(s) this PR fixes or relates to N/A #### Checklist - [x] Tests updated. - [ ] Documentation added. - [x] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR. - [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Improves querier performance when many store-gateways are not ACTIVE. > > - Changes `storegateway.BlocksRead` to extend the replica set only for `JOINING` instances (no longer for `LEAVING`), with clarified in-code rationale > - Optimizes `querier.GetClientsFor` by reusing ring buffers via `ring.WithBuffers(ring.MakeBuffersForGet())` to reduce allocations/CPU > - Adds safety test ensuring buffer reuse doesn’t corrupt instance metadata and a comprehensive benchmark covering ACTIVE/JOINING/LEAVING and dynamic replication scenarios > - Updates CHANGELOG with the enhancement entry > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 43ff03e. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com> Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
pracucci
added a commit
that referenced
this pull request
Jan 28, 2026
…tate for which the extended replication set is requested (#884) **What this PR does**: The `Ring.Get()` is very inefficient when a large number of instances are in a state for which the extended replication set is requested. This is not a new issue, and there was a previous optimization attempt in #672 by @56quarters. Unfortunately the issue is not fixed yet and we can see it when the magnitude of instances increases. In this PR I propose an optimization based on the fact that looking up small maps is slower than just searching for corresponding items in slices (maps become fasters when the number of unique items increases, but for few items slices are just faster). The result of this optimization is not huge, but I think it's still a step forward. It's still very slow when we have a large number of instances in a state for which the extended replication set is requested, compared to the case all instances are ACTIVE. ``` goos: darwin goarch: arm64 pkg: github.com/grafana/dskit/ring cpu: Apple M3 Pro │ new-before.txt │ new-after.txt │ │ sec/op │ sec/op vs base │ Ring_Get/with_zone_awareness-11 1008.1n ± 19% 553.1n ± 8% -45.13% (p=0.002 n=6) Ring_Get/one_excluded_zone-11 624.6n ± 1% 393.0n ± 12% -37.08% (p=0.002 n=6) Ring_Get/without_zone_awareness-11 347.1n ± 4% 334.5n ± 1% -3.62% (p=0.002 n=6) Ring_Get/without_zone_awareness,_not_enough_instances-11 268.5n ± 1% 256.2n ± 1% -4.56% (p=0.002 n=6) Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11 3739.3µ ± 5% 903.0µ ± 2% -75.85% (p=0.002 n=6) Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11 1.905µ ± 6% 1.169µ ± 2% -38.62% (p=0.002 n=6) geomean 2.734µ 1.643µ -39.91% │ new-before.txt │ new-after.txt │ │ B/op │ B/op vs base │ Ring_Get/with_zone_awareness-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get/one_excluded_zone-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get/without_zone_awareness-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get/without_zone_awareness,_not_enough_instances-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11 109.6Ki ± 0% 124.3Ki ± 0% +13.45% (p=0.002 n=6) Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ geomean ² +2.13% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ new-before.txt │ new-after.txt │ │ allocs/op │ allocs/op vs base │ Ring_Get/with_zone_awareness-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get/one_excluded_zone-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get/without_zone_awareness-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get/without_zone_awareness,_not_enough_instances-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11 19.00 ± 0% 27.00 ± 0% +42.11% (p=0.002 n=6) Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=6) ¹ geomean ² +6.03% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ``` **Which issue(s) this PR fixes**: N/A **Checklist** - [x] Tests updated <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Improves performance of `Ring.Get()` when many instances extend the replica set, especially under zone-awareness. > > - Replaces `distinctHosts` slice/map usage with a hybrid `stringSet` (`util.go`) to reduce lookups/allocations; adds comprehensive tests > - Switches per-zone tracking from maps to slices indexed by zone index (`totalHostsPerZone`, `examinedHostsPerZone`, `foundHostsPerZone`); updates `canStopLooking()` signature accordingly > - Adds explicit `zoneIndex` resolution with validation and panics on misuse if zone-awareness disabled > - Adjusts main selection loop to use `stringSet.len()/contains()/add()` and new stop condition > - Expands and parameterizes `BenchmarkRing_Get_OneZoneLeaving` (larger `instances`, more zones, and both with/without one zone leaving) with deterministic assertions > > Net effect: substantial reductions in `Ring.Get()` time in pathological scenarios; no API changes. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 9f1c505. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Signed-off-by: Marco Pracucci <marco@pracucci.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does:
This change improves performance in a the case where an entire zone is not ACTIVE and the replication set is meant to be extended. Previously, when an entire zone was unavailable, the ring kept searching for instances by looking at every single token trying to find an instance in the required zone that was ACTIVE. This meant thousands of iterations to find a host that would never work.
This change keeps track of the number of hosts that we have examined in each zone. It returns early once we have either found the hosts in each zone we need OR we have examined all hosts in the zone and so know that we won't find one.
Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]