
ring: Fix pathological case when an entire zone leaves#672

Merged
56quarters merged 1 commit into main from 56quarters/ring-missing-zone
Mar 27, 2025
Conversation

@56quarters
Contributor

What this PR does:

This change improves performance in the case where an entire zone is not ACTIVE and the replication set is meant to be extended. Previously, when an entire zone was unavailable, the ring kept searching for instances by examining every single token, trying to find an ACTIVE instance in the required zone. This meant thousands of iterations to find a host that would never be found.

This change keeps track of the number of hosts that we have examined in each zone. The search returns early once we have either found the hosts we need in each zone OR examined all hosts in a zone and so know that we won't find one there.

Which issue(s) this PR fixes:

N/A

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@56quarters
Contributor Author

Comparison against a commit from early January which predates my changes in #632:

$ benchstat prev.txt current.txt 
goos: linux
goarch: amd64
pkg: github.com/grafana/dskit/ring
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                           │   prev.txt    │             current.txt             │
                           │    sec/op     │   sec/op     vs base                │
Ring_Get_OneZoneLeaving-16   8329.3µ ± 13%   114.6µ ± 5%  -98.62% (p=0.000 n=10)

                           │   prev.txt   │           current.txt            │
                           │     B/op     │     B/op      vs base            │
Ring_Get_OneZoneLeaving-16   10.27Ki ± 0%   10.27Ki ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                           │  prev.txt  │          current.txt           │
                           │ allocs/op  │ allocs/op   vs base            │
Ring_Get_OneZoneLeaving-16   12.00 ± 0%   12.00 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

@56quarters 56quarters requested review from a team and pstibrany March 25, 2025 16:57
Contributor

@pstibrany pstibrany left a comment


I haven't seen the ring code in a while, and originally I thought I'd found a bug in the PR, but after staring at it for some more time I realized I was wrong; the change seems legit and makes sense to me.

This change improves performance in the case where an entire zone is
not ACTIVE and the replication set is meant to be extended. Previously,
when an entire zone was unavailable, the ring kept searching for
instances by looking at every single token trying to find an instance
in the required zone that was ACTIVE. This meant thousands of iterations
to find a host that would never work.

This change keeps track of the number of hosts that we have examined
in each zone. It returns early once we have either found the hosts in
each zone we need _OR_ we have examined all hosts in the zone and so
know that we won't find one.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters force-pushed the 56quarters/ring-missing-zone branch from 67a03bf to 16780f3 Compare March 26, 2025 13:53
@56quarters 56quarters marked this pull request as ready for review March 26, 2025 15:41
Contributor

@julienduchesne julienduchesne left a comment


It makes sense to me, though I'm not super familiar with the ring code.

@julienduchesne julienduchesne requested a review from a team March 26, 2025 15:48
@56quarters 56quarters merged commit 60d867e into main Mar 27, 2025
5 checks passed
@56quarters 56quarters deleted the 56quarters/ring-missing-zone branch March 27, 2025 16:40
56quarters added a commit to grafana/mimir that referenced this pull request Mar 27, 2025
Specifically, this pulls in the following dskit PRs:

* grafana/dskit#672
* grafana/dskit#669
* grafana/dskit#668

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit to grafana/mimir that referenced this pull request Mar 27, 2025
Update to the latest dskit commit to pull in grafana/dskit#672
which improves performance of ring operations in clients (queriers,
distributors) when zone-awareness is enabled and an entire zone is
not "ACTIVE".

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit to grafana/mimir that referenced this pull request Mar 27, 2025
* chore: update to latest dskit for ring performance fix

Update to the latest dskit commit to pull in grafana/dskit#672
which improves performance of ring operations in clients (queriers,
distributors) when zone-awareness is enabled and an entire zone is
not "ACTIVE".

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Lint

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

---------

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
pracucci added a commit to grafana/mimir that referenced this pull request Jan 27, 2026
…AVING state (#14157)

#### What this PR does

`Ring.Get()` is extremely inefficient when a large number of instances
are in a state for which the ring operation requests to extend the
replication set.

This is a known issue that grafana/dskit#672 attempted to
optimise, but unfortunately the CPU impact on queriers is still huge.
I'm finalizing an optimization for dskit's `Ring.Get()`, but even with
my upcoming optimization there's still a significant penalty when the
replication set is extended for a large number of instances.

In this PR I'm taking a different approach: I'm questioning whether we
really need to extend the replication set for store-gateways in
`LEAVING` state. My take is that we don't need it (but we still need it
for store-gateways in `JOINING` state). I'm the author of the original
code (cortexproject/cortex#4263) and the
original code was actually intended to cover the `JOINING` state (for
which I still agree that we need to extend the replication set).

In this PR I simply propose to not extend the replica set for `LEAVING`
instances. In the code comment you find the full reasoning (that I would
like to leave there as a future reference).

The benchmark for the `LEAVING` state is quite impactful:

```
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/querier
cpu: Apple M3 Pro
                                                                                     │    before.txt    │              after.txt              │
                                                                                     │      sec/op      │    sec/op     vs base               │
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11                                9.924µ ± 11%   8.853µ ±  4%  -10.80% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11                                23.91m ±  3%   24.72m ± 13%   +3.39% (p=0.026 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11                            24007.353µ ±  2%   8.855µ ±  1%  -99.96% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11       43.62µ ±  2%   43.14µ ±  2%        ~ (p=0.093 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11       24.50m ±  3%   24.68m ± 11%        ~ (p=0.699 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11    24033.15µ ±  2%   44.34µ ±  1%  -99.82% (p=0.002 n=6)
geomean                                                                                    2.296m         212.3µ        -90.75%

                                                                                     │   before.txt    │              after.txt              │
                                                                                     │      B/op       │     B/op      vs base               │
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11                              15.291Ki ± 0%   7.314Ki ± 0%  -52.17% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11                               1.583Mi ± 1%   1.231Mi ± 4%  -22.26% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11                            1618.160Ki ± 2%   7.315Ki ± 0%  -99.55% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11      49.48Ki ± 0%   51.28Ki ± 0%   +3.63% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11      1.674Mi ± 0%   1.228Mi ± 3%  -26.64% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11    1714.05Ki ± 0%   53.16Ki ± 0%  -96.90% (p=0.002 n=6)
geomean                                                                                   424.3Ki        78.34Ki       -81.53%

                                                                                     │ before.txt  │             after.txt              │
                                                                                     │  allocs/op  │  allocs/op   vs base               │
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11                             81.00 ± 0%   55.00 ±  0%  -32.10% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11                             503.0 ± 6%   470.0 ± 13%        ~ (p=0.117 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11                            499.00 ± 7%   55.00 ±  0%  -88.98% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11    141.0 ± 0%   115.0 ±  0%  -18.44% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11    533.0 ± 2%   485.0 ±  9%   -9.01% (p=0.006 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11    533.0 ± 1%   135.0 ±  0%  -74.67% (p=0.002 n=6)
geomean                                                                                 305.6        148.5        -51.42%
```

#### Which issue(s) this PR fixes or relates to

N/A

#### Checklist

- [x] Tests updated.
- [ ] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Improves querier performance when many store-gateways are not ACTIVE.
> 
> - Changes `storegateway.BlocksRead` to extend the replica set only for
`JOINING` instances (no longer for `LEAVING`), with clarified in-code
rationale
> - Optimizes `querier.GetClientsFor` by reusing ring buffers via
`ring.WithBuffers(ring.MakeBuffersForGet())` to reduce allocations/CPU
> - Adds safety test ensuring buffer reuse doesn’t corrupt instance
metadata and a comprehensive benchmark covering ACTIVE/JOINING/LEAVING
and dynamic replication scenarios
> - Updates CHANGELOG with the enhancement entry
<!-- /CURSOR_SUMMARY -->

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
pracucci added a commit that referenced this pull request Jan 28, 2026
…tate for which the extended replication set is requested (#884)

**What this PR does**:

`Ring.Get()` is very inefficient when a large number of instances
are in a state for which the extended replication set is requested. This
is not a new issue, and there was a previous optimization attempt in
#672 by @56quarters. Unfortunately
the issue is not fully fixed yet, and it becomes visible as the number
of instances grows.

In this PR I propose an optimization based on the fact that looking up
small maps is slower than searching for the corresponding items in
slices (maps become faster as the number of unique items grows, but for
a few items slices are simply faster).

The result of this optimization is not huge, but I think it's still a
step forward. It's still very slow when a large number of instances are
in a state for which the extended replication set is requested, compared
to the case where all instances are ACTIVE.

```
goos: darwin
goarch: arm64
pkg: github.com/grafana/dskit/ring
cpu: Apple M3 Pro
                                                         │ new-before.txt │            new-after.txt            │
                                                         │     sec/op     │    sec/op     vs base               │
Ring_Get/with_zone_awareness-11                             1008.1n ± 19%   553.1n ±  8%  -45.13% (p=0.002 n=6)
Ring_Get/one_excluded_zone-11                                624.6n ±  1%   393.0n ± 12%  -37.08% (p=0.002 n=6)
Ring_Get/without_zone_awareness-11                           347.1n ±  4%   334.5n ±  1%   -3.62% (p=0.002 n=6)
Ring_Get/without_zone_awareness,_not_enough_instances-11     268.5n ±  1%   256.2n ±  1%   -4.56% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11          3739.3µ ±  5%   903.0µ ±  2%  -75.85% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11          1.905µ ±  6%   1.169µ ±  2%  -38.62% (p=0.002 n=6)
geomean                                                      2.734µ         1.643µ        -39.91%

                                                         │ new-before.txt │             new-after.txt             │
                                                         │      B/op      │     B/op      vs base                 │
Ring_Get/with_zone_awareness-11                              0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/one_excluded_zone-11                                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness-11                           0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness,_not_enough_instances-11     0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11         109.6Ki ± 0%     124.3Ki ± 0%  +13.45% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11          0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
geomean                                                                 ²                  +2.13%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                         │ new-before.txt │            new-after.txt            │
                                                         │   allocs/op    │ allocs/op   vs base                 │
Ring_Get/with_zone_awareness-11                              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/one_excluded_zone-11                                0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness-11                           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness,_not_enough_instances-11     0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11           19.00 ± 0%     27.00 ± 0%  +42.11% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11          0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
geomean                                                                 ²                +6.03%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```

**Which issue(s) this PR fixes**:

N/A

**Checklist**
- [x] Tests updated

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Improves performance of `Ring.Get()` when many instances extend the
replica set, especially under zone-awareness.
> 
> - Replaces `distinctHosts` slice/map usage with a hybrid `stringSet`
(`util.go`) to reduce lookups/allocations; adds comprehensive tests
> - Switches per-zone tracking from maps to slices indexed by zone index
(`totalHostsPerZone`, `examinedHostsPerZone`, `foundHostsPerZone`);
updates `canStopLooking()` signature accordingly
> - Adds explicit `zoneIndex` resolution with validation and panics on
misuse if zone-awareness disabled
> - Adjusts main selection loop to use
`stringSet.len()/contains()/add()` and new stop condition
> - Expands and parameterizes `BenchmarkRing_Get_OneZoneLeaving` (larger
`instances`, more zones, and both with/without one zone leaving) with
deterministic assertions
> 
> Net effect: substantial reductions in `Ring.Get()` time in
pathological scenarios; no API changes.
<!-- /CURSOR_SUMMARY -->

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>