
ring: Fix pathological case when an entire zone leaves#672

Merged
56quarters merged 1 commit into main from 56quarters/ring-missing-zone
Mar 27, 2025
Conversation

@56quarters
Contributor

What this PR does:

This change improves performance in the case where an entire zone is not ACTIVE and the replication set is meant to be extended. Previously, when an entire zone was unavailable, the ring kept searching for instances by examining every single token, trying to find an ACTIVE instance in the required zone. This meant thousands of iterations to find a host that would never be found.

This change keeps track of the number of hosts that we have examined in each zone. The search returns early once we have either found the hosts we need in each zone OR examined all hosts in a zone and so know that we won't find one there.

Which issue(s) this PR fixes:

N/A

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@56quarters
Contributor Author

Comparison against a commit from early January which predates my changes in #632:

$ benchstat prev.txt current.txt 
goos: linux
goarch: amd64
pkg: github.com/grafana/dskit/ring
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                           │   prev.txt    │             current.txt             │
                           │    sec/op     │   sec/op     vs base                │
Ring_Get_OneZoneLeaving-16   8329.3µ ± 13%   114.6µ ± 5%  -98.62% (p=0.000 n=10)

                           │   prev.txt   │           current.txt            │
                           │     B/op     │     B/op      vs base            │
Ring_Get_OneZoneLeaving-16   10.27Ki ± 0%   10.27Ki ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                           │  prev.txt  │          current.txt           │
                           │ allocs/op  │ allocs/op   vs base            │
Ring_Get_OneZoneLeaving-16   12.00 ± 0%   12.00 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

@56quarters 56quarters requested review from a team and pstibrany March 25, 2025 16:57
Contributor

@pstibrany pstibrany left a comment


I haven't seen the ring code in a while, and originally I thought I'd found a bug in the PR, but after staring at it for some more time I realized I was wrong; the change seems legit and makes sense to me.

This change improves performance in the case where an entire zone is
not ACTIVE and the replication set is meant to be extended. Previously,
when an entire zone was unavailable, the ring kept searching for
instances by looking at every single token trying to find an instance
in the required zone that was ACTIVE. This meant thousands of iterations
to find a host that would never work.

This change keeps track of the number of hosts that we have examined
in each zone. It returns early once we have either found the hosts in
each zone we need _OR_ we have examined all hosts in the zone and so
know that we won't find one.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters force-pushed the 56quarters/ring-missing-zone branch from 67a03bf to 16780f3 Compare March 26, 2025 13:53
@56quarters 56quarters marked this pull request as ready for review March 26, 2025 15:41
Contributor

@julienduchesne julienduchesne left a comment


It makes sense to me, though I'm not super familiar with the ring code.

@julienduchesne julienduchesne requested a review from a team March 26, 2025 15:48
@56quarters 56quarters merged commit 60d867e into main Mar 27, 2025
5 checks passed
@56quarters 56quarters deleted the 56quarters/ring-missing-zone branch March 27, 2025 16:40
56quarters added a commit to grafana/mimir that referenced this pull request Mar 27, 2025
Specifically, this pulls in the following dskit PRs:

* grafana/dskit#672
* grafana/dskit#669
* grafana/dskit#668

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit to grafana/mimir that referenced this pull request Mar 27, 2025
Update to the latest dskit commit to pull in grafana/dskit#672
which improves performance of ring operations in clients (queriers,
distributors) when zone-awareness is enabled and an entire zone is
not "ACTIVE".

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit to grafana/mimir that referenced this pull request Mar 27, 2025
* chore: update to latest dskit for ring performance fix

Update to the latest dskit commit to pull in grafana/dskit#672
which improves performance of ring operations in clients (queriers,
distributors) when zone-awareness is enabled and an entire zone is
not "ACTIVE".

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Lint

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

---------

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
pracucci added a commit to grafana/mimir that referenced this pull request Jan 27, 2026
…AVING state (#14157)

#### What this PR does

`Ring.Get()` is extremely inefficient when a large number of instances
are in a state for which the ring operation requests to extend the
replication set.

This is a known issue that grafana/dskit#672 attempted to
optimise, but unfortunately the CPU impact on queriers is still huge.
I'm finalizing an optimization for dskit's `Ring.Get()`, but even with
my upcoming optimization there's still a significant penalty when the
replication set is extended for a large number of instances.

In this PR I'm taking a different approach: I'm questioning whether we
really need to extend the replication set for store-gateways in
`LEAVING` state. My take is that we don't need it (but we still need it
for store-gateways in `JOINING` state). I'm the author of the original
code (cortexproject/cortex#4263) and the
original code was actually intended to cover the `JOINING` state (for
which I still agree that we need to extend the replication set).

In this PR I simply propose to not extend the replica set for `LEAVING`
instances. In the code comment you find the full reasoning (that I would
like to leave there as a future reference).

The benchmark for the `LEAVING` state is quite impactful:

```
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/querier
cpu: Apple M3 Pro
                                                                                     │    before.txt    │              after.txt              │
                                                                                     │      sec/op      │    sec/op     vs base               │
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11                                9.924µ ± 11%   8.853µ ±  4%  -10.80% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11                                23.91m ±  3%   24.72m ± 13%   +3.39% (p=0.026 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11                            24007.353µ ±  2%   8.855µ ±  1%  -99.96% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11       43.62µ ±  2%   43.14µ ±  2%        ~ (p=0.093 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11       24.50m ±  3%   24.68m ± 11%        ~ (p=0.699 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11    24033.15µ ±  2%   44.34µ ±  1%  -99.82% (p=0.002 n=6)
geomean                                                                                    2.296m         212.3µ        -90.75%

                                                                                     │   before.txt    │              after.txt              │
                                                                                     │      B/op       │     B/op      vs base               │
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11                              15.291Ki ± 0%   7.314Ki ± 0%  -52.17% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11                               1.583Mi ± 1%   1.231Mi ± 4%  -22.26% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11                            1618.160Ki ± 2%   7.315Ki ± 0%  -99.55% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11      49.48Ki ± 0%   51.28Ki ± 0%   +3.63% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11      1.674Mi ± 0%   1.228Mi ± 3%  -26.64% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11    1714.05Ki ± 0%   53.16Ki ± 0%  -96.90% (p=0.002 n=6)
geomean                                                                                   424.3Ki        78.34Ki       -81.53%

                                                                                     │ before.txt  │             after.txt              │
                                                                                     │  allocs/op  │  allocs/op   vs base               │
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE-11                             81.00 ± 0%   55.00 ±  0%  -32.10% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING-11                             503.0 ± 6%   470.0 ± 13%        ~ (p=0.117 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING-11                            499.00 ± 7%   55.00 ±  0%  -88.98% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/all_zones_ACTIVE_with_dynamic_replication-11    141.0 ± 0%   115.0 ±  0%  -18.44% (p=0.002 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_JOINING_with_dynamic_replication-11    533.0 ± 2%   485.0 ±  9%   -9.01% (p=0.006 n=6)
BlocksStoreReplicationSet_GetClientsFor/one_zone_LEAVING_with_dynamic_replication-11    533.0 ± 1%   135.0 ±  0%  -74.67% (p=0.002 n=6)
geomean                                                                                 305.6        148.5        -51.42%
```

#### Which issue(s) this PR fixes or relates to

N/A

#### Checklist

- [x] Tests updated.
- [ ] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Improves querier performance when many store-gateways are not ACTIVE.
> 
> - Changes `storegateway.BlocksRead` to extend the replica set only for
`JOINING` instances (no longer for `LEAVING`), with clarified in-code
rationale
> - Optimizes `querier.GetClientsFor` by reusing ring buffers via
`ring.WithBuffers(ring.MakeBuffersForGet())` to reduce allocations/CPU
> - Adds safety test ensuring buffer reuse doesn’t corrupt instance
metadata and a comprehensive benchmark covering ACTIVE/JOINING/LEAVING
and dynamic replication scenarios
> - Updates CHANGELOG with the enhancement entry
<!-- /CURSOR_SUMMARY -->

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
pracucci added a commit that referenced this pull request Jan 28, 2026
…tate for which the extended replication set is requested (#884)

**What this PR does**:

`Ring.Get()` is very inefficient when a large number of instances
are in a state for which the extended replication set is requested. This
is not a new issue, and there was a previous optimization attempt in
#672 by @56quarters. Unfortunately
the issue is not fully fixed yet, and it becomes visible as the number
of instances grows.

In this PR I propose an optimization based on the fact that looking up
small maps is slower than searching for the corresponding items in
slices (maps become faster as the number of unique items grows, but for
a few items slices are simply faster).

The result of this optimization is not huge, but I think it's still a
step forward. It's still very slow when a large number of instances are
in a state for which the extended replication set is requested, compared
to the case where all instances are ACTIVE.

```
goos: darwin
goarch: arm64
pkg: github.com/grafana/dskit/ring
cpu: Apple M3 Pro
                                                         │ new-before.txt │            new-after.txt            │
                                                         │     sec/op     │    sec/op     vs base               │
Ring_Get/with_zone_awareness-11                             1008.1n ± 19%   553.1n ±  8%  -45.13% (p=0.002 n=6)
Ring_Get/one_excluded_zone-11                                624.6n ±  1%   393.0n ± 12%  -37.08% (p=0.002 n=6)
Ring_Get/without_zone_awareness-11                           347.1n ±  4%   334.5n ±  1%   -3.62% (p=0.002 n=6)
Ring_Get/without_zone_awareness,_not_enough_instances-11     268.5n ±  1%   256.2n ±  1%   -4.56% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11          3739.3µ ±  5%   903.0µ ±  2%  -75.85% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11          1.905µ ±  6%   1.169µ ±  2%  -38.62% (p=0.002 n=6)
geomean                                                      2.734µ         1.643µ        -39.91%

                                                         │ new-before.txt │             new-after.txt             │
                                                         │      B/op      │     B/op      vs base                 │
Ring_Get/with_zone_awareness-11                              0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/one_excluded_zone-11                                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness-11                           0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness,_not_enough_instances-11     0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11         109.6Ki ± 0%     124.3Ki ± 0%  +13.45% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11          0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=6) ¹
geomean                                                                 ²                  +2.13%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                         │ new-before.txt │            new-after.txt            │
                                                         │   allocs/op    │ allocs/op   vs base                 │
Ring_Get/with_zone_awareness-11                              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/one_excluded_zone-11                                0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness-11                           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get/without_zone_awareness,_not_enough_instances-11     0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
Ring_Get_OneZoneLeaving/one_zone_leaving_=_true-11           19.00 ± 0%     27.00 ± 0%  +42.11% (p=0.002 n=6)
Ring_Get_OneZoneLeaving/one_zone_leaving_=_false-11          0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=6) ¹
geomean                                                                 ²                +6.03%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```

**Which issue(s) this PR fixes**:

N/A

**Checklist**
- [x] Tests updated

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Improves performance of `Ring.Get()` when many instances extend the
replica set, especially under zone-awareness.
> 
> - Replaces `distinctHosts` slice/map usage with a hybrid `stringSet`
(`util.go`) to reduce lookups/allocations; adds comprehensive tests
> - Switches per-zone tracking from maps to slices indexed by zone index
(`totalHostsPerZone`, `examinedHostsPerZone`, `foundHostsPerZone`);
updates `canStopLooking()` signature accordingly
> - Adds explicit `zoneIndex` resolution with validation and panics on
misuse if zone-awareness disabled
> - Adjusts main selection loop to use
`stringSet.len()/contains()/add()` and new stop condition
> - Expands and parameterizes `BenchmarkRing_Get_OneZoneLeaving` (larger
`instances`, more zones, and both with/without one zone leaving) with
deterministic assertions
> 
> Net effect: substantial reductions in `Ring.Get()` time in
pathological scenarios; no API changes.
<!-- /CURSOR_SUMMARY -->

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>