Skip to content

cache: Added consistent reads for cache#21428

Open
akstron wants to merge 4 commits intoetcd-io:mainfrom
akstron:dev/cache-consistent-read
Open

cache: Added consistent reads for cache#21428
akstron wants to merge 4 commits intoetcd-io:mainfrom
akstron:dev/cache-consistent-read

Conversation

@akstron
Copy link

@akstron akstron commented Mar 4, 2026

Cache supports consistent reads when IsSerializable is false. Based on: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2340-Consistent-reads-from-cache

Cache.Get Request Flow

cache.Get(ctx, key, opts...)
│
├── store.LatestRev == 0?
│   └── yes ──► WaitReady(ctx) ── error ──► return error
│
├── validateGet(key, op) ── error ──► return error
│
├── op.IsSerializable()?
│   │
│   ├── YES (serializable / fast local path)
│   │   │
│   │   └───────────────────────────────────────────────────┐
│   │                                                       │
│   └── NO  (non-serializable / consistent read path)       │
│       │                                                   │
│       ▼                                                   │
│   waitTillRevision(ctx, requestedRev)                     │
│       │                                                   │
│       ▼                                                   │
│   serverRevision(ctx)                                     │
│   linearizable kv.Get(prefix, Limit(1), KeysOnly)         │
│       ──► etcd leader                                     │
│       │                                                   │
│       ├── error ──► return error                          │
│       │                                                   │
│       ▼                                                   │
│   rev == 0?                                               │
│       └── yes ──► rev = serverRev                         │
│       │                                                   │
│       ▼                                                   │
│   serverRev < rev?                                        │
│       └── yes ──► return ErrFutureRev                     │
│       │                                                   │
│       ▼                                                   │
│   localRevision() >= rev?                                 │
│       └── yes ──────────────────────────────────────┐     │
│       │                                             │     │
│       ▼                                             │     │
│   progressRequestor.add()                           │     │
│   (wakes background RequestProgress loop)           │     │
│       │                                             │     │
│       ▼                                             │     │
│   ┌─────────────────────────────┐                   │     │
│   │  Poll loop                  │                   │     │
│   │  ticker: 50ms (PollInterval)│                   │     │
│   │  timeout: 3s  (WaitTimeout) │                   │     │
│   │                             │                   │     │
│   │  localRevision() >= rev? ───┼── yes ────────────┤     │
│   │       │ no                  │                   │     │
│   │       ▼                     │                   │     │
│   │  select:                    │                   │     │
│   │  ├─ ticker  ──► loop back   │                   │     │
│   │  ├─ timeout ──► return ErrCacheTimeout          │     │
│   │  └─ ctx.Done ──► return ctx.Err()               │     │
│   └─────────────────────────────┘                   │     │
│       │                                             │     │
│       ▼                                             │     │
│   progressRequestor.remove()  (deferred)            │     │
│   (background loop stops sending RequestProgress    │     │
│    once waiting drops to 0)                         │     │
│                                                     │     │
│   ◄─────────────────────────────────────────────────┘     │
│   ◄───────────────────────────────────────────────────────┘
│
▼
store.Get(startKey, endKey, requestedRev)
│
├── error ──► return error
│
▼
return GetResponse { Header.Revision, Kvs, Count }

Background: conditionalProgressRequestor.run(ctx)

┌──────────────────────────────────────────────────────┐
│  Blocks on cond.Wait() while waiting == 0            │
│                                                      │
│  When waiting > 0:                                   │
│    timer: 100ms (RequestInterval)                    │
│    sends watcher.RequestProgress(ctx)                │
│    ──► etcd responds with progress notification      │
│    ──► watch stream delivers revision update         │
│    ──► store.LatestRev advances                      │
│                                                      │
│  When waiting drops to 0:                            │
│    timer resets to 0, loop re-checks, blocks again   │
│                                                      │
│  ctx.Done ──► return                                 │
└──────────────────────────────────────────────────────┘

The progress loop only sends RequestProgress when at least one
waitTillRevision call is actively waiting, avoiding unnecessary
server round-trips for serializable-only workloads.

Part of #19371

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: akstron
Once this PR has been reviewed and has the lgtm label, please assign siyuanfoundation for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link

Hi @akstron. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@akstron akstron changed the title [WIP] cache: Added consistent read for cache [WIP] cache: Added consistent reads for cache Mar 4, 2026
@akstron akstron changed the title [WIP] cache: Added consistent reads for cache cache: Added consistent reads for cache Mar 4, 2026
@akstron akstron marked this pull request as ready for review March 4, 2026 17:46
@serathius
Copy link
Member

Please sign the DCO https://github.com/etcd-io/etcd/pull/21428/checks?check_run_id=65752362584

func revLessThan(n int64) func(int64) bool { return func(r int64) bool { return r < n } }
func revGreaterEqual(n int64) func(int64) bool { return func(r int64) bool { return r >= n } }

func TestCacheConsistentRead(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integrate with TestGet

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the getTestCases also check for revision which is causing issue while trying to integrate. I want to send put requests outside the prefix to update the revision before every Get request but that also updates the server revision which is being checked in each test case.

example:

=== RUN   TestCacheConsistentRead/fromKey_/foo/b
    cache_test.go:638: revision: got 93, want 8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good it's checking revision. That's the way

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I update getTestCases according to the new behaviour?

Copy link
Member

@serathius serathius Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. thanks. Also updated testWithPrefixGet test cases.

@serathius
Copy link
Member

serathius commented Mar 5, 2026

From https://github.com/etcd-io/etcd/pull/21428/checks?check_run_id=65752362584

want: Author: Alok Kumar Singh, Committer: Alok Kumar Singh; Expected "Alok Kumar Singh alokkumar.singh@alokkum-ltmybqp.internal.salesforce.com", but
got "Alok Kumar Singh dev.alok.singh123@gmail.com".

@akstron akstron force-pushed the dev/cache-consistent-read branch from f532d80 to e54ddc6 Compare March 5, 2026 13:46
@serathius
Copy link
Member

/ok-to-test

@codecov
Copy link

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.33%. Comparing base (a68edb3) to head (d3a0c47).

Additional details and impacted files

see 28 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #21428      +/-   ##
==========================================
- Coverage   68.47%   68.33%   -0.14%     
==========================================
  Files         428      428              
  Lines       35291    35291              
==========================================
- Hits        24164    24116      -48     
- Misses       9733     9769      +36     
- Partials     1394     1406      +12     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a68edb3...d3a0c47. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@serathius
Copy link
Member

This is shaping good, just couple of nits, some testing and we should be good to merge.

@serathius
Copy link
Member

Please squash commits

@serathius
Copy link
Member

serathius commented Mar 14, 2026

Run robustness tests with this PR in #21485, got linearization error. Looks like cache is still returning stale result in some case.

image

Looks like issue is related to revision returned when Get has non zero revision provided.

Signed-off-by: Alok Kumar Singh <dev.alok.singh123@gmail.com>
@akstron akstron force-pushed the dev/cache-consistent-read branch from ab469cf to 1f912eb Compare March 14, 2026 19:50
akstron and others added 3 commits March 15, 2026 01:41
Signed-off-by: Alok Kumar Singh <dev.alok.singh123@gmail.com>
Signed-off-by: Alok Kumar Singh <dev.alok.singh123@gmail.com>
@k8s-ci-robot
Copy link

@akstron: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-robustness-arm64 d3a0c47 link true /test pull-etcd-robustness-arm64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@serathius
Copy link
Member

Please squash commits and keep the squashed into 1 commit

// waitTillRevision blocks until the local cache revision reaches rev,
// using the server's latest revision as the linearizable target.
func (c *Cache) waitTillRevision(ctx context.Context, rev int64) error {
if rev != 0 && c.store.LatestRev() >= rev {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my suggestion was wrong, we always need to read server Revision.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would that be? Did you got that reason from the failing robustness test? I didn't got a chance to check it, let me check it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Etcd server always linearizes read and thus returns linearized revision. Cache needs to mirror that behavior to be compatible.

return nil
}

rev, err := c.linearizableRevision(ctx, rev)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the function out of waitTillRevision to Get, it's unrelated to waiting till revision.

return rev, nil
}

// waitTillRevision blocks until the local cache revision reaches rev,
Copy link
Member

@serathius serathius Mar 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing comments doest help by itself, usually it's a sign of code not being clean itself.

General rule: Comment should explain WHY something is being do e, not WHATis being done.

Don't write comments that just explain the function. Rather think how to improve the code to make it obvious

Comment on lines +180 to +182
if err := c.waitTillRevision(ctx, requestedRev); err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err := c.waitTillRevision(ctx, requestedRev); err != nil {
return nil, err
}
serverRev, err := c.serverRevision(ctx)
if err != nil {
return nil, err
}
if requestedRev > serverRev {
return nil, rpctypes.ErrFutureRev
}
if err = c.waitTillRevision(ctx, serverRev); err != nil {
return nil, err
}

Comment on lines +251 to +259
func (c *Cache) waitTillRevision(ctx context.Context, rev int64) error {
if rev != 0 && c.store.LatestRev() >= rev {
return nil
}

rev, err := c.linearizableRevision(ctx, rev)
if err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (c *Cache) waitTillRevision(ctx context.Context, rev int64) error {
if rev != 0 && c.store.LatestRev() >= rev {
return nil
}
rev, err := c.linearizableRevision(ctx, rev)
if err != nil {
return err
}
func (c *Cache) waitTillRevision(ctx context.Context, rev int64) error {

@serathius
Copy link
Member

Confirmed that with recommended changes the cache is linearizable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants