-
Notifications
You must be signed in to change notification settings - Fork 146
fix(storage): scope ledger queries with bucket hint #1252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Replace the dynamic _system.ledgers subquery in newScopedSelect() with a pre-computed isAloneInBucket boolean flag. The flag is set at store creation/opening time via CountLedgersInBucket, avoiding degraded PostgreSQL query plans when a ledger is alone in its bucket. Backport of #1250.
WalkthroughThe changes introduce a bucket-level optimization that detects when a ledger is the sole member of its bucket. When creating or opening a ledger, the system counts ledgers in the bucket and sets a flag. The ledger store subsequently uses this flag to simplify database query construction by skipping unnecessary ledger filters. Changes
Sequence Diagram(s)sequenceDiagram
participant Driver as Driver Layer
participant SystemStore as System Store
participant LedgerStore as Ledger Store
participant DB as Database
Driver->>SystemStore: CountLedgersInBucket(ctx, bucket)
SystemStore->>DB: SELECT COUNT(*) FROM _system.ledgers WHERE bucket = ?
DB-->>SystemStore: count
SystemStore-->>Driver: count
Driver->>LedgerStore: SetAloneInBucket(count == 1)
LedgerStore->>LedgerStore: isAloneInBucket = true/false
Note over LedgerStore: Future queries in newScopedSelect<br/>skip ledger filter if isAloneInBucket
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/storage/driver/driver.go`:
- Around line 93-113: OpenLedger currently calls systemStore.GetLedger and
systemStore.CountLedgersInBucket separately which can race with concurrent
CreateLedger and leave an already-open Store with a stale isAloneInBucket flag;
fix by performing the ledger lookup and bucket count in a single read-only
transaction/snapshot (use whatever transaction/snapshot API your DB/systemStore
exposes) so the CountLedgersInBucket result is consistent with GetLedger, then
call store.SetAloneInBucket based on that transactional count; alternatively, if
a transactional snapshot isn't available, document that callers must not reuse
the returned *ledgerstore.Store across requests and add a per-request refresh
path that re-evaluates CountLedgersInBucket before critical reads (reference:
OpenLedger, systemStore.GetLedger, systemStore.CountLedgersInBucket,
store.SetAloneInBucket).
🧹 Nitpick comments (2)
internal/storage/driver/driver.go (1)
113-113: Nit: return explicitnilerror for clarity.At this point
erris guaranteed to benil(the non-nil path returns on Line 109), but returning the shadowederrvariable is slightly confusing. Returningnilexplicitly makes the success path self-evident.Suggested change
- return store, ret, err + return store, ret, nilinternal/storage/ledger/store.go (1)
27-32: Good documentation on the field — considersync/atomicif refresh is ever added.The comment clearly documents the invariant and the invalidation risk. Today the flag is set-once before the store is returned, so there's no data race. However, if a future change refreshes the flag while the store is serving concurrent queries (as the comment itself suggests may be needed), the unsynchronized
boolbecomes a data race. Anatomic.Boolwould make the field safe for concurrent read/write at negligible cost.
| func (d *Driver) OpenLedger(ctx context.Context, name string) (*ledgerstore.Store, *ledger.Ledger, error) { | ||
| // todo: keep the ledger in cache somewhere to avoid read the ledger at each request, maybe in the factory | ||
| ret, err := d.systemStoreFactory.Create(d.db).GetLedger(ctx, name) | ||
| // NOTE: if the store is ever cached, the isAloneInBucket flag must be | ||
| // refreshed/invalidated when bucket membership changes, otherwise | ||
| // cross-ledger reads may occur (missing WHERE ledger = ?). | ||
| systemStore := d.systemStoreFactory.Create(d.db) | ||
|
|
||
| ret, err := systemStore.GetLedger(ctx, name) | ||
| if err != nil { | ||
| return nil, nil, err | ||
| } | ||
|
|
||
| store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(ret.Bucket), *ret) | ||
|
|
||
| count, err := systemStore.CountLedgersInBucket(ctx, ret.Bucket) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("counting ledgers in bucket: %w", err) | ||
| } | ||
| store.SetAloneInBucket(count == 1) | ||
|
|
||
| return store, ret, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Staleness risk: isAloneInBucket is never refreshed for an already-opened store.
The comment on Lines 95-97 correctly warns about cache invalidation, but the risk goes beyond caching. If the store returned by OpenLedger is held open (even transiently) while a concurrent CreateLedger adds a second ledger to the same bucket, the first store will silently skip the WHERE ledger = ? filter, leaking rows from the new ledger.
For the CreateLedger path this is safe (transactional), but OpenLedger is outside any transaction and the flag is set-once. Consider:
- Wrapping
GetLedger+CountLedgersInBucketin a read-only transaction (or snapshot) so the count is at least consistent with the ledger lookup. - Documenting that callers must not reuse the returned
*Storeacross requests when bucket membership may change, or adding a per-request refresh.
This is flagged for awareness — the window is narrow but the impact (cross-ledger data leakage) is high if it occurs.
🤖 Prompt for AI Agents
In `@internal/storage/driver/driver.go` around lines 93 - 113, OpenLedger
currently calls systemStore.GetLedger and systemStore.CountLedgersInBucket
separately which can race with concurrent CreateLedger and leave an already-open
Store with a stale isAloneInBucket flag; fix by performing the ledger lookup and
bucket count in a single read-only transaction/snapshot (use whatever
transaction/snapshot API your DB/systemStore exposes) so the
CountLedgersInBucket result is consistent with GetLedger, then call
store.SetAloneInBucket based on that transactional count; alternatively, if a
transactional snapshot isn't available, document that callers must not reuse the
returned *ledgerstore.Store across requests and add a per-request refresh path
that re-evaluates CountLedgersInBucket before critical reads (reference:
OpenLedger, systemStore.GetLedger, systemStore.CountLedgersInBucket,
store.SetAloneInBucket).
Summary
Backport of #1250 onto
release/v2.3.CountLedgersInBucketto the system store interface and implementation_system.ledgerssubquery innewScopedSelect()with a pre-computedisAloneInBucketboolean flagCreateLedger) and opening (OpenLedger) timeThis avoids degraded PostgreSQL query plans when a ledger is alone in its bucket, and guards against potential cross-ledger reads if the store is cached without proper invalidation.
Files changed
internal/storage/system/store.go— newCountLedgersInBucketmethodinternal/storage/driver/store.go—SystemStoreinterface updateinternal/storage/driver/driver.go— flag initialization inCreateLedgerandOpenLedgerinternal/storage/ledger/store.go—isAloneInBucketfield, simplifiednewScopedSelect(),SetAloneInBucketmethodinternal/storage/driver/store_generated_test.go— regenerated mockTest plan
go build ./...passesgo test -tags it ./internal/storage/ledger ./internal/storage/driver(requires Postgres)