-
Notifications
You must be signed in to change notification settings - Fork 146
fix(storage): share aloneInBucket flag per bucket via atomic.Bool #1254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…Factory The previous subquery-based approach in newScopedSelect executed a COUNT(*) on _system.ledgers for every scoped query. Replace it with a cached *atomic.Bool per bucket, shared across all stores through the DefaultFactory. When a new ledger is created, SetAloneInBucket(false) immediately propagates to every store in that bucket, ensuring the WHERE ledger=? predicate is never incorrectly skipped. Also adds CountLedgersInBucket to the system store interface so CreateLedger and OpenLedger can seed the flag correctly.
WalkthroughThis change introduces a per-bucket optimization flag to track whether a ledger is alone in its bucket. The implementation counts ledgers in each bucket during creation and opening, propagates the flag through driver and factory layers, and leverages it to simplify query optimization logic. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Driver as Driver Layer
participant Factory as Ledger Factory
participant SystemStore as System Store
participant Database as Database
rect rgba(100, 150, 200, 0.5)
note over Client,Database: Ledger Creation/Opening Flow
Client->>Driver: CreateLedger() or OpenLedger()
Driver->>SystemStore: CountLedgersInBucket(bucket)
SystemStore->>Database: Query Ledger count by bucket
Database-->>SystemStore: Count result
SystemStore-->>Driver: Return count
Driver->>Factory: Create store & set flag
Factory->>Factory: Acquire mutex
Factory->>Factory: Get/create per-bucket atomic flag
Factory->>Factory: Release mutex
Factory->>Factory: Assign atomic flag to store.aloneInBucket
Factory-->>Driver: Store with flag set
Driver-->>Client: Return ledger store
rect rgba(150, 200, 150, 0.5)
note over Factory: Query Optimization Later
Client->>Factory: Query operations on store
Factory->>Factory: Check store.aloneInBucket
alt Alone in bucket (true)
Factory->>Database: Query without ledger filter
else Not alone (false/nil)
Factory->>Database: Query with ledger = ? filter
end
Database-->>Factory: Results
Factory-->>Client: Return data
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/storage/driver/driver.go`:
- Around line 93-112: OpenLedger currently reads the ledger count then calls
store.SetAloneInBucket(count == 1), which can race with concurrent CreateLedger
and mistakenly set the shared alone flag back to true; change OpenLedger to
never set the flag to true — only clear it when you observe count > 1 (i.e.,
call SetAloneInBucket(false) when count > 1) or rely on CreateLedger/factory to
initialize true; alternatively make SetAloneInBucket one-directional (only allow
true→false, never false→true) so OpenLedger cannot revert a concurrent false to
true; update references in OpenLedger and ensure CreateLedger remains the only
path that can set true.
| func (d *Driver) OpenLedger(ctx context.Context, name string) (*ledgerstore.Store, *ledger.Ledger, error) { | ||
| // todo: keep the ledger in cache somewhere to avoid read the ledger at each request, maybe in the factory | ||
| ret, err := d.systemStoreFactory.Create(d.db).GetLedger(ctx, name) | ||
| // NOTE: the aloneInBucket flag is now shared per bucket via the Factory, | ||
| // so all stores in the same bucket see updates immediately. | ||
| systemStore := d.systemStoreFactory.Create(d.db) | ||
|
|
||
| ret, err := systemStore.GetLedger(ctx, name) | ||
| if err != nil { | ||
| return nil, nil, err | ||
| } | ||
|
|
||
| count, err := systemStore.CountLedgersInBucket(ctx, ret.Bucket) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("counting ledgers in bucket: %w", err) | ||
| } | ||
|
|
||
| store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(ret.Bucket), *ret) | ||
| store.SetAloneInBucket(count == 1) | ||
|
|
||
| return store, ret, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TOCTOU race: OpenLedger can overwrite a concurrent CreateLedger's flag update.
OpenLedger reads the count outside a transaction and then calls SetAloneInBucket(count == 1). If a concurrent CreateLedger commits between the count read and the Store call, the sequence can be:
OpenLedgerreadscount=1(2nd ledger not yet committed)CreateLedgercommits → sets shared flag tofalseOpenLedgercallsSetAloneInBucket(true)→ overwrites back totrue
All stores in that bucket now incorrectly skip WHERE ledger = ?, potentially returning data from the wrong ledger.
Consider making the flag transition one-directional: SetAloneInBucket should only allow true → false, never false → true, or use CompareAndSwap. The false → true transition should only happen in a controlled path (e.g., ledger deletion, if applicable), not in OpenLedger.
Proposed fix in store.go
func (store *Store) SetAloneInBucket(alone bool) {
if store.aloneInBucket != nil {
- store.aloneInBucket.Store(alone)
+ if alone {
+ // Only upgrade from false→true is unsafe from OpenLedger;
+ // use CAS to avoid overwriting a concurrent false.
+ store.aloneInBucket.CompareAndSwap(false, true)
+ } else {
+ store.aloneInBucket.Store(false)
+ }
}
}Actually, even the CAS above doesn't fully resolve this—it still races. A safer approach: OpenLedger should never set the flag to true; it should only set it to false when count > 1. Leave the true initialization to CreateLedger (which runs transactionally) or to the factory's lazy init.
Alternative: only propagate false from OpenLedger
store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(ret.Bucket), *ret)
- store.SetAloneInBucket(count == 1)
+ if count > 1 {
+ store.SetAloneInBucket(false)
+ }
return store, ret, err📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (d *Driver) OpenLedger(ctx context.Context, name string) (*ledgerstore.Store, *ledger.Ledger, error) { | |
| // todo: keep the ledger in cache somewhere to avoid read the ledger at each request, maybe in the factory | |
| ret, err := d.systemStoreFactory.Create(d.db).GetLedger(ctx, name) | |
| // NOTE: the aloneInBucket flag is now shared per bucket via the Factory, | |
| // so all stores in the same bucket see updates immediately. | |
| systemStore := d.systemStoreFactory.Create(d.db) | |
| ret, err := systemStore.GetLedger(ctx, name) | |
| if err != nil { | |
| return nil, nil, err | |
| } | |
| count, err := systemStore.CountLedgersInBucket(ctx, ret.Bucket) | |
| if err != nil { | |
| return nil, nil, fmt.Errorf("counting ledgers in bucket: %w", err) | |
| } | |
| store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(ret.Bucket), *ret) | |
| store.SetAloneInBucket(count == 1) | |
| return store, ret, err | |
| func (d *Driver) OpenLedger(ctx context.Context, name string) (*ledgerstore.Store, *ledger.Ledger, error) { | |
| // todo: keep the ledger in cache somewhere to avoid read the ledger at each request, maybe in the factory | |
| // NOTE: the aloneInBucket flag is now shared per bucket via the Factory, | |
| // so all stores in the same bucket see updates immediately. | |
| systemStore := d.systemStoreFactory.Create(d.db) | |
| ret, err := systemStore.GetLedger(ctx, name) | |
| if err != nil { | |
| return nil, nil, err | |
| } | |
| count, err := systemStore.CountLedgersInBucket(ctx, ret.Bucket) | |
| if err != nil { | |
| return nil, nil, fmt.Errorf("counting ledgers in bucket: %w", err) | |
| } | |
| store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(ret.Bucket), *ret) | |
| if count > 1 { | |
| store.SetAloneInBucket(false) | |
| } | |
| return store, ret, err | |
| } |
🤖 Prompt for AI Agents
In `@internal/storage/driver/driver.go` around lines 93 - 112, OpenLedger
currently reads the ledger count then calls store.SetAloneInBucket(count == 1),
which can race with concurrent CreateLedger and mistakenly set the shared alone
flag back to true; change OpenLedger to never set the flag to true — only clear
it when you observe count > 1 (i.e., call SetAloneInBucket(false) when count >
1) or rely on CreateLedger/factory to initialize true; alternatively make
SetAloneInBucket one-directional (only allow true→false, never false→true) so
OpenLedger cannot revert a concurrent false to true; update references in
OpenLedger and ensure CreateLedger remains the only path that can set true.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## release/v2.3 #1254 +/- ##
================================================
- Coverage 81.78% 81.69% -0.10%
================================================
Files 187 187
Lines 9059 9078 +19
================================================
+ Hits 7409 7416 +7
- Misses 1215 1222 +7
- Partials 435 440 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary
newScopedSelectwith a shared*atomic.Boolper bucket, managed by theDefaultFactorySetAloneInBucket(false)now propagates immediately to all existing stores in that bucketCountLedgersInBucketto the system store interface soCreateLedgerandOpenLedgercan seed the flag correctlyRoot cause
The previous subquery-based approach (
SELECT count = 1 FROM _system.ledgers ...) innewScopedSelectadded overhead to every query. More critically, when stores were cached, a per-store boolean could miss bucket membership changes: existing stores kept believing they were alone and skipped theWHERE ledger = ?predicate, causing cross-ledger data leaks (e.g.TestAccountsListfailures).Changes
internal/storage/ledger/store.go:aloneInBucket *atomic.Boolfield +SetAloneInBucket/newScopedSelectusing atomic loadinternal/storage/ledger/factory.go:bucketFlags map[string]*atomic.Boolshared across all stores of the same bucketinternal/storage/driver/driver.go:CreateLedgerandOpenLedgercallCountLedgersInBucketandSetAloneInBucketinternal/storage/system/store.go: newCountLedgersInBucketmethod on interface + implementationinternal/storage/driver/store.go:CountLedgersInBucketadded to localSystemStoreinterfaceinternal/storage/driver/store_generated_test.go: mock updated for new methodTest plan
earthly -P +testspasses (all unit, integration, e2e, stress tests)TestAccountsListno longer flaky with multiple ledgers in the same bucket