Skip to content

fix: add mutex to Cloudability emitter for concurrent Emit() safety#154

Open
peatey wants to merge 3 commits intodevelopfrom
peatey/cldy-emitter-mutex
Open

fix: add mutex to Cloudability emitter for concurrent Emit() safety#154
peatey wants to merge 3 commits intodevelopfrom
peatey/cldy-emitter-mutex

Conversation

@peatey
Copy link
Copy Markdown
Contributor

@peatey peatey commented Feb 24, 2026

Adds sync.Mutex to the Cloudability Emitter struct to protect mutable state (sampleCt, currentSamplePath, nextSamplePath, lastEmission) from concurrent access when Emit() is called from overlapping goroutines in the export loop.

Also protects ClearOldScratchSamples() which calls Uploader.RemoveSample() and races with AddSample() in Emit().

Follow-up to #151. Minimal change: one import, one struct field, three lines in two methods.

Init() and ID() are not locked — Init() runs sequentially before the loop, ID() returns a constant.

Includes concurrency test that validates under -race.

Adds sync.Mutex to the Cloudability Emitter struct to protect mutable state
(sampleCt, currentSamplePath, nextSamplePath, lastEmission) from concurrent
access when Emit() is called from overlapping goroutines in the export loop.

Also protects ClearOldScratchSamples() which calls Uploader.RemoveSample() and
races with AddSample() in Emit().

Follow-up to #151. Minimal change: one import, one struct field, three lines in two methods.

Init() and ID() are not locked — Init() runs sequentially before the loop,
ID() returns a constant.

Includes concurrency test that validates under -race.
Split ClearOldScratchSamples into public (locked) and private (unlocked)
methods to prevent deadlock when called from within Emit()'s lock scope.

The call chain Emit() -> writeStatsData() -> writeStatsFile() ->
ClearOldScratchSamples() would previously deadlock because sync.Mutex
is not reentrant in Go.

Now writeStatsFile() calls clearOldScratchSamplesLocked() which assumes
the caller already holds the lock, while external callers use the public
ClearOldScratchSamples() which acquires the lock.
Copy link
Copy Markdown
Collaborator

@daniel-spray daniel-spray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! Let's run it on uw2p-akp-j1 for 24-48hrs just to ensure we don't find anything unexpected :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants