Low-latency pitch detection (phases 1-4)#29
Conversation
Remove decimation (was 2x at 44.1kHz, 4x at 96kHz+), reducing effective window from 4096 to 1024 samples. Analysis now runs at native sample rate with hop-based triggering every 128 samples. First pitch estimate: ~93ms -> ~23ms Update interval: ~93ms -> ~3ms
user1303836
left a comment
There was a problem hiding this comment.
PR #29 Review: Phase 1 - Ring buffer + sliding window + remove decimation
The core architectural change is sound -- replacing the fill-reset buffer with a ring buffer + hop-based analysis is the right approach, and the implementation is clean and minimal. The decimation removal and window reduction achieve the latency goals outlined in the plan. Good test coverage for the new latency properties.
Must fix (2 issues)
1. Window size doesn't scale with sample rate -- low-frequency detection regression.
With windowSize hardcoded to 1024 and decimation removed, halfWindow = 512 sets the maximum detectable period. At 44.1kHz, the lowest detectable frequency is ~86Hz, which already excludes E2 (82.4Hz, period ~535 samples). At 96kHz, E2's period is ~1164 samples -- far beyond halfWindow. The old decimated approach handled this implicitly. The window (and hop) size should scale with sample rate to maintain consistent frequency coverage. Alternatively, bump the base window to 1200 as the plan suggests for the 44.1k case, and scale from there.
2. sampleCount integer overflow on long sessions.
At 96kHz, a 32-bit int overflows in ~6.2 hours. After overflow, the detector stops producing results. Replace with a bool windowFilled flag that latches on first fill.
Suggestions (non-blocking, 2 items)
- Hop interval test is fragile: relies on float jitter between consecutive analyses of the same sine to detect updates. Consider feeding a changing signal instead.
- Relock test name/bound mismatch: the test name implies fast relock but the bound allows a full window refill. Consider renaming or tightening.
Phase 2 note
The linearBuffer copy in analyse() will become redundant once the FFT buffer is introduced. Not an issue now but worth keeping in mind to avoid double-copying.
Summary
The two must-fix items are the sample-rate-dependent window sizing (correctness regression for low frequencies and high sample rates) and the sampleCount overflow (silent failure after hours of use). Both are straightforward to address. The rest of the implementation is solid and well-aligned with the latency analysis plan.
source/dsp/YinPitchDetector.h
Outdated
| int writePos = 0; | ||
| bool bufferFull = false; | ||
| int hopCounter = 0; | ||
| int sampleCount = 0; |
There was a problem hiding this comment.
sampleCount will overflow on long sessions.
sampleCount is an int that increments every sample and is never reset (except via prepare()). At 96kHz, a 32-bit int overflows in ~6.2 hours. After overflow, the sampleCount >= windowSize guard in feedSample() will fail, and the detector silently stops producing results until the next prepare() call.
This is a real-world scenario -- DAW sessions routinely run for hours.
Two options:
- Add a
bool windowFilled = false;flag that gets set oncesampleCount >= windowSize, then use that in the guard. This is the simplest fix and eliminates the counter entirely after the initial fill. - Cap
sampleCountatwindowSizeso it stops incrementing:if (sampleCount < windowSize) ++sampleCount;
Option 1 is cleaner since sampleCount would only exist to track the initial fill.
|
|
||
| // Step 1: Difference function | ||
| for (int i = 0; i < windowSize; ++i) | ||
| linearBuffer[static_cast<size_t>(i)] = buffer[static_cast<size_t>((writePos + i) % windowSize)]; |
There was a problem hiding this comment.
Ring buffer linearization copies the full window every hop -- consider whether this matters for Phase 2.
Right now this copies 1024 floats every 128 samples, which is fine. But worth noting: when Phase 2 adds FFT-accelerated autocorrelation, you'll need to copy data into the FFT buffer anyway. At that point this linearBuffer copy becomes redundant -- the FFT input buffer serves the same purpose.
Not a blocker for this phase, just flagging so Phase 2 doesn't end up doing two copies.
| decimationAccum = 0.0f; | ||
| analysisSR = sampleRate / decimation; | ||
| analysisSR = sampleRate; | ||
|
|
There was a problem hiding this comment.
Hardcoded windowSize = 1024 ignores sample rate -- E2 detection will fail at 96kHz.
At 44.1kHz, 1024 samples = ~23ms, which covers ~1.9 periods of E2 (82.4Hz). That's tight but workable with parabolic interpolation.
At 96kHz, 1024 samples = ~10.7ms, which covers only ~0.88 periods of E2. The YIN algorithm needs at least 2 periods in the analysis window (halfWindow = 512 samples = ~5.3ms at 96kHz = 0.44 periods). This is fundamentally insufficient -- YIN cannot find a valid tau for E2 because the period (~1164 samples at 96kHz) exceeds halfWindow.
The old code handled this via decimation (effectively analyzing at 24kHz at 96k SR). Since decimation is removed, the window size should scale with sample rate to maintain the same frequency coverage:
windowSize = static_cast<int>(sampleRate / 44100.0 * 1024);This gives 1024 at 44.1k, 1115 at 48k, 2227 at 96k -- preserving the ~23ms analysis window regardless of sample rate.
The hopSize should probably scale proportionally too to maintain the same ~3ms update interval.
tests/TestYinPitchDetector.cpp
Outdated
|
|
||
| for (int i = 0; i < 512; ++i) | ||
| { | ||
| float sample = static_cast<float>(std::sin(phase)); | ||
| yin.feedSample(sample); | ||
| phase += inc; | ||
|
|
||
| auto r = yin.getResult(); | ||
| if (r.frequency != prevFreq) | ||
| { | ||
| ++updateCount; |
There was a problem hiding this comment.
The "hop interval updates" test has a fragile detection mechanism.
This test checks that r.frequency != prevFreq to count updates. But on a continuous 440Hz sine, the detected frequency may be identical across consecutive analyses (YIN is deterministic on the same signal). The frequency only drifts slightly as the phase alignment of the window shifts with each hop.
The test passes now because floating-point differences in window alignment cause tiny frequency variations. But this is brittle -- a future improvement to detection stability (which is the goal of Phase 3's smoother work) could make consecutive results identical, causing this test to fail.
A more robust approach: feed a signal that actually changes (e.g., sweep from 440 to 880 Hz over 512 samples) and verify the output tracks the change. That way the test genuinely validates that updates are arriving, rather than relying on jitter.
| REQUIRE(result.confidence >= 0.0f); | ||
| REQUIRE(result.confidence <= 1.0f); | ||
| } | ||
|
|
There was a problem hiding this comment.
Missing test: pitch accuracy at the low-frequency boundary (E2 ~82Hz).
The latency analysis specifically calls out E2 (82.4Hz) as marginal with a 1024 window at 44.1kHz. With halfWindow = 512, the maximum detectable period is 512 samples, which corresponds to 44100/512 = ~86Hz. E2 at 82.4Hz has a period of ~535 samples -- this exceeds halfWindow and will not be detected.
This is a correctness regression from the old code. The old decimated window (2048 at ~22050 Hz effective SR) had halfWindow = 1024 decimated samples, supporting down to ~21.5Hz.
A test for this would confirm the regression:
TEST_CASE("YIN: detects E2 (82.4 Hz) at 44100 Hz")
{
YinPitchDetector yin;
yin.prepare(44100.0);
feedSine(yin, 44100.0, 82.4f, 44100);
auto result = yin.getResult();
REQUIRE(result.frequency > 0.0f);
REQUIRE_THAT(static_cast<double>(result.frequency),
Catch::Matchers::WithinRel(82.4, 0.03));
}This ties directly to the window size comment on YinPitchDetector.cpp -- the window likely needs to be larger (the plan mentions 1200 as a workable alternative).
| REQUIRE(yin.getResult().frequency == 0.0f); | ||
|
|
||
| int samplesNeeded = feedSineUntilDetection(yin, 44100.0, 440.0f, 44100); | ||
|
|
There was a problem hiding this comment.
Relock test bound of 1024 + 128 may be misleading.
After 2048 silence samples, the ring buffer is full of zeros and analyses have been running every 128 samples (all returning 0Hz). When the sine starts, the next analysis fires within 128 samples, but that analysis window is ~87% zeros and ~13% sine. It likely won't detect pitch yet.
The test allows up to 1152 samples for detection. In reality, the window needs to be predominantly signal before YIN can find a valid tau. At 440Hz (period ~100 samples at 44.1k), you'd need several full periods in the window. The bound works but doesn't really test "fast relock" -- it tests "eventual detection within one full window refill."
Consider tightening the name or the bound to more precisely describe what's being verified.
- Replace sampleCount int with windowFilled bool flag to prevent overflow on long sessions - Scale windowSize and hopSize with sample rate (halfWindow = ceil(sampleRate/70), hopSize = ceil(sampleRate*0.003)) to guarantee E2 detection at all sample rates - Add E2 (82.4 Hz) detection test - Replace fragile hop-interval test with frequency sweep test - Rename and tighten bounds on latency/relock tests to use computed window size
user1303836
left a comment
There was a problem hiding this comment.
Re-review: All issues addressed -- LGTM
All items from the first review are resolved in fcb587d:
1. sampleCount overflow -- Replaced with bool windowFilled that latches once on first buffer wrap. No unbounded counter.
2. Window/hop scale with sample rate -- halfWindow = ceil(sampleRate / 70.0) guarantees E2 (82.4Hz) detection at all sample rates. The 70Hz floor gives comfortable margin (E2 period ~535 samples at 44.1kHz vs halfWindow=631). First-result latency is a consistent ~28.6ms across sample rates. hopSize = ceil(sampleRate * 0.003) maintains ~3ms update interval.
3. E2 test added -- Confirms no regression at the low-frequency boundary.
4. Fragile hop test replaced -- Frequency sweep test (440 -> 880Hz) is a robust way to verify hop-based updates are arriving.
5. Relock test renamed -- "detects pitch change after silence" with computed bounds.
The feedSample fill logic is clean: during initial fill, hopCounter accumulates but analysis is gated by windowFilled. On the exact wrap, it latches, falls through to the hop check (which passes since hopCounter == windowSize >> hopSize), and fires the first analysis immediately. Subsequent analyses fire at hopSize intervals.
No remaining issues. Ready to merge. Phase 2 (FFT-accelerated difference function) can build directly on this.
…tion Compute YIN difference function via FFT cross-correlation between the first half-window and full window, replacing the O(N^2) nested loop with 2 forward FFTs + 1 inverse FFT (O(N log N)). Uses JUCE's performRealOnlyForwardTransform/InverseTransform with separate buffers for the first-half and full-signal spectra, then combines via conj(A)*B cross-spectrum before inverse transform. Power terms still computed incrementally for the exact YIN formula. Tests link against juce_dsp via juce_add_console_app.
user1303836
left a comment
There was a problem hiding this comment.
Phase 2 Review: FFT-accelerated difference function -- LGTM
Clean implementation. The O(N^2) nested loop is correctly replaced with FFT-based cross-correlation.
Math verification
The YIN difference function d(tau) = sum (x[j] - x[j+tau])^2 decomposes into powerTerm0 + powerTermTau - 2 * r_cross(tau). The code computes r_cross via:
- Forward FFT of first half-window
x[0..n-1](zero-padded) -> A - Forward FFT of full window
x[0..2n-1](zero-padded) -> B - Cross-spectrum
conj(A) * B-- conjugate multiply is correct (Re: aRe*bRe + aIm*bIm,Im: aRe*bIm - aIm*bRe) - Inverse FFT -> cross-correlation values at each lag
Zero-padding is sufficient: fftSize = 2^ceil(log2(2*windowSize)) >= 4n > 3n-1 (minimum for linear cross-correlation). No circular aliasing.
Power term update is correct: sliding window powerTermTau += x[n+tau-1]^2 - x[tau-1]^2 maintains the shifted energy incrementally. Max linearBuffer index is 2n-2, within the windowSize=2n bound.
CMNDF/threshold/parabolic interpolation: unchanged from the original. Only the difference function computation was swapped.
CMake changes
Switching to juce_add_console_app is the right call -- YinPitchDetector.h now includes <juce_dsp/juce_dsp.h>, requiring JUCE module infrastructure. juce_add_console_app doesn't provide its own main(), so no conflict with Catch2. Standard JUCE_WEB_BROWSER=0/JUCE_USE_CURL=0 boilerplate to avoid transitive deps.
No issues found
This is a textbook YIN FFT optimization (matches the aubio yinfft approach). The algorithm produces identical results -- validated by existing pitch accuracy tests. No new tests needed for a pure performance swap.
Ready for Phase 3 (PitchSmoother fast-attack/slow-release).
Pitch changes larger than ~1 semitone (0.08 in log2 space) now snap immediately, eliminating audible glides on note transitions. Small changes within a semitone still use the configured smoothing amount for jitter reduction on sustained notes. Reduce max smoothing tau from 200ms to 50ms (0.2f -> 0.05f) since the detector now updates every ~3ms instead of ~93ms. Update existing tests for within-semitone smoothing behavior and add tests for instant note-change lock, confidence-gap hold, and sustained-note stability.
user1303836
left a comment
There was a problem hiding this comment.
Phase 3 Review: PitchSmoother fast-attack/slow-release -- LGTM
Minimal, correct change. 3 lines added in process(), 1 constant changed in recomputeAlpha().
Code changes
Fast-attack logic: delta > 0.08f ? 1.0f : alpha -- when pitch jumps more than ~1 semitone (0.08 in log2; 1 semitone = 0.0833), the smoother snaps instantly. Within a semitone, the configured smoothing applies. This correctly handles:
- Note transitions: instant lock, no audible glide
- Sustained notes with jitter: smoothed out
- Vibrato (~50 cents = 0.042 in log2): smoothed, not tracked as note changes
The confidence gating is implicitly handled -- low-confidence readings never reach the delta check due to the existing early return at the top of process().
Reduced tau: 0.2f -> 0.05f reduces max smoothing from 200ms to 50ms. Appropriate since updates now arrive every ~3ms instead of ~93ms.
Test adaptations
The two existing tests that used octave jumps (220->440Hz) were correctly identified as broken by the new behavior (delta = 1.0 >> 0.08 would trigger instant lock, defeating the smoothing test). Smart fixes:
- "smoothing slows convergence" -> "smoothing slows convergence within a semitone": 440->443Hz (delta ~0.012), 100 iterations. Fast (0.1) converges ~36% vs slow (0.9) at ~5%. Test validates within-semitone smoothing differentiation.
- "log-frequency domain" -> "smoothing operates in log-frequency domain": 440->445Hz (delta ~0.016), 100 iterations. Verifies partial convergence in the smoothing regime.
New tests
All three tests from the plan are present:
- Instant lock on note change: 440->880Hz at max smoothing, verifies output = 880 within 0.1Hz on the very next sample.
- Holds during confidence gaps: Alternates (440, 1.0) and (0, 0), verifies output holds at 440 during gaps.
- Stable output on sustained note: 1000 iterations of 440Hz, verifies convergence to within 0.01Hz.
No issues found
The implementation matches the plan exactly. The threshold at 0.08 log2 (~0.96 semitones) is a reasonable boundary between "jitter to smooth" and "new note to snap to." No remaining concerns. Ready for Phase 4 integration verification.
Summary
Overhaul pitch detection pipeline to reduce latency from ~93ms to ~23ms and improve note-tracking responsiveness.
Phase 1: Ring buffer + sliding window + remove decimation
halfWindow = ceil(sampleRate / 70.0), ensuring detection down to ~70Hz at any rateceil(sampleRate * 0.003)samples (~3ms) after initial window fillPhase 2: FFT-accelerated difference function
performRealOnlyForwardTransformon first-half and full-signal separately, thenconj(A)*Bcross-spectrum and inverse FFTC(tau) = sum_{j=0}^{n-1} x[j]*x[j+tau]via cross-correlation, not the simpler (but incorrect for low frequencies) full-signal autocorrelationjuce_dspviajuce_add_console_appPhase 3: PitchSmoother fast-attack/slow-release
Phase 4: Integration verification
Impact
Files changed
source/dsp/YinPitchDetector.hsource/dsp/YinPitchDetector.cppsource/dsp/PitchSmoother.htests/TestYinPitchDetector.cpptests/TestPitchSmoother.cpptests/CMakeLists.txtTest plan