fix(typescript-client): quarantine aborted responses#4115

Open
KyleAMathews wants to merge 7 commits into main from fix/abort-race-error-state-logging

@KyleAMathews KyleAMathews commented Apr 11, 2026

Summary

Fixes one aborted-request race in ShapeStream and adds much denser generation-aware diagnostics for the remaining field failures we are still chasing.

This matters for desktop runtimes too, including Tauri/WebView, where abort semantics can be less strict than in a normal browser tab. When a late or parallel request generation collides with a newer one, the stream can already be in ErrorState, producing repeated:

[Electric] Response was ignored by state "error"

and silently dropping updates until a hard refresh.

This PR also adds an opt-in client diagnostics mode that can be enabled from localStorage, refreshed, and used to capture denser request / response / state logs in the field without changing app code.

It now also reduces onError -> {} retry churn substantially:

  • the bounded restart loop is capped at 3 consecutive retries instead of 50
  • every retry restart emits a visible console.log with the full triggering error object attached
  • generation ids and first-cause error metadata are carried through the warnings so we can separate the primary failure from the later fallout
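
The retry cap above can be pictured as a counter of consecutive onError-driven restarts. The following is a minimal sketch, not the client's actual code: the names RetryBudget and MAX_CONSECUTIVE_ERROR_RETRIES are hypothetical, and resetting the counter after a success is an assumption about what "consecutive" means here.

```typescript
// Hypothetical illustration of a bounded onError -> {} restart loop.
// The cap of 3 comes from this PR; every name here is made up for the sketch.
const MAX_CONSECUTIVE_ERROR_RETRIES = 3

class RetryBudget {
  private consecutive = 0

  // Called when onError returns {}. Returns true if a restart is allowed.
  tryRestart(error: Error): boolean {
    if (this.consecutive >= MAX_CONSECUTIVE_ERROR_RETRIES) {
      return false // budget exhausted: surface the error instead of restarting
    }
    this.consecutive += 1
    // The full error object is passed as the second console argument.
    console.log(
      `onError requested retry. consecutiveErrorRetries=${this.consecutive}`,
      error
    )
    return true
  }

  // Assumed behaviour: a successful response gives later errors a fresh budget.
  reset(): void {
    this.consecutive = 0
  }
}
```

With a cap of 3, the fourth consecutive call to tryRestart returns false, which is the point at which a client would stop restarting and report the failure.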

Root Cause

ShapeStream assumed that once a request signal was aborted, that request could no longer deliver a successful response into the state machine.

That assumption was not fully enforced in the fetch wrapper chain.

So the client could hit this sequence:

  1. request generation A starts
  2. generation A is aborted logically (pause/resume, refresh, lifecycle race, etc.)
  3. generation B becomes active
  4. generation B fails and moves the stream into ErrorState
  5. generation A resolves late with a 200
  6. the late response is processed and then ignored by ErrorState
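
The sequence above can be replayed against a toy model. This is only a sketch for reasoning about the ordering: ToyStream and replayRace are hypothetical and far simpler than the real ShapeStream state machine.

```typescript
// Toy model of the race; not the real ShapeStream state machine.
type ToyState = "live" | "error"

class ToyStream {
  state: ToyState = "live"
  activeGeneration = 0
  readonly log: string[] = []

  startGeneration(): number {
    this.activeGeneration += 1
    return this.activeGeneration
  }

  fail(generation: number): void {
    this.state = "error"
    this.log.push(`gen ${generation} failed -> error`)
  }

  deliver(generation: number, status: number): void {
    if (this.state === "error") {
      this.log.push(`gen ${generation} response ${status} ignored by state "error"`)
      return
    }
    this.log.push(`gen ${generation} response ${status} accepted`)
  }
}

function replayRace(): string[] {
  const toy = new ToyStream()
  const genA = toy.startGeneration() // 1. generation A starts
  // 2. generation A is aborted logically, but its fetch is still in flight
  const genB = toy.startGeneration() // 3. generation B becomes active
  toy.fail(genB) // 4. generation B fails, stream enters the error state
  toy.deliver(genA, 200) // 5-6. generation A resolves late and is ignored
  return toy.log
}
```

The second log entry produced by replayRace is the toy equivalent of the repeated "Response was ignored" warning: the late 200 arrives only after the stream is already in the error state.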

Separately, once an app-specific onError handler returned {} to request a restart, the client would restart immediately. Because 4xx responses other than 429 bypass transport backoff, a persistent 400 / malformed 200 path could churn quickly.

Further investigation showed that we also need better evidence for other possible failure shapes, especially:

  • stale generations surfacing errors after a newer generation is already active
  • parallel emissions (response / messages / sseClose) landing after the stream already entered ErrorState
  • ordinary real errors happening first, with the ignored-response warnings only being secondary fallout

This PR now instruments those paths directly.

Fix

  • Re-check signal.aborted after fetch resolution in createFetchWithBackoff
  • Re-check signal.aborted before and after body consumption in createFetchWithConsumedMessages
  • Convert those late successes into aborts before they can emit state-machine events
  • Add a regression test covering pause/abort followed by a late successful response
  • Update SPEC.md to document the transport invariant: aborted requests are quarantined before state-machine delivery
  • Add runtime diagnostics so field reports include the state/url/handle/offset/cursor that led to ErrorState
  • Add opt-in verbose client diagnostics via localStorage.setItem('electric.debug', 'true') or localStorage.setItem('debug', 'electric*')
  • Make diagnostics safer for pathological loops by announcing themselves with a visible console.info line and rate-limiting verbose console.debug output
  • Reduce the bounded onError -> {} retry loop from 50 to 3
  • Emit a visible console.log on every onError-driven restart and attach the full error object to that log entry
  • Add generation-aware diagnostics for request:dispatch, response:headers, messages:batch, sse:closed, ignored responses, and ErrorState entry
  • Carry the first causal error context forward (lastErrorGeneration, lastErrorRequestId, lastErrorName) so later warnings can be tied back to the original failure
  • Emit a dedicated warning when a stale request generation surfaces an error after a newer generation is already active
  • Rate-limit visible ignored-response warnings much more aggressively while leaving the detailed per-event generation info in console.debug
  • Update the stream and model-based tests to assert the lower retry ceiling and the revised 409 recovery behavior
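
The first three bullets share one idea: re-check the abort flag after the asynchronous work resolves, not only before it starts. A minimal sketch of that idea follows. All names (AbortSignalLike, QuarantinedAbortError, throwIfAborted, withAbortQuarantine) are hypothetical, and this does not reproduce createFetchWithBackoff or createFetchWithConsumedMessages.

```typescript
// Hypothetical sketch of "quarantining" a late success; names are illustrative.
interface AbortSignalLike {
  readonly aborted: boolean
}

class QuarantinedAbortError extends Error {
  constructor() {
    super("Request was aborted before its response could be delivered")
    this.name = "AbortError"
  }
}

// Throws if the signal is aborted, even though the fetch itself resolved.
function throwIfAborted(signal: AbortSignalLike | undefined): void {
  if (signal?.aborted) {
    throw new QuarantinedAbortError()
  }
}

type FetchLike<R> = (url: string, signal?: AbortSignalLike) => Promise<R>

// Wraps a fetch-like function so that a response resolving after abort
// is converted into an abort instead of reaching the state machine.
function withAbortQuarantine<R>(fetchLike: FetchLike<R>): FetchLike<R> {
  return async (url, signal) => {
    throwIfAborted(signal) // before dispatch
    const response = await fetchLike(url, signal)
    throwIfAborted(signal) // after resolution: the late-success case
    return response
  }
}
```

The second throwIfAborted call is the one that matters for this race: a runtime that lets an aborted fetch resolve with a 200 still ends up rejecting with an AbortError.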

Enable Diagnostics

No app code change is required. In the affected client runtime:

localStorage.setItem('electric.debug', 'true')
// or, for debug-package compatibility:
localStorage.setItem('debug', 'electric*')

Then refresh / reload the app.

When diagnostics are enabled, the client now prints one visible console.info line confirming that diagnostics are active. Detailed per-request diagnostics are emitted at console.debug / Verbose level in DevTools, and are rate-limited to avoid overwhelming a tight-looping runtime.
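
For readers wondering how the two switches relate, here is a rough sketch of flag detection. Only the two localStorage keys and values come from this PR. StorageLike and diagnosticsSource are hypothetical, and matching the literal pattern electric* is a simplification rather than the client's real matching logic.

```typescript
// Hypothetical sketch: how a client might decide whether diagnostics are on.
// Only the two localStorage keys/values come from this PR; the rest is assumed.
interface StorageLike {
  getItem(key: string): string | null
}

// Returns the key that enabled diagnostics, or null if they are off.
function diagnosticsSource(storage: StorageLike): string | null {
  if (storage.getItem("electric.debug") === "true") {
    return "electric.debug"
  }
  // debug-package style values are comma- or space-separated patterns.
  const patterns = (storage.getItem("debug") ?? "").split(/[\s,]+/)
  if (patterns.some((pattern) => pattern === "electric*")) {
    return "debug"
  }
  return null
}
```

In a browser this would be called as diagnosticsSource(window.localStorage) during stream construction, which is why the flag has to be set before the refresh.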

Logs To Expect

With this PR, the app team should now see logs like these when a stream goes bad:

1. Diagnostics mode turned on successfully

[Electric] ShapeStream diagnostics enabled from localStorage["electric.debug"]="true". Detailed per-request logs use console.debug / Verbose level in DevTools. Verbose logs are rate-limited to 50 per 1000ms to avoid overwhelming the runtime.

If that line does not appear after refresh, diagnostics were not enabled early enough for stream construction.

2. First transition into ErrorState

[Electric] Entered error state. state="error" handle="..." offset="..." cursor="..." activeGeneration=7 paused=false connected=... started=true currentUrl="..." shapeKey="..." lastErrorGeneration=7 lastErrorRequestId=19 lastErrorName="FetchError" errorGeneration=7 errorRequestId=19 errorTransport="long-poll" isActiveGenerationError=true previousState="live" errorName="FetchError" errorMessage="..."

This is the most important warning. It tells you:

  • which state the stream failed from (previousState)
  • which request generation failed (errorGeneration, errorRequestId, errorTransport)
  • whether that failure came from the active generation or from stale follow-on work (isActiveGenerationError)
  • the currently active generation (activeGeneration)
  • the first causal error metadata retained for later warnings (lastErrorGeneration, lastErrorRequestId, lastErrorName)
  • the exact request URL and stream identity (currentUrl, shapeKey)
  • where the stream was in the log (handle, offset, cursor)
  • the exact error that pushed the stream into failure (errorName, errorMessage)
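
When triaging pasted console output, it can help to turn a warning line into fields. The helper below is a hypothetical triage aid, not part of the client. It assumes the key="value" / key=value layout shown in the sample line, and it will misparse values that themselves contain double quotes.

```typescript
// Hypothetical helper for triaging pasted warnings; not part of the client.
function parseDiagnosticFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {}
  // Matches key="quoted value" or key=bareValue.
  const pattern = /(\w+)=(?:"([^"]*)"|(\S+))/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(line)) !== null) {
    // Group 2 holds a quoted value, group 3 a bare one.
    fields[match[1]] = match[2] ?? match[3] ?? ""
  }
  return fields
}

// Example: decide whether the failure came from the active generation.
const sample =
  'state="error" activeGeneration=7 errorGeneration=7 ' +
  'isActiveGenerationError=true previousState="live"'
const parsed = parseDiagnosticFields(sample)
const fromActiveGeneration = parsed.isActiveGenerationError === "true"
```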

3. When the app's onError requests an immediate restart

[Electric] onError requested retry. Restarting stream from current offset. state="error" handle="..." offset="..." cursor="..." activeGeneration=7 currentUrl="..." shapeKey="..." lastErrorGeneration=7 lastErrorRequestId=19 lastErrorName="FetchError" errorGeneration=7 errorRequestId=19 consecutiveErrorRetries=1 FetchError: ...

This is emitted with console.log, and the full Error / FetchError object is attached as the second console argument so DevTools should show the full stack, status, headers, and any parsed body.

4. If a stale request generation surfaces an error after a newer generation is already active

[Electric] A stale request generation surfaced an error after a newer generation was already active. state="error" handle="..." offset="..." cursor="..." activeGeneration=8 lastErrorGeneration=7 lastErrorRequestId=19 lastErrorName="FetchError"

If this appears, the most important fields to capture are the active generation, the stale generation, and the request ids around the first Entered error state warning.

5. If a response is ignored while already in ErrorState

[Electric] Response was ignored by state "error". The response body will be skipped. This may indicate a proxy/CDN caching issue or a client state machine bug. state="error" handle="..." offset="..." cursor="..." activeGeneration=8 currentUrl="..." shapeKey="..." eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=false responseHandle="..." responseStatus=200

This warning is now rate-limited much more aggressively so a pathological loop does not bury the useful logs. The per-event generation details are still available in console.debug.

6. Generation-aware debug lines around the failure

[Electric] Debug state="initial" handle=... offset="-1" ... event="request:dispatch" generationId=7 requestId=19 transport="long-poll" fetchUrl="..."
[Electric] Debug state="syncing" handle="..." offset="..." ... event="response:headers" generationId=7 eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=true responseStatus=200 responseHandle="..." action="accepted"
[Electric] Debug state="live" handle="..." offset="..." ... event="messages:batch" generationId=7 eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=true batchSize=2 publishedCount=2 hasUpToDateMessage=true
[Electric] Debug state="error" handle="..." offset="..." ... event="response:headers" generationId=7 eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=false action="ignored-state-error"
[Electric] Debug state="error" handle="..." offset="..." ... event="sse:closed" generationId=7 requestId=19 transport="sse"

These are the most useful lines for distinguishing:

  • a true multi-generation race
  • a legitimate transport / malformed-response failure that happened first
  • stale follow-on emissions after the stream was already poisoned

7. If the app's onError keeps retrying and the client finally gives up

[Electric] onError retry loop exhausted after 3 consecutive retries. The error was never resolved by the onError handler. Error: ... state="error" handle="..." offset="..." cursor="..." currentUrl="..." shapeKey="..."

That is especially relevant for apps that return {} from onError on unknown sync failures.

8. If the runtime is in a pathological loop and verbose diagnostics are being throttled

[Electric] ShapeStream diagnostics suppressed 184 verbose logs in the last 1000ms. The stream is likely in a tight loop or repeated error path.
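
The throttling behind that line can be approximated with a fixed-window counter. This is a sketch under assumptions: the 50-per-1000ms budget comes from the diagnostics banner, but VerboseLogLimiter and its methods are hypothetical, and the real client may use a different windowing scheme.

```typescript
// Hypothetical fixed-window limiter; the 50-per-1000ms budget is taken from
// the diagnostics banner, everything else is an assumption.
class VerboseLogLimiter {
  private windowStart = 0
  private emitted = 0
  private suppressed = 0

  constructor(
    private readonly maxPerWindow = 50,
    private readonly windowMs = 1000
  ) {}

  // Returns true if a verbose log may be emitted at time `now` (in ms).
  allow(now: number): boolean {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now // start a new window
      this.emitted = 0
    }
    if (this.emitted < this.maxPerWindow) {
      this.emitted += 1
      return true
    }
    this.suppressed += 1
    return false
  }

  // Returns and clears the number of logs suppressed since the last call.
  takeSuppressed(): number {
    const count = this.suppressed
    this.suppressed = 0
    return count
  }
}
```

Under this model, 234 verbose events inside one window would emit 50 lines and suppress 184, which corresponds to the count in the sample warning above.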

What To Capture From The Tester

If the user can keep the runtime alive long enough, the most helpful things to screenshot or copy are:

  • the first Entered error state warning, not just the later ignored-response warnings
  • any A stale request generation surfaced an error... warning
  • the request:dispatch, response:headers, messages:batch, and sse:closed debug lines immediately before the first failure
  • any snapshot:pause-acquired, snapshot:fetch:start, snapshot:fetch:409, snapshot:error, or snapshot:pause-released debug lines if the page uses live queries / subsets
  • any Snapshot "snapshot-N" has held the pause lock for 30s warning, especially with the attached subset=... summary
  • whether the diagnostics-enabled console.info line appeared after refresh
  • whether the restart loop stops after 3 retries or keeps recreating entirely new streams

Verification

cd packages/typescript-client
pnpm vitest run --config vitest.unit.config.ts
pnpm exec tsc --noEmit

Both pass (375 unit tests).

pkg-pr-new bot commented Apr 11, 2026

npm i https://pkg.pr.new/@electric-sql/react@4115
npm i https://pkg.pr.new/@electric-sql/client@4115
npm i https://pkg.pr.new/@electric-sql/y-electric@4115

commit: 4b36ca6

codecov bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 86.61417% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.92%. Comparing base (0a65f8e) to head (4b36ca6).
✅ All tests successful. No failed tests found.

Files with missing lines                    Patch %   Lines
packages/typescript-client/src/client.ts    86.17%    32 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4115      +/-   ##
==========================================
- Coverage   89.20%   88.92%   -0.29%     
==========================================
  Files          25       25              
  Lines        2520     2754     +234     
  Branches      640      726      +86     
==========================================
+ Hits         2248     2449     +201     
- Misses        270      301      +31     
- Partials        2        4       +2     
Flag                          Coverage   Δ
packages/experimental         87.73%     <ø> (ø)
packages/react-hooks          86.48%     <ø> (ø)
packages/start                82.83%     <ø> (ø)
packages/typescript-client    93.35%     <86.61%> (-0.96%) ⬇️
packages/y-electric           56.05%     <ø> (ø)
typescript                    88.92%     <86.61%> (-0.29%) ⬇️
unit-tests                    88.92%     <86.61%> (-0.29%) ⬇️
