fix(typescript-client): quarantine aborted responses by KyleAMathews · Pull Request #4115 · electric-sql/electric

KyleAMathews · 2026-04-11T04:08:39Z

Summary

Fixes one aborted-request race in ShapeStream and adds much denser generation-aware diagnostics for the remaining field failures we are still chasing.

This matters for desktop runtimes too, including Tauri/WebView, where abort semantics can be less strict than a normal browser tab. When a late or parallel request generation collides with a newer one, the stream can already be in ErrorState, producing repeated:

[Electric] Response was ignored by state "error"

and silently dropping updates until a hard refresh.

This PR also adds an opt-in client diagnostics mode that can be enabled from localStorage, refreshed, and used to capture denser request / response / state logs in the field without changing app code.

It also substantially reduces the retry churn caused by onError handlers that return {} to request a restart:

the bounded restart loop is capped at 3 consecutive retries instead of 50
every retry restart emits a visible console.log with the full triggering error object attached
generation ids and first-cause error metadata are carried through the warnings so we can separate the primary failure from the later fallout
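
The capped restart loop described above can be sketched roughly as follows. This is a minimal illustration, not the client's actual code: `RetryBudget`, `shouldRestart`, and `reset` are hypothetical names, and resetting the counter once the stream makes progress is an assumption. Only the limit of 3 consecutive retries and the use of a visible console.log come from this PR description.

```typescript
// Hypothetical sketch: only the limit of 3 is taken from the PR description.
const MAX_CONSECUTIVE_ERROR_RETRIES = 3

type OnErrorResult = Record<string, unknown> | undefined

class RetryBudget {
  private consecutive = 0

  // Returns true when the stream may restart after this error.
  shouldRestart(error: Error, onError: (e: Error) => OnErrorResult): boolean {
    const result = onError(error)
    if (result === undefined) {
      return false // the handler did not ask for a retry
    }
    if (this.consecutive >= MAX_CONSECUTIVE_ERROR_RETRIES) {
      console.warn(
        `onError retry loop exhausted after ${MAX_CONSECUTIVE_ERROR_RETRIES} consecutive retries`,
        error
      )
      return false
    }
    this.consecutive += 1
    // The full error object is passed as a second argument so DevTools can expand it.
    console.log(`onError requested retry. consecutiveErrorRetries=${this.consecutive}`, error)
    return true
  }

  // Assumption: the counter is cleared once the stream makes progress again.
  reset(): void {
    this.consecutive = 0
  }
}
```

With a handler that always returns {}, three restarts are allowed and the fourth is refused, which bounds the churn even when no transport backoff applies.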

Root Cause

ShapeStream assumed that once a request signal was aborted, that request could no longer deliver a successful response into the state machine.

That assumption was not fully enforced in the fetch wrapper chain.

So the client could hit this sequence:

request generation A starts
generation A is aborted logically (pause/resume, refresh, lifecycle race, etc.)
generation B becomes active
generation B fails and moves the stream into ErrorState
generation A resolves late with a 200
the late response is processed and then ignored by ErrorState

Separately, once an app-specific onError handler returned {} to request a restart, the client would restart immediately. Because 4xx responses other than 429 bypass transport backoff, a persistent 400 or a malformed 200 response could drive a tight restart loop.

Further investigation showed that we also need better evidence for other possible failure shapes, especially:

stale generations surfacing errors after a newer generation is already active
parallel emissions (response / messages / sseClose) landing after the stream already entered ErrorState
ordinary real errors happening first, with the ignored-response warnings only being secondary fallout

This PR now instruments those paths directly.

Fix

Re-check signal.aborted after fetch resolution in createFetchWithBackoff
Re-check signal.aborted before and after body consumption in createFetchWithConsumedMessages
Convert those late successes into aborts before they can emit state-machine events
Add a regression test covering pause/abort followed by a late successful response
Update SPEC.md to document the transport invariant: aborted requests are quarantined before state-machine delivery
Add runtime diagnostics so field reports include the state/url/handle/offset/cursor that led to ErrorState
Add opt-in verbose client diagnostics via localStorage.setItem('electric.debug', 'true') or localStorage.setItem('debug', 'electric*')
Make diagnostics safer for pathological loops by announcing themselves with a visible console.info line and rate-limiting verbose console.debug output
Reduce the bounded onError -> {} retry loop from 50 to 3
Emit a visible console.log on every onError-driven restart and attach the full error object to that log entry
Add generation-aware diagnostics for request:dispatch, response:headers, messages:batch, sse:closed, ignored responses, and ErrorState entry
Carry the first causal error context forward (lastErrorGeneration, lastErrorRequestId, lastErrorName) so later warnings can be tied back to the original failure
Emit a dedicated warning when a stale request generation surfaces an error after a newer generation is already active
Rate-limit visible ignored-response warnings much harder while leaving the detailed per-event generation info in console.debug
Update the stream and model-based tests to assert the lower retry ceiling and the revised 409 recovery behavior
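
The first three items above (re-checking signal.aborted after fetch resolution and around body consumption, then converting late successes into aborts) can be sketched as below. This is an illustration under assumptions, not the code in createFetchWithBackoff or createFetchWithConsumedMessages: `quarantineAborted`, `consumeBody`, `makeAbortError`, and the `SignalLike` / `BodyLike` shapes are hypothetical stand-ins chosen so the sketch does not depend on a particular runtime's fetch.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface SignalLike {
  readonly aborted: boolean
}
interface RequestInitLike {
  signal?: SignalLike
}
interface BodyLike {
  text(): Promise<string>
}
type FetchLike<R> = (url: string, init?: RequestInitLike) => Promise<R>

function makeAbortError(): Error {
  const err = new Error('Request was aborted before its response could be delivered')
  err.name = 'AbortError'
  return err
}

// Wraps a fetch-like function so a response that resolves after its signal
// was aborted is turned into an abort instead of a success.
function quarantineAborted<R>(fetchImpl: FetchLike<R>): FetchLike<R> {
  return async (url, init) => {
    const response = await fetchImpl(url, init)
    if (init?.signal?.aborted) {
      throw makeAbortError()
    }
    return response
  }
}

// Re-checks the signal before and after reading the body, since the abort
// can also land while the body is still being consumed.
async function consumeBody(response: BodyLike, signal?: SignalLike): Promise<string> {
  if (signal?.aborted) throw makeAbortError()
  const text = await response.text()
  if (signal?.aborted) throw makeAbortError()
  return text
}
```

The point of checking after the await is that runtimes with looser abort semantics may still resolve the promise successfully, so the wrapper cannot rely on the underlying fetch rejecting.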

Enable Diagnostics

No app code change is required. In the affected client runtime:

localStorage.setItem('electric.debug', 'true')
// or, for debug-package compatibility:
localStorage.setItem('debug', 'electric*')

Then refresh / reload the app.

When diagnostics are enabled, the client now prints one visible console.info line confirming that diagnostics are active. Detailed per-request diagnostics are emitted at console.debug / Verbose level in DevTools, and are rate-limited to avoid overwhelming a tight-looping runtime.
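
For readers wondering how a flag like this can be picked up at stream construction, here is a rough sketch. It is not the client's implementation: `detectDiagnosticsFlag` and `StorageLike` are hypothetical, and matching only the exact value 'electric*' for the debug key is a simplifying assumption (the real matching may be broader). The two keys and values are the ones documented above.

```typescript
// Structural stand-in for window.localStorage so the sketch runs anywhere.
interface StorageLike {
  getItem(key: string): string | null
}

// Returns the storage key that enabled diagnostics, or null when disabled.
function detectDiagnosticsFlag(storage: StorageLike | undefined): string | null {
  if (!storage) return null
  try {
    if (storage.getItem('electric.debug') === 'true') {
      return 'electric.debug'
    }
    // Assumption: the debug key holds comma- or space-separated patterns.
    const patterns = (storage.getItem('debug') ?? '').split(/[\s,]+/).filter(Boolean)
    if (patterns.includes('electric*')) {
      return 'debug'
    }
  } catch {
    // Storage access can throw in restricted contexts; treat that as disabled.
  }
  return null
}
```

Because a check like this would run once when the stream is constructed, setting the key without reloading would have no effect, which is consistent with the refresh step above.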

Logs To Expect

With this PR, the app team should now see logs like these when a stream goes bad:

1. Diagnostics mode turned on successfully

[Electric] ShapeStream diagnostics enabled from localStorage["electric.debug"]="true". Detailed per-request logs use console.debug / Verbose level in DevTools. Verbose logs are rate-limited to 50 per 1000ms to avoid overwhelming the runtime.

If that line does not appear after refresh, diagnostics were not enabled early enough for stream construction.

2. First transition into `ErrorState`

[Electric] Entered error state. state="error" handle="..." offset="..." cursor="..." activeGeneration=7 paused=false connected=... started=true currentUrl="..." shapeKey="..." lastErrorGeneration=7 lastErrorRequestId=19 lastErrorName="FetchError" errorGeneration=7 errorRequestId=19 errorTransport="long-poll" isActiveGenerationError=true previousState="live" errorName="FetchError" errorMessage="..."

This is the most important warning. It tells you:

which state the stream failed from (previousState)
which request generation failed (errorGeneration, errorRequestId, errorTransport)
whether that failure came from the active generation or from stale follow-on work (isActiveGenerationError)
the currently active generation (activeGeneration)
the first causal error metadata retained for later warnings (lastErrorGeneration, lastErrorRequestId, lastErrorName)
the exact request URL and stream identity (currentUrl, shapeKey)
where the stream was in the shape log (handle, offset, cursor)
the exact error that pushed the stream into failure (errorName, errorMessage)
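
When triaging a pasted warning, it can help to pull the key=value fields listed above into an object. The helper below is a hypothetical triage aid, not part of the client; it assumes fields are written as key="quoted value" or key=bareValue, as in the sample line, and it does not handle escaped quotes inside values.

```typescript
// Hypothetical triage helper: extracts key=value pairs from a diagnostics line.
function parseDiagnosticFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {}
  // Matches key="quoted value" or key=bareValue.
  const pattern = /(\w+)=(?:"([^"]*)"|(\S+))/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(line)) !== null) {
    fields[match[1]] = match[2] !== undefined ? match[2] : match[3]
  }
  return fields
}

const sample =
  '[Electric] Entered error state. state="error" activeGeneration=7 ' +
  'errorGeneration=7 isActiveGenerationError=true previousState="live" errorName="FetchError"'
const parsed = parseDiagnosticFields(sample)
console.log(parsed.previousState, parsed.errorGeneration, parsed.isActiveGenerationError)
```

All values come back as strings, so numeric fields such as activeGeneration need converting before they are compared.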

3. When the app's `onError` requests an immediate restart

[Electric] onError requested retry. Restarting stream from current offset. state="error" handle="..." offset="..." cursor="..." activeGeneration=7 currentUrl="..." shapeKey="..." lastErrorGeneration=7 lastErrorRequestId=19 lastErrorName="FetchError" errorGeneration=7 errorRequestId=19 consecutiveErrorRetries=1 FetchError: ...

This is emitted with console.log, and the full Error / FetchError object is attached as the second console argument so DevTools should show the full stack, status, headers, and any parsed body.

4. If a stale request generation surfaces an error after a newer generation is already active

[Electric] A stale request generation surfaced an error after a newer generation was already active. state="error" handle="..." offset="..." cursor="..." activeGeneration=8 lastErrorGeneration=7 lastErrorRequestId=19 lastErrorName="FetchError"

If this appears, the most important fields to capture are the active generation, the stale generation, and the request ids around the first Entered error state warning.
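
The relationship between the active generation, a stale generation, and the retained first-cause metadata can be sketched as follows. This is an illustrative model only: `GenerationTracker`, `startGeneration`, and `recordError` are hypothetical names, and treating "stale" as "error generation differs from the active generation" is an assumption about how isActiveGenerationError is derived. The field names mirror the ones shown in the warnings.

```typescript
interface FirstCause {
  lastErrorGeneration: number
  lastErrorRequestId: number
  lastErrorName: string
}

class GenerationTracker {
  activeGeneration = 0
  firstCause: FirstCause | null = null

  // Each new request generation supersedes the previous one.
  startGeneration(): number {
    this.activeGeneration += 1
    return this.activeGeneration
  }

  recordError(errorGeneration: number, errorRequestId: number, error: Error): boolean {
    // Assumption: an error is "active" only if it belongs to the current generation.
    const isActiveGenerationError = errorGeneration === this.activeGeneration
    if (this.firstCause === null) {
      // Keep only the first failure so later fallout can be tied back to it.
      this.firstCause = {
        lastErrorGeneration: errorGeneration,
        lastErrorRequestId: errorRequestId,
        lastErrorName: error.name,
      }
    }
    if (!isActiveGenerationError) {
      console.warn(
        `stale generation ${errorGeneration} surfaced an error; activeGeneration=${this.activeGeneration}`
      )
    }
    return isActiveGenerationError
  }
}
```

Retaining only the first cause is what lets a later warning still point back at the request that failed originally, even after newer generations have started.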

5. If a response is ignored while already in `ErrorState`

[Electric] Response was ignored by state "error". The response body will be skipped. This may indicate a proxy/CDN caching issue or a client state machine bug. state="error" handle="..." offset="..." cursor="..." activeGeneration=8 currentUrl="..." shapeKey="..." eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=false responseHandle="..." responseStatus=200

This warning is now rate-limited much more aggressively so a pathological loop does not bury the useful logs. The per-event generation details are still available in console.debug.

6. Generation-aware debug lines around the failure

[Electric] Debug state="initial" handle=... offset="-1" ... event="request:dispatch" generationId=7 requestId=19 transport="long-poll" fetchUrl="..."
[Electric] Debug state="syncing" handle="..." offset="..." ... event="response:headers" generationId=7 eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=true responseStatus=200 responseHandle="..." action="accepted"
[Electric] Debug state="live" handle="..." offset="..." ... event="messages:batch" generationId=7 eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=true batchSize=2 publishedCount=2 hasUpToDateMessage=true
[Electric] Debug state="error" handle="..." offset="..." ... event="response:headers" generationId=7 eventGeneration=7 requestId=19 transport="long-poll" isActiveGeneration=false action="ignored-state-error"
[Electric] Debug state="error" handle="..." offset="..." ... event="sse:closed" generationId=7 requestId=19 transport="sse"

These are the most useful lines for distinguishing:

a true multi-generation race
a legitimate transport / malformed-response failure that happened first
stale follow-on emissions after the stream was already poisoned

7. If the app's `onError` keeps retrying and the client finally gives up

[Electric] onError retry loop exhausted after 3 consecutive retries. The error was never resolved by the onError handler. Error: ... state="error" handle="..." offset="..." cursor="..." currentUrl="..." shapeKey="..."

That is especially relevant for apps that return {} from onError on unknown sync failures.

8. If the runtime is in a pathological loop and verbose diagnostics are being throttled

[Electric] ShapeStream diagnostics suppressed 184 verbose logs in the last 1000ms. The stream is likely in a tight loop or repeated error path.
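
A fixed-window limiter is one simple way to get this behaviour of capping verbose output and then reporting how many lines were dropped. The sketch below is an assumption about the general technique, not the client's implementation: `VerboseLogLimiter` and its callback are hypothetical, and the defaults of 50 logs per 1000ms are taken from the diagnostics-enabled line shown earlier.

```typescript
// Fixed-window limiter: allows up to maxPerWindow events per windowMs and
// reports how many were suppressed once the next window starts.
class VerboseLogLimiter {
  private readonly maxPerWindow: number
  private readonly windowMs: number
  private readonly now: () => number
  private windowStart = 0
  private emitted = 0
  private suppressed = 0

  constructor(maxPerWindow = 50, windowMs = 1000, now: () => number = Date.now) {
    this.maxPerWindow = maxPerWindow
    this.windowMs = windowMs
    this.now = now
  }

  // Returns true if the caller may emit a verbose log right now.
  allow(onSuppressedSummary?: (count: number) => void): boolean {
    const t = this.now()
    if (t - this.windowStart >= this.windowMs) {
      if (this.suppressed > 0 && onSuppressedSummary) {
        onSuppressedSummary(this.suppressed)
      }
      this.windowStart = t
      this.emitted = 0
      this.suppressed = 0
    }
    if (this.emitted < this.maxPerWindow) {
      this.emitted += 1
      return true
    }
    this.suppressed += 1
    return false
  }
}
```

In this sketch the summary is only reported when the next log attempt arrives after the window has elapsed, so a stream that goes completely quiet would not emit it.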

What To Capture From The Tester

If the user can keep the runtime alive long enough, the most helpful things to screenshot or copy are:

the first Entered error state warning, not just the later ignored-response warnings
any A stale request generation surfaced an error... warning
the request:dispatch, response:headers, messages:batch, and sse:closed debug lines immediately before the first failure
any snapshot:pause-acquired, snapshot:fetch:start, snapshot:fetch:409, snapshot:error, or snapshot:pause-released debug lines if the page uses live queries / subsets
any Snapshot "snapshot-N" has held the pause lock for 30s warning, especially with the attached subset=... summary
whether the diagnostics-enabled console.info line appeared after refresh
whether the restart loop stops after 3 retries or keeps recreating entirely new streams

Verification

cd packages/typescript-client
pnpm vitest run --config vitest.unit.config.ts
pnpm exec tsc --noEmit

Both pass (375 unit tests).

pkg-pr-new · 2026-04-11T04:09:58Z

npm i https://pkg.pr.new/@electric-sql/react@4115

npm i https://pkg.pr.new/@electric-sql/client@4115

npm i https://pkg.pr.new/@electric-sql/y-electric@4115

commit: 4b36ca6

codecov · 2026-04-11T04:12:34Z

Codecov Report

❌ Patch coverage is 86.61417% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.92%. Comparing base (0a65f8e) to head (4b36ca6).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
packages/typescript-client/src/client.ts	86.17%	32 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4115      +/-   ##
==========================================
- Coverage   89.20%   88.92%   -0.29%     
==========================================
  Files          25       25              
  Lines        2520     2754     +234     
  Branches      640      726      +86     
==========================================
+ Hits         2248     2449     +201     
- Misses        270      301      +31     
- Partials        2        4       +2

Flag	Coverage Δ
packages/experimental	`87.73% <ø> (ø)`
packages/react-hooks	`86.48% <ø> (ø)`
packages/start	`82.83% <ø> (ø)`
packages/typescript-client	`93.35% <86.61%> (-0.96%)`	⬇️
packages/y-electric	`56.05% <ø> (ø)`
typescript	`88.92% <86.61%> (-0.29%)`	⬇️
unit-tests	`88.92% <86.61%> (-0.29%)`	⬇️

fix(typescript-client): quarantine aborted responses

ed306c0

KyleAMathews added 6 commits April 11, 2026 09:46

feat(typescript-client): add opt-in stream diagnostics

59d8b89

fix(typescript-client): rate-limit verbose diagnostics

688e96b

fix(typescript-client): throttle repeated stream warnings

1ceb6a4

fix(typescript-client): reduce onError retry churn

66a7e34

feat(typescript-client): add generation diagnostics

3b03b86

feat(typescript-client): log snapshot retry diagnostics

4b36ca6

KyleAMathews wants to merge 7 commits into main from fix/abort-race-error-state-logging
