fix: send HTTP 102 keepalive during long-poll to prevent middlebox timeouts #4106
…meouts

Long-poll requests hold connections idle for up to 20s in a receive block. Network middleboxes (particularly Cloudflare edge nodes on long paths to origin) can drop these idle connections, causing 522 errors.

Send periodic HTTP 102 Processing informational responses via Plug.Conn.inform during the hold to keep the connection alive. The 1xx responses are invisible to HTTP clients (they only see the final response), so this requires zero client changes and is fully backwards compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #4106 +/- ##
=======================================
Coverage 88.67% 88.67%
=======================================
Files 25 25
Lines 2438 2438
Branches 612 611 -1
=======================================
Hits 2162 2162
Misses 274 274
Partials 2 2
Claude Code Review

Summary
This PR adds periodic HTTP 102 Processing informational responses during long-poll holds to prevent middlebox connection drops. The implementation remains correct and clean; the latest commit ( […]

What's Working Well
[…]

Issues Found
No new issues.

Suggestions (Nice to Have)
- No linked issue: This PR has no linked GitHub issue. Per project convention, PRs should reference the issue they address.
- Missing integration test for wire-level 102 delivery: Acknowledged in the PR description. The lux […]

Issue Conformance
No linked issue. The PR description is thorough and self-contained as problem statement and acceptance criteria.

Previous Review Status
[…]

Review iteration: 4 | 2026-04-09
Address review feedback:
- Add on_keepalive field to Request @type t() spec for Dialyzer
- Document intentional ignore of Plug.Conn.inform/2 return value

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The curl_shape awk script assumed a single status line followed by headers and body. With the new 102 Processing keepalive, curl prints each 1xx response before the final response, so awk treated the blank line after the 102 block as the header/body separator and piped the real headers into jq.

Skip 1xx informational response blocks entirely so only the final response headers and body are processed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
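The skipping rule described in this commit can be illustrated in Elixir. This is only a sketch of the idea, not the awk script from the PR: the module name InformationalSkipper and the function final_response/1 are hypothetical, and the input is assumed to be raw `curl -i`-style output in which each response block ends with a blank line.

```elixir
defmodule InformationalSkipper do
  # Hypothetical helper (not part of the PR): returns the header lines and
  # body of the final response, dropping any leading 1xx response blocks.
  def final_response(raw) do
    # Normalise CRLF so blocks can be split on a plain blank line.
    # Note: this also rewrites CRLF sequences inside the body.
    raw
    |> String.replace("\r\n", "\n")
    |> do_final_response()
  end

  defp do_final_response(text) do
    case String.split(text, "\n\n", parts: 2) do
      [head, rest] ->
        if informational?(head) do
          # A 1xx block has no body; the next block starts right after it.
          do_final_response(rest)
        else
          {String.split(head, "\n"), rest}
        end

      [head] ->
        # Headers only, no blank line and no body.
        {String.split(head, "\n"), ""}
    end
  end

  # Matches status lines such as "HTTP/1.1 102 Processing" or "HTTP/2 103".
  defp informational?(head) do
    Regex.match?(~r{\AHTTP/\d(\.\d)? 1\d\d\b}, head)
  end
end
```

Treating "first blank line" as the header/body separator is only safe once every 1xx block has been consumed, which is the bug the commit message describes.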
- Move keepalive timer start into hold_until_change so it only runs when the process actually enters the hold (not for immediate responses)
- Separate set_long_poll_keepalive (sets closure, needs conn) from start_keepalive_timer (starts interval, called in hold_until_change)
- Clean up comments to reference what exists, not what was removed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
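A minimal sketch of the hold loop this commit describes, using only the Elixir standard library. It is not the PR's code: the module name LongPollSketch, the {:new_changes, offset} message shape, and the explicit interval argument are assumptions made for illustration. The interval rule follows the "div(long_poll_timeout, 4) clamped to 5–15 seconds" wording in the PR description.

```elixir
defmodule LongPollSketch do
  # Hypothetical module; names and message shapes are illustrative only.
  @min_interval 5_000
  @max_interval 15_000

  # div(long_poll_timeout, 4), clamped to 5-15 seconds (values in ms).
  def keepalive_interval(long_poll_timeout) do
    long_poll_timeout |> div(4) |> max(@min_interval) |> min(@max_interval)
  end

  # Outer function: arm one absolute deadline and the repeating keepalive,
  # then enter the receive loop. A `receive ... after` timeout would restart
  # every time the loop is re-entered after a keepalive, so the deadline is a
  # separate Process.send_after/3 timer instead.
  def hold_until_change(long_poll_timeout, interval, on_keepalive) do
    ref = make_ref()
    deadline = Process.send_after(self(), {:long_poll_timeout, ref}, long_poll_timeout)
    {:ok, ticker} = :timer.send_interval(interval, :long_poll_keepalive)

    try do
      do_hold_until_change(ref, on_keepalive)
    after
      Process.cancel_timer(deadline)
      :timer.cancel(ticker)
      flush(ref)
    end
  end

  defp do_hold_until_change(ref, on_keepalive) do
    receive do
      {:new_changes, offset} ->
        {:changed, offset}

      :long_poll_keepalive ->
        # The closure keeps the connection out of this module.
        on_keepalive.()
        do_hold_until_change(ref, on_keepalive)

      {:long_poll_timeout, ^ref} ->
        :timeout
    end
  end

  # Drop timer messages that arrived before the timers were cancelled.
  defp flush(ref) do
    receive do
      :long_poll_keepalive -> flush(ref)
      {:long_poll_timeout, ^ref} -> flush(ref)
    after
      0 -> :ok
    end
  end
end
```

In the PR, the closure would presumably wrap Plug.Conn.inform(conn, 102); here any zero-arity function works, for example `LongPollSketch.hold_until_change(20_000, LongPollSketch.keepalive_interval(20_000), fn -> :ok end)`.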
Summary
- Sends periodic HTTP 102 Processing informational responses during long-poll holds to prevent network middleboxes (Cloudflare edge nodes on long paths to origin) from dropping idle connections (522 errors)

Problem
Long-poll requests to /v1/shape?live=true hold connections idle for up to 20 seconds inside hold_until_change. During this time, no bytes flow on the wire. Middleboxes on long network paths (e.g., Cloudflare colos BAH, HKG hitting ALB in us-east-1) can drop these connections, causing sporadic 522 errors.

Approach
Uses Plug.Conn.inform(conn, 102) to send HTTP 1xx informational responses every 5–15 seconds during the hold. These responses:
- […] fetch(), Req, and all standard HTTP clients transparently ignore 1xx responses per RFC 9110 §15.2

Implementation details
- :timer.send_interval starts when a live non-SSE request enters the Plug path (serve_shape_response/2 or serve_shape_log/2)
- register_before_send callback cancels the timer and flushes stale messages when the response is sent
- […] (on_keepalive) on the Request struct, keeping Plug.Conn out of the domain struct
- hold_until_change is split into two functions: the outer sets up a Process.send_after timeout timer (replacing receive...after, which would reset on keepalive re-entry), the inner do_hold_until_change handles the receive loop with a new :long_poll_keepalive clause
- […] div(long_poll_timeout, 4) clamped to 5–15 seconds

Alternatives considered
Chunked transfer encoding with whitespace keepalive bytes

Send Transfer-Encoding: chunked immediately and emit " " (space) chunks as keepalive during the hold, then send the real JSON payload as the final chunk.

Rejected for two reasons:
- Breaks backwards compatibility. HTTP headers are committed when chunked encoding starts, before the hold resolves. The electric-offset header would be stale when new data arrives during the hold. This requires a TypeScript client change (using body-based offset instead of header-based) to avoid an infinite re-fetch loop, which breaks old clients on the new server.
- Breaks CDN request collapsing. Live long-poll responses are cacheable (Cache-Control: public, max-age=5, stale-while-revalidate=5). With the current non-chunked approach, CDNs like Cloudflare hold the coalescing window open for the full hold duration: multiple clients requesting the same shape+offset are collapsed into one origin request. Starting a chunked response immediately closes this window on Cloudflare and CloudFront because the response is already "in flight." With 102, the coalescing window stays open since 1xx responses are not final responses.

Stream.resource pattern (modeled on existing SSE keepalive)

Replace hold_until_change entirely with a Stream.resource that emits keepalive chunks and data chunks as a streaming response body.

Rejected because it changes the response semantics: the Response struct fields (offset, up_to_date, status) are frozen at stream creation time rather than set after the hold resolves. This broke 11 existing tests and required changing error behavior (shape rotation, out-of-bounds, stack failure all became 200 with empty body instead of their proper status codes). It also has the same CDN collapsing and backwards-compatibility issues as the chunked approach.

Shorter long-poll timeout

Reduce long_poll_timeout to well under the middlebox timeout.

Rejected because it doubles the request rate for idle shapes without solving the root cause.
On HTTP 102 Processing

102 Processing was defined in RFC 2518 (WebDAV, 1999) specifically for preventing client timeouts on long-running requests, which is the exact use case here. It was removed in RFC 4918 (2007) with the stated reason "due to lack of implementation", not because it was harmful or architecturally unsound.

MDN labels 102 as "deprecated," but this is an editorial judgment, not a standards-body action. No RFC has ever formally deprecated 102. The status is more accurately described as "no longer defined by any active RFC". Being absent from RFC 9110 is not unusual in itself: 429 Too Many Requests, for example, is defined in RFC 6585 rather than in RFC 9110.
Why it's safe to use
- […] Plug.Conn.inform/2; Node.js has response.writeProcessing() since v10; Go added 1xx support in Go 1.19
- […] 100 Continue is for request bodies, 103 Early Hints is for resource preloading and would confuse browsers

Known limitations
- […] net/http has a default limit of 5 consecutive 1xx responses. Our 3–4 responses per hold (at 5s intervals over 20s) are within this limit.
- […] HttpStatus.PROCESSING enum constant. This is the only major framework taking action, and it's the constant, not the protocol behavior.

CDN request collapsing
Electric's live long-poll responses are cacheable (Cache-Control: public, max-age=5, stale-while-revalidate=5), which enables CDN request collapsing: multiple clients requesting the same shape+offset are collapsed into a single origin request.

The 102 approach preserves this because 1xx informational responses are not final responses. The CDN continues to hold the coalescing window open until the final 200 arrives. This is significant under high concurrency where many clients subscribe to the same shape.

By contrast, the alternative chunked-whitespace approach would close the coalescing window immediately by committing a 200 response before the hold resolves.

Test plan
- […] api_test.exs tests pass with no modifications

🤖 Generated with Claude Code