fix: handle missing last processed LSN gracefully #4109
The LsnTracker ETS table can be empty when a request handler survives a MonitoredCoreSupervisor restart (ShapeStatusOwner owns the table and recreates it empty on restart, while Bandit handlers live in a separate supervision tree). A long-polling request validated during active mode (read_only?: false) could then crash on MatchError when the timeout fires and determine_global_last_seen_lsn calls get_last_processed_lsn.

- Make get_last_processed_lsn return nil instead of crashing when the ETS key is absent
- determine_global_last_seen_lsn falls back to the shape's own offset when LsnTracker returns nil, removing the need for a separate read_only? clause
- Align request.read_only? with runtime status in hold_until_change's after block so downstream functions use correct read strategies
- Fix stale flushed_wal in replication client: handle_result was using the pre-query state (flushed_wal=0) instead of the updated state from process_query_result when populating the LsnTracker
- Add explicit assertion in shape_log_collector's mark_as_ready for the LsnTracker invariant (must be populated before collector starts)

Closes #4107

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
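The first two changes in this commit can be illustrated with a minimal sketch. It is not the code from the PR: the module name LsnTrackerSketch, the table name, the key, and the function arities are all assumptions made for illustration. It only shows the general pattern of returning nil from an ETS lookup when the key is absent and falling back to the shape's own offset.

```elixir
defmodule LsnTrackerSketch do
  # Hypothetical stand-in for the real LsnTracker; the table name and key
  # are assumptions, not taken from the PR.
  @key :last_processed_lsn

  # Creates an empty, unnamed ETS table, similar to what a restarted owner
  # process would leave behind.
  def new_table, do: :ets.new(:lsn_tracker_sketch, [:set, :public])

  def set_last_processed_lsn(table, lsn) when is_integer(lsn) do
    :ets.insert(table, {@key, lsn})
    :ok
  end

  # Returns nil instead of raising MatchError when the key is absent.
  def get_last_processed_lsn(table) do
    case :ets.lookup(table, @key) do
      [{@key, lsn}] -> lsn
      [] -> nil
    end
  end

  # Falls back to the shape's own offset when the tracker has no value.
  # Only nil and false are falsy in Elixir, so an LSN of 0 is still returned.
  def determine_global_last_seen_lsn(table, shape_offset) do
    get_last_processed_lsn(table) || shape_offset
  end
end
```

With an empty table, determine_global_last_seen_lsn(table, 42) yields the fallback offset; once set_last_processed_lsn has been called, the tracked value takes precedence.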
✅ Deploy Preview for electric-next ready!
Claude Code Review

Summary
This PR correctly fixes the

What's Working Well

Issues Found
None.

Issue Conformance
Issue #4107 is well-specified with a Sentry stack trace and a clear reproduction hypothesis. The implementation directly addresses the reported crash, and the PR goes further by fixing the root cause of why the LsnTracker value was unreliable in the first place. No scope creep.

Previous Review Status
Both issues from iteration 1 have been resolved:

Review iteration: 2 | 2026-04-09
Address Claude review feedback:

- Add patch changeset for sync-service
- Add unit tests for process_query_result verifying that the updated state returned in the 5-tuple contains the correct flushed_wal from slot creation and slot query results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
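The stale-state problem these tests guard against can be sketched as follows. This is a heavily simplified illustration, not the replication client's actual code: the module name ReplicationClientSketch, the confirmed_flush_lsn field on the result map, and the 2-tuple return shape (the real function is described as returning a 5-tuple) are assumptions.

```elixir
defmodule ReplicationClientSketch do
  # Hypothetical, heavily simplified state; the real client tracks much more.
  defstruct flushed_wal: 0

  def new, do: %__MODULE__{}

  # Returns the updated state alongside a result tag so that callers never
  # need to read from the pre-query state. The result map shape is assumed.
  def process_query_result(%__MODULE__{} = state, %{confirmed_flush_lsn: lsn}) do
    {:ok, %{state | flushed_wal: lsn}}
  end

  # Buggy variant: discards the updated state and reports the stale value.
  def handle_result_stale(state, result) do
    {:ok, _updated} = process_query_result(state, result)
    state.flushed_wal
  end

  # Fixed variant: reads flushed_wal from the state that was returned.
  def handle_result(state, result) do
    {:ok, updated} = process_query_result(state, result)
    updated.flushed_wal
  end
end
```

Because Elixir data is immutable, the original state binding still holds flushed_wal = 0 after the call, which is why the fix has to thread the returned state through to whatever populates the tracker.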
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4109   +/-   ##
=======================================
  Coverage   88.67%   88.67%
=======================================
  Files          25       25
  Lines        2438     2438
  Branches      612      610    -2
=======================================
  Hits         2162     2162
  Misses        274      274
  Partials        2        2
Summary
Fixes #4107: "MatchError: no match of right hand side value: []" in LsnTracker.get_last_processed_lsn/1 during hold_until_change.

Root cause
The crash occurs when a long-polling request's timeout fires and determine_global_last_seen_lsn calls LsnTracker.get_last_processed_lsn, but the ETS key hasn't been populated.

The hold_until_change after block enters the read-only branch based on runtime status (status.shape == :read_only), but determine_global_last_seen_lsn dispatched on the request-time flag (request.read_only?). When status.shape degraded to :read_only during the long poll but request.read_only? was still false, the non-read-only clause called LsnTracker.get_last_processed_lsn, which crashed on the empty ETS table.

The LsnTracker data is lost when the ShapeStatusOwner process (which owns the ETS table) crashes and restarts: it recreates an empty table. Bandit HTTP handler processes survive this because they live in a separate supervision tree. A request validated during active mode (read_only?: false) can then hit the empty table when its long-poll timeout fires.

Changes
- LsnTracker.get_last_processed_lsn/1 returns nil instead of crashing when the ETS key is absent
- determine_global_last_seen_lsn/1 falls back to the shape's own offset when LsnTracker returns nil; this is the same value the read-only path used, so the separate read_only?: true clause was removed
- hold_until_change after block aligns request.read_only? with runtime status when entering the read-only branch, so downstream functions (determine_log_chunk_offset, get_merged_log_stream) use correct read strategies
- replication_client handle_result was using the pre-query state.flushed_wal (always 0) instead of the updated state from process_query_result when populating the LsnTracker during slot creation/query; fixed by returning the updated state from process_query_result
- shape_log_collector mark_as_ready adds an explicit assertion for the LsnTracker invariant (must be populated before the collector starts) with a clear error message

Test plan
- LsnTracker.get_last_processed_lsn returns nil when not populated

🤖 Generated with Claude Code
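The alignment of request.read_only? with runtime status described under Changes might look roughly like the sketch below. The module and function names (HoldUntilChangeSketch, align_with_runtime_status) and the reduced request and status maps are hypothetical; the sketch also assumes that a request is left unchanged whenever the runtime status is anything other than :read_only, which the description does not spell out.

```elixir
defmodule HoldUntilChangeSketch do
  # Hypothetical helper: request and status are reduced to the fields
  # discussed in the description.

  # When the runtime status has degraded to read-only, mark the request as
  # read-only too, so downstream functions pick the matching read strategy.
  def align_with_runtime_status(request, %{shape: :read_only}) do
    %{request | read_only?: true}
  end

  # Assumption: any other status leaves the request untouched.
  def align_with_runtime_status(request, _status), do: request
end
```

A request validated with read_only?: false and then held across a degradation to :read_only comes out with read_only?: true, so later dispatch on the flag agrees with the branch that was actually entered.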