fix: prevent worker memory leaks with recycling, memory logging, and Sentry scope clearing #771

Open
autumnbillinginternalapp[bot] wants to merge 2 commits into main from jj/worker-memory-leak-fixes

Conversation


autumnbillinginternalapp bot commented Feb 19, 2026

What

Prevents the silent worker death we saw today (workers stopped processing SQS messages after hours of uptime due to memory growth).

Changes

  1. Process recycling — workers exit gracefully after 500k messages. The cluster primary automatically respawns them with fresh memory. This is the safety net.

  2. Memory logging — adds rss and heapUsed to the periodic stats log line so we can see memory growth in Axiom and catch leaks before they cause outages.

  3. Sentry scope clearing — calls Sentry.getCurrentScope().clear() after each batch to prevent breadcrumb/tag accumulation from thousands of setSentryTags() calls.
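The three changes can be sketched together as a minimal, self-contained loop. This is an illustrative sketch, not the actual `initWorkers.ts` code: `MAX_MESSAGES_PER_WORKER`, `pollLoop`, and `memoryStats` are assumed names, and `Sentry` is stubbed here so the sketch runs standalone (the real code calls `Sentry.getCurrentScope().clear()` from `@sentry/node`):

```typescript
// Illustrative sketch of the worker loop changes; not the actual initWorkers.ts.
const MAX_MESSAGES_PER_WORKER = 500_000; // assumed constant name

// Stand-in for the @sentry/node scope API so the sketch is self-contained.
const Sentry = {
  getCurrentScope: () => ({ clear: () => {} }),
};

let totalMessagesProcessed = 0;

// Change 2: surface rss/heapUsed in the periodic stats log line.
function memoryStats() {
  const { rss, heapUsed } = process.memoryUsage();
  return { rss, heapUsed, totalMessagesProcessed };
}

// Change 1: recycle once the lifetime message count crosses the threshold.
function shouldRecycle(): boolean {
  return totalMessagesProcessed >= MAX_MESSAGES_PER_WORKER;
}

async function pollLoop(pollBatch: () => Promise<number>) {
  while (true) {
    totalMessagesProcessed += await pollBatch();

    // Change 3: drop accumulated breadcrumbs/tags after each batch.
    Sentry.getCurrentScope().clear();

    console.log(JSON.stringify(memoryStats()));

    if (shouldRecycle()) {
      // Graceful exit; the cluster primary respawns a fresh worker.
      process.exit(0);
    }
  }
}
```

The recycle threshold check and the memory snapshot are kept as small pure-ish helpers so they are easy to exercise in isolation.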

Context

Today at ~09:21 UTC, Worker 52 silently stopped receiving SQS messages. Worker 53 followed at ~11:12 UTC. No errors, no crashes — the workers kept polling but got empty responses for 8 hours until a restart at 19:21 fixed it. ECS metrics showed memory climbing from 33% → 44% max before failure.

Testing

Minimal risk — only adds logging, a scope clear, and a graceful exit path that the existing cluster respawn logic already handles.


Summary by cubic

Prevents worker memory leaks that stalled SQS polling by adding process recycling, memory usage logging, and clearing Sentry scope after batches. Workers now exit and respawn predictably, and memory growth is visible.

  • Bug Fixes
    • Exit with code 0 after 500k messages; cluster primary auto-respawns a fresh worker.
    • Log rss, heapUsed, and total messages in periodic stats to catch growth early.
    • Clear Sentry scope after each batch to prevent breadcrumb/tag buildup.

Written for commit 532b9e8. Summary will update on new commits.

Greptile Summary

Addresses production memory leak causing workers to silently stop processing SQS messages after hours of uptime by adding three defensive mechanisms.

  • Bug Fixes
    • Added process recycling: workers gracefully exit after 500k messages; cluster primary auto-respawns with fresh memory as safety net against leaks
    • Added memory logging: logs rss and heapUsed in periodic stats to catch memory growth early in Axiom before outages occur
    • Added Sentry scope clearing: calls Sentry.getCurrentScope().clear() after each batch to prevent breadcrumb/tag accumulation from thousands of setSentryTags() calls

The changes directly respond to the incident at ~09:21 UTC where Worker 52 stopped receiving messages and Worker 53 followed, with ECS metrics showing memory climbing from 33% to 44% before failure. The worker recycling acts as a safety net, while memory logging provides observability, and Sentry scope clearing prevents a known source of memory accumulation.

Confidence Score: 5/5

  • Safe to merge - implements defensive, low-risk observability and graceful recycling mechanisms
  • All three changes are defensive additions with minimal risk: process recycling reuses existing cluster respawn logic, memory logging is read-only observability, and Sentry scope clearing is a standard cleanup pattern. No logic changes to message processing.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| server/src/queue/initWorkers.ts | Added process recycling after 500k messages, memory logging (rss/heap), and Sentry scope clearing to prevent worker memory leaks |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Worker Polls SQS Queue] --> B{Messages Received?}
    B -->|Yes| C[Process Messages in Batch]
    C --> D[Delete Messages from Queue]
    D --> E[Sentry.getCurrentScope.clear]
    E --> F{totalMessagesProcessed >= 500k?}
    F -->|Yes| G[Set isRunning = false]
    G --> H[Break Polling Loop]
    H --> I[Worker Process Exits]
    I --> J[Cluster Primary Detects Exit]
    J --> K[cluster.fork - Spawn New Worker]
    K --> A
    F -->|No| L[Continue Polling]
    L --> A
    B -->|No| M[Handle Empty Poll]
    M --> A

Last reviewed commit: eb4b0ad

…Sentry scope clearing

- Add process recycling after 500k messages (cluster primary auto-respawns)
- Add memory stats (rss/heap) to periodic log output for leak detection
- Clear Sentry scope after each batch to prevent breadcrumb/tag accumulation
- Track total messages processed per worker lifetime

Context: Workers silently stopped receiving SQS messages after hours of
uptime due to memory growth. Memory climbed from 33% to 44% max before
workers became unresponsive. Only a full restart recovered them.

vercel bot commented Feb 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| autumn-vite | Ready | Preview, Comment | Feb 19, 2026 10:59pm |


cubic-dev-ai bot left a comment


1 issue found across 1 file

Confidence score: 3/5

  • There is a concrete lifecycle risk in server/src/queue/initWorkers.ts: recycling stops polling but never exits, so workers can sit idle and not be respawned, undermining the memory-reclaim goal.
  • This is a medium-severity behavior change (6/10) with user-impacting implications for worker availability, so the merge carries some risk.
  • Pay close attention to server/src/queue/initWorkers.ts - ensure recycling triggers the shutdown/exit path so workers actually restart.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="server/src/queue/initWorkers.ts">

<violation number="1" location="server/src/queue/initWorkers.ts:261">
P2: Recycling stops the polling loop but never exits the process, so the worker will sit idle and won’t be respawned. Trigger the existing shutdown path (SIGTERM) or exit the process here so recycling actually frees memory.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Addresses review feedback — setting isRunning=false only stops the loop
but leaves the process alive and idle. process.exit(0) ensures the
cluster primary detects the exit and respawns a fresh worker.
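The fix described above can be sketched as a small helper. This is an assumed shape, not the actual code; `exitFn` is injectable here purely so the sketch can be exercised without terminating the process, while the real code calls `process.exit(0)` directly:

```typescript
// Sketch of the review fix: recycling must exit the process so the
// cluster primary's 'exit' handler fires and forks a replacement.
// Merely setting isRunning = false leaves an idle, leaky process alive.
type ExitFn = (code: number) => void;

function maybeRecycle(
  totalMessagesProcessed: number,
  limit: number,
  exitFn: ExitFn = process.exit,
): boolean {
  if (totalMessagesProcessed >= limit) {
    exitFn(0); // exit code 0: graceful recycle still triggers respawn
    return true;
  }
  return false; // keep polling
}
```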
