@jorben jorben commented Feb 10, 2026

Summary

  • Fix a bug where ConverterWorker fails to process pages after PDF splitting: orphaned PENDING TaskDetail records left behind by tasks in terminal states (FAILED, CANCELLED, etc.) exhaust the retry attempts in claimPage() before any valid page is reached
  • Refactor claimPage() to pre-filter active tasks before searching for PENDING pages
  • Add cleanup logic in WorkerOrchestrator to mark orphaned PENDING pages as FAILED on startup
  • Improve the test connection endpoint with a more realistic image and a token limit

Changes

Core Fix — ConverterWorker.claimPage()

  • Before: Query all PENDING TaskDetail records first, then verify the parent Task status one by one. If 5 or more orphaned records from terminal tasks are returned first, every one of the maxAttempts tries is spent on them and valid pages are never reached.
  • After: Pre-query Task table for active tasks (PROCESSING/COMPLETED), then only search for PENDING pages within those task IDs. Orphaned records are completely bypassed.
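
The difference between the two query strategies can be illustrated with a minimal in-memory sketch. This is not the project's code: the arrays stand in for the Task and TaskDetail tables, and the names claimPageBefore, claimPageAfter and ACTIVE_STATUSES, the page status values, the step of marking a claimed page PROCESSING, and the default maxAttempts of 5 are all assumptions made for illustration.

```typescript
// Hypothetical in-memory model. The real worker queries a database; every
// name and status value below is illustrative, not the project's API.
type TaskStatus =
  | "CREATED" | "PROCESSING" | "COMPLETED"
  | "FAILED" | "CANCELLED" | "PARTIAL_FAILED";
type PageStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface Task { id: string; status: TaskStatus }
interface TaskDetail { id: number; taskId: string; status: PageStatus }

// Task statuses the PR treats as active when claiming pages.
const ACTIVE_STATUSES: ReadonlySet<TaskStatus> =
  new Set<TaskStatus>(["PROCESSING", "COMPLETED"]);

// Old strategy: take PENDING pages first, then check each parent task.
// Orphaned pages at the front of the list consume every attempt.
function claimPageBefore(
  tasks: Task[], pages: TaskDetail[], maxAttempts = 5,
): TaskDetail | null {
  const statusById = new Map<string, TaskStatus>(
    tasks.map((t): [string, TaskStatus] => [t.id, t.status]),
  );
  const pending = pages.filter((p) => p.status === "PENDING");
  for (const page of pending.slice(0, maxAttempts)) {
    const parentStatus = statusById.get(page.taskId);
    if (parentStatus !== undefined && ACTIVE_STATUSES.has(parentStatus)) {
      page.status = "PROCESSING"; // assumed claim step
      return page;
    }
  }
  return null; // attempts exhausted on orphaned pages
}

// New strategy: collect active task IDs first, then look for a PENDING
// page only within those IDs, so orphaned pages are never considered.
function claimPageAfter(tasks: Task[], pages: TaskDetail[]): TaskDetail | null {
  const activeIds = new Set<string>(
    tasks.filter((t) => ACTIVE_STATUSES.has(t.status)).map((t) => t.id),
  );
  if (activeIds.size === 0) return null; // no active tasks, nothing to claim
  const page = pages.find(
    (p) => p.status === "PENDING" && activeIds.has(p.taskId),
  );
  if (page === undefined) return null;
  page.status = "PROCESSING"; // assumed claim step
  return page;
}
```

With five orphaned PENDING pages queued ahead of one valid page, the first function gives up while the second goes straight to the valid page.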

Startup Cleanup — WorkerOrchestrator.cleanupOrphanedWork()

  • Add a new cleanup step that finds tasks in terminal states (CREATED, FAILED, CANCELLED, COMPLETED, PARTIAL_FAILED) and marks their remaining PENDING TaskDetail records as FAILED
  • This prevents orphaned pages from accumulating over time
  • Update CleanupResult interface with new orphanedPendingPages field
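
A rough sketch of the cleanup step, under the same caveat: this is an in-memory illustration, not the code in WorkerOrchestrator. The names TaskRow, PageRow, TERMINAL_STATUSES, CleanupResultSketch and markOrphanedPendingPages are hypothetical, and the real CleanupResult interface presumably has other fields that are not shown here. The terminal state list is taken from the description above.

```typescript
// Hypothetical in-memory model of the startup cleanup. Only the
// orphanedPendingPages field name comes from the PR description.
type TaskState =
  | "CREATED" | "PROCESSING" | "COMPLETED"
  | "FAILED" | "CANCELLED" | "PARTIAL_FAILED";
type PageState = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface TaskRow { id: string; status: TaskState }
interface PageRow { id: number; taskId: string; status: PageState }

// Simplified stand-in for the real CleanupResult interface.
interface CleanupResultSketch { orphanedPendingPages: number }

// Task states whose leftover PENDING pages are treated as orphaned,
// as listed in the PR description.
const TERMINAL_STATUSES: ReadonlySet<TaskState> = new Set<TaskState>([
  "CREATED", "FAILED", "CANCELLED", "COMPLETED", "PARTIAL_FAILED",
]);

// Mark every PENDING page that belongs to a terminal task as FAILED
// and report how many pages were changed.
function markOrphanedPendingPages(
  tasks: TaskRow[], pages: PageRow[],
): CleanupResultSketch {
  const terminalIds = new Set<string>(
    tasks.filter((t) => TERMINAL_STATUSES.has(t.status)).map((t) => t.id),
  );
  let orphanedPendingPages = 0;
  for (const page of pages) {
    if (page.status === "PENDING" && terminalIds.has(page.taskId)) {
      page.status = "FAILED";
      orphanedPendingPages += 1;
    }
  }
  return { orphanedPendingPages };
}
```

Pages that are not PENDING, and PENDING pages whose task is still PROCESSING, are left untouched in this sketch.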

Minor — completion.handler.ts

  • Use a meaningful 32×32 image instead of 1×1 pixel for connection testing
  • Add maxTokens: 8 cap to limit cost during test connection
  • Update prompt for better vision capability testing

Test Plan

  • All 759 tests pass (35 test files)
  • TypeScript compilation clean (tsc --noEmit)
  • New tests for claimPage(): no active tasks, search within active task IDs
  • New test for orphaned PENDING page cleanup in cleanupOrphanedWork()
  • Updated existing tests to match refactored query strategy

Refs #19

🤖 Generated with Claude Code

jorben and others added 2 commits February 10, 2026 23:30
Orphaned TaskDetail records (PENDING status) from tasks in terminal
states (FAILED, CANCELLED, etc.) were causing ConverterWorker's
claimPage() to exhaust its maxAttempts without finding valid work.

- Refactor claimPage() to pre-filter active tasks (PROCESSING/COMPLETED)
  before searching for PENDING pages, eliminating orphan interference
- Add cleanup step in WorkerOrchestrator.cleanupOrphanedWork() to mark
  PENDING pages from terminal tasks as FAILED on startup
- Update CleanupResult interface with orphanedPendingPages field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken limit

Use a meaningful 32x32 image instead of 1x1 pixel for connection test,
add maxTokens cap to limit cost, and update prompt for vision testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jorben jorben merged commit 00aa1d5 into master Feb 10, 2026
2 checks passed