-
Notifications
You must be signed in to change notification settings - Fork 12
Fix Orphaned Messages on Run Failure #220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 09-18-chore_improve_verify-schemas-synced_and_regenerate-temp-migration
Are you sure you want to change the base?
Fix Orphaned Messages on Run Failure #220
Conversation
🦋 Changeset detectedLatest commit: 842f929 The changes in this PR will be included in the next version bump. This PR includes changesets to release 7 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. ✨ Finishing touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
select is( | ||
(select count(*)::integer from pgflow.step_tasks | ||
where run_id = :'test_run_id'::uuid | ||
and step_slug = 'parallel_single'), | ||
0, | ||
'Parallel single task should not exist after type constraint violation (transaction rolled back)' | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test comment indicates that parallel_single
tasks won't exist due to transaction rollback, but the PR changes the behavior to handle type violations gracefully without rolling back transactions. With the new implementation, these tasks might actually exist in a failed state rather than not existing at all.
Consider updating this test to match the new behavior - either by checking for failed status instead of non-existence, or by updating the comment to reflect the current implementation's expected behavior. This will ensure the test accurately validates the intended behavior of the type violation handling.
select is( | |
(select count(*)::integer from pgflow.step_tasks | |
where run_id = :'test_run_id'::uuid | |
and step_slug = 'parallel_single'), | |
0, | |
'Parallel single task should not exist after type constraint violation (transaction rolled back)' | |
); | |
select is( | |
(select count(*)::integer from pgflow.step_tasks | |
where run_id = :'test_run_id'::uuid | |
and step_slug = 'parallel_single' | |
and status = 'failed'), | |
1, | |
'Parallel single task should exist but be in failed state after type constraint violation (graceful handling)' | |
); | |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
View your CI Pipeline Execution ↗ for commit 842f929
☁️ Nx Cloud last updated this comment at |
c4b287c
to
6602788
Compare
f303c04
to
79502f5
Compare
79502f5
to
ae9a6ee
Compare
ae9a6ee
to
104f337
Compare
PERFORM pgmq.archive(r.flow_slug, st.message_id) | ||
FROM pgflow.step_tasks st | ||
JOIN pgflow.runs r ON st.run_id = r.run_id | ||
WHERE st.run_id = fail_task.run_id | ||
AND st.status IN ('queued', 'started') | ||
AND st.message_id IS NOT NULL; | ||
END IF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance and correctness issue: The archive operation uses individual pgmq.archive() calls in a loop rather than batch archiving. This is inefficient for large numbers of messages and could cause partial archiving if one call fails. Should collect message IDs and use batch archiving like in complete_task, or use a single query with array_agg() to archive all messages atomically.
PERFORM pgmq.archive(r.flow_slug, st.message_id) | |
FROM pgflow.step_tasks st | |
JOIN pgflow.runs r ON st.run_id = r.run_id | |
WHERE st.run_id = fail_task.run_id | |
AND st.status IN ('queued', 'started') | |
AND st.message_id IS NOT NULL; | |
END IF; | |
WITH messages_to_archive AS ( | |
SELECT r.flow_slug, array_agg(st.message_id) AS message_ids | |
FROM pgflow.step_tasks st | |
JOIN pgflow.runs r ON st.run_id = r.run_id | |
WHERE st.run_id = fail_task.run_id | |
AND st.status IN ('queued', 'started') | |
AND st.message_id IS NOT NULL | |
GROUP BY r.flow_slug | |
) | |
SELECT pgmq.archive_batch(flow_slug, message_ids) | |
FROM messages_to_archive | |
WHERE array_length(message_ids, 1) > 0; | |
END IF; |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
Provides guidance on fixing invalid tests, updating SQL functions, and rerunning tests without creating migrations or using nx, to streamline test maintenance and debugging.
104f337
to
842f929
Compare
🔍 Preview Deployment: Website✅ Deployment successful! 🔗 Preview URL: https://pr-220.pgflow.pages.dev 📝 Details:
_Last updated: _ |
🔍 Preview Deployment: Playground✅ Deployment successful! 🔗 Preview URL: https://pr-220--pgflow-demo.netlify.app 📝 Details:
_Last updated: _ |
This PR fixes a critical issue where messages for pending tasks remain in the queue indefinitely when a run fails, causing performance degradation and resource waste.
Problem
Solution
Testing