-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Retry endpoint added #198
base: main
Are you sure you want to change the base?
Conversation
Mohiiit
commented
Dec 26, 2024
- Created a new REST endpoint /jobs/:id/retry following RESTful conventions
- Implemented retry logic that:
- Verifies the job is in Failed state before allowing retry
- Updates job status to RetryAttempt
- Triggers job processing automatically after status update
- Implemented retry logic that:
- Additional Improvements:
- Enhanced verification status handling:
- Added support for VerificationTimeout status in the verify endpoint
- Jobs in either PendingVerification or VerificationTimeout states can now be verified
- Enhanced verification status handling:
JobStatus::Created | JobStatus::VerificationFailed => { | ||
tracing::info!(job_id = ?id, status = ?job.status, "Job status is Created or VerificationFailed, proceeding with processing"); | ||
JobStatus::Created | JobStatus::VerificationFailed | JobStatus::RetryAttempt => { | ||
tracing::info!(job_id = ?id, status = ?job.status, "Job status is Created or VerificationFailed or RetryAttempt, proceeding with processing"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the message here is a lil too much, since we are already logging the status
so we can change the message to something more useful.
WDYT @Mohiiit ?
@@ -209,8 +209,8 @@ pub async fn process_job(id: Uuid, config: Arc<Config>) -> Result<(), JobError> | |||
match job.status { | |||
// we only want to process jobs that are in the created or verification failed state. | |||
// verification failed state means that the previous processing failed and we want to retry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we need to update this comment to match the new implementation.
@@ -333,8 +333,8 @@ pub async fn verify_job(id: Uuid, config: Arc<Config>) -> Result<(), JobError> { | |||
tracing::Span::current().record("internal_id", job.internal_id.clone()); | |||
|
|||
match job.status { | |||
JobStatus::PendingVerification => { | |||
tracing::debug!(job_id = ?id, "Job status is PendingVerification, proceeding with verification"); | |||
JobStatus::PendingVerification | JobStatus::VerificationTimeout => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's add an IMP!
comment here explaining why doing this is fine,
referring that we don't add the jobs that timeout to the queue back again
JobStatus::PendingVerification => { | ||
tracing::debug!(job_id = ?id, "Job status is PendingVerification, proceeding with verification"); | ||
JobStatus::PendingVerification | JobStatus::VerificationTimeout => { | ||
tracing::info!(job_id = ?id, status = ?job.status, "Job status is PendingVerification or VerificationTimeout, proceeding with verification"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous:
I think the message here is a lil too much, since we are already logging the status so we can change the message to something more useful.
WDYT @Mohiiit ?
@@ -103,6 +103,8 @@ pub enum JobStatus { | |||
VerificationFailed, | |||
/// The job failed completing | |||
Failed, | |||
/// The job is being retried | |||
RetryAttempt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit-pick
RetryAttempt
could be Retrying
to match the progressive verb form used in other statuses
Suggestions from Claude :
Let me review the enum names carefully and suggest only meaningful improvements:
Created
could bePending
orQueued
- both more precisely describe that the job is waiting to be processedLockedForProcessing
is quite verbose - could be simplified toProcessing
since the locking is an implementation detailRetryAttempt
could beRetrying
to match the progressive verb form used in other statuses
The rest of the names (PendingVerification
, Completed
, VerificationTimeout
, VerificationFailed
, Failed
) are clear and appropriate as they are.
So the simplified version would be:
pub enum JobStatus {
/// An acknowledgement that the job has been received by the
/// orchestrator and is waiting to be processed
Pending,
/// Some system has taken a lock over the job for processing and no
/// other system to process the job
Processing,
/// The job has been processed and is pending verification
PendingVerification,
/// The job has been processed and verified. No other actions needs to be taken
Completed,
/// The job was processed but the was unable to be verified under the given time
VerificationTimeout,
/// The job failed processing
VerificationFailed,
/// The job failed completing
Failed,
/// The job is being retried
Retrying,
}
These changes make the enum more consistent while maintaining clarity about each state's purpose. Let me know if you'd like me to explain any of these suggestions in more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya agreed that RetryAttempt
isn't in line with other enums. Although Retrying
is a verb and others aren't a verb right? Does PendingRetry
sound better? it goes in line with PendingVerification
.
} | ||
}; | ||
|
||
// Get the job and verify it's in a failed state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit-pick
Remove and verify it's in a failed state
as it's being written below.
Err(e) => { | ||
ORCHESTRATOR_METRICS | ||
.failed_job_operations | ||
.add(1.0, &[KeyValue::new("operation_type", "retry_job")]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operation_type should strictly be process_job
since that's the function being called.
you can add another attribute like operation_info
for registering retry_job
} | ||
} | ||
Err(e) => { | ||
ORCHESTRATOR_METRICS.failed_job_operations.add(1.0, &[KeyValue::new("operation_type", "retry_job")]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
@@ -103,6 +103,8 @@ pub enum JobStatus { | |||
VerificationFailed, | |||
/// The job failed completing | |||
Failed, | |||
/// The job is being retried | |||
RetryAttempt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya agreed that RetryAttempt
isn't in line with other enums. Although Retrying
is a verb and others aren't a verb right? Does PendingRetry
sound better? it goes in line with PendingVerification
.
.into_response(); | ||
} | ||
Err(e) => { | ||
return ApiResponse::<JobApiResponse>::error(e.to_string()).into_response(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see a lot of ApiResponse::<JobApiResponse>::error
, any way to abstract this? maybe a macro, maybe a TryFrom implementation
} | ||
|
||
// Update the job status to RetryAttempt | ||
match config.database().update_job(&job, JobItemUpdates::new().update_status(JobStatus::RetryAttempt).build()).await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the routes usually don't have the business logic, they just have basic process of taking the request, calling the function, taking the response and returning. i would suggest moving this inside the jobs mod.rs
@@ -119,6 +119,44 @@ async fn test_trigger_verify_job(#[future] setup_trigger: (SocketAddr, Arc<Confi | |||
} | |||
} | |||
|
|||
#[tokio::test] | |||
#[rstest] | |||
async fn test_trigger_retry_job(#[future] setup_trigger: (SocketAddr, Arc<Config>)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also add tests that you can't retry is not in failed?