Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Retry endpoint added #198

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Feat: Retry endpoint added #198

wants to merge 3 commits into from

Conversation

Mohiiit
Copy link
Contributor

@Mohiiit Mohiiit commented Dec 26, 2024

  1. Created a new REST endpoint /jobs/:id/retry following RESTful conventions
    • Implemented retry logic that:
      • Verifies the job is in Failed state before allowing retry
      • Updates job status to RetryAttempt
      • Triggers job processing automatically after status update
  2. Additional Improvements:
    • Enhanced verification status handling:
      • Added support for VerificationTimeout status in the verify endpoint
      • Jobs in either PendingVerification or VerificationTimeout states can now be verified

@coveralls
Copy link

Coverage Status

coverage: 67.321% (-0.1%) from 67.449%
when pulling d34594c on feat/retry-job-status
into a008642 on main.

@Mohiiit Mohiiit self-assigned this Dec 26, 2024
@Mohiiit Mohiiit requested a review from ocdbytes December 26, 2024 10:23
JobStatus::Created | JobStatus::VerificationFailed => {
tracing::info!(job_id = ?id, status = ?job.status, "Job status is Created or VerificationFailed, proceeding with processing");
JobStatus::Created | JobStatus::VerificationFailed | JobStatus::RetryAttempt => {
tracing::info!(job_id = ?id, status = ?job.status, "Job status is Created or VerificationFailed or RetryAttempt, proceeding with processing");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the message here is a lil too much, since we are already logging the status so we can change the message to something more useful.
WDYT @Mohiiit ?

@@ -209,8 +209,8 @@ pub async fn process_job(id: Uuid, config: Arc<Config>) -> Result<(), JobError>
match job.status {
// we only want to process jobs that are in the created or verification failed state.
// verification failed state means that the previous processing failed and we want to retry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we need to update this comment to match the new implementation.

@@ -333,8 +333,8 @@ pub async fn verify_job(id: Uuid, config: Arc<Config>) -> Result<(), JobError> {
tracing::Span::current().record("internal_id", job.internal_id.clone());

match job.status {
JobStatus::PendingVerification => {
tracing::debug!(job_id = ?id, "Job status is PendingVerification, proceeding with verification");
JobStatus::PendingVerification | JobStatus::VerificationTimeout => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's add an IMP! comment here explaining why doing this is fine,
referring that we don't add the jobs that timeout to the queue back again

JobStatus::PendingVerification => {
tracing::debug!(job_id = ?id, "Job status is PendingVerification, proceeding with verification");
JobStatus::PendingVerification | JobStatus::VerificationTimeout => {
tracing::info!(job_id = ?id, status = ?job.status, "Job status is PendingVerification or VerificationTimeout, proceeding with verification");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous:
I think the message here is a lil too much, since we are already logging the status so we can change the message to something more useful.
WDYT @Mohiiit ?

@@ -103,6 +103,8 @@ pub enum JobStatus {
VerificationFailed,
/// The job failed completing
Failed,
/// The job is being retried
RetryAttempt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-pick

RetryAttempt could be Retrying to match the progressive verb form used in other statuses

Suggestions from Claude :

Let me review the enum names carefully and suggest only meaningful improvements:

  1. Created could be Pending or Queued - both more precisely describe that the job is waiting to be processed
  2. LockedForProcessing is quite verbose - could be simplified to Processing since the locking is an implementation detail
  3. RetryAttempt could be Retrying to match the progressive verb form used in other statuses

The rest of the names (PendingVerification, Completed, VerificationTimeout, VerificationFailed, Failed) are clear and appropriate as they are.

So the simplified version would be:

pub enum JobStatus {
    /// An acknowledgement that the job has been received by the
    /// orchestrator and is waiting to be processed
    Pending,
    /// Some system has taken a lock over the job for processing and no
    /// other system to process the job
    Processing,
    /// The job has been processed and is pending verification
    PendingVerification,
    /// The job has been processed and verified. No other actions needs to be taken
    Completed,
    /// The job was processed but the was unable to be verified under the given time
    VerificationTimeout,
    /// The job failed processing
    VerificationFailed,
    /// The job failed completing
    Failed,
    /// The job is being retried
    Retrying,
}

These changes make the enum more consistent while maintaining clarity about each state's purpose. Let me know if you'd like me to explain any of these suggestions in more detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya agreed that RetryAttempt isn't in line with other enums. Although Retrying is a verb and others aren't a verb right? Does PendingRetry sound better? it goes in line with PendingVerification.

}
};

// Get the job and verify it's in a failed state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-pick

Remove and verify it's in a failed state as it's being written below.

Err(e) => {
ORCHESTRATOR_METRICS
.failed_job_operations
.add(1.0, &[KeyValue::new("operation_type", "retry_job")]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

operation_type should strictly be process_job since that's the function being called.
you can add another attribute like operation_info for registering retry_job

}
}
Err(e) => {
ORCHESTRATOR_METRICS.failed_job_operations.add(1.0, &[KeyValue::new("operation_type", "retry_job")]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@@ -103,6 +103,8 @@ pub enum JobStatus {
VerificationFailed,
/// The job failed completing
Failed,
/// The job is being retried
RetryAttempt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya agreed that RetryAttempt isn't in line with other enums. Although Retrying is a verb and others aren't a verb right? Does PendingRetry sound better? it goes in line with PendingVerification.

.into_response();
}
Err(e) => {
return ApiResponse::<JobApiResponse>::error(e.to_string()).into_response();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see a lot of ApiResponse::<JobApiResponse>::error, any way to abstract this? maybe a macro, maybe a TryFrom implementation

}

// Update the job status to RetryAttempt
match config.database().update_job(&job, JobItemUpdates::new().update_status(JobStatus::RetryAttempt).build()).await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the routes usually don't have the business logic, they just have basic process of taking the request, calling the function, taking the response and returning. i would suggest moving this inside the jobs mod.rs

@@ -119,6 +119,44 @@ async fn test_trigger_verify_job(#[future] setup_trigger: (SocketAddr, Arc<Confi
}
}

#[tokio::test]
#[rstest]
async fn test_trigger_retry_job(#[future] setup_trigger: (SocketAddr, Arc<Config>)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also add tests that you can't retry is not in failed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants