
Add "retry" command to spackbot. #60

Open
josephsnyder wants to merge 1 commit into main from add_pipeline_retry
Conversation

josephsnyder

Utilize the "retry" API endpoint for pipelines to re-run the jobs that
had failed during the most recent failed run of a pipeline.

It does require a new comment to listen to and a call to acquire the ID
of the most recently failed pipeline before triggering the jobs to be retried.
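
For context, a minimal sketch of what such a retry could look like against GitLab's REST API. This is an illustration, not the PR's actual code; GITLAB_URL, PROJECT_ID, and the token handling are placeholder assumptions.

    import aiohttp

    GITLAB_URL = "https://gitlab.example.com"      # placeholder GitLab instance
    PROJECT_ID = 42                                # placeholder project id
    HEADERS = {"PRIVATE-TOKEN": "<GITLAB_TOKEN>"}  # personal access token

    async def retry_latest_failed_pipeline(branch):
        async with aiohttp.ClientSession() as session:
            # List pipelines for the branch (GitLab returns them newest-first),
            # filtered to failures.
            list_url = (
                f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/pipelines"
                f"?ref={branch}&status=failed"
            )
            async with session.get(list_url, headers=HEADERS) as response:
                pipelines = await response.json()
            if not pipelines:
                return None
            # POST to the "retry" endpoint re-runs only the failed jobs
            # of that pipeline.
            pipeline_id = pipelines[0]["id"]
            retry_url = (
                f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}"
                f"/pipelines/{pipeline_id}/retry"
            )
            async with session.post(retry_url, headers=HEADERS) as response:
                return await response.json()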

josephsnyder force-pushed the add_pipeline_retry branch 6 times, most recently from 12c0441 to 2921e4c on January 4, 2022 at 14:57
Collaborator

@scottwittenburg left a comment


Thanks @josephsnyder, this is looking pretty good. I think we probably want to debug the sender (instead of info), or else pick out the user name from that dictionary.

I'm curious how you're testing this change, or if it's just too much of a pain. I had to do a lot of gymnastics to test #58, which is why I'm asking.

spackbot/handlers/pipelines.py (3 outdated review comments, resolved)
@josephsnyder
Author

@scottwittenburg, I've been testing this using a combination of the docker-compose system and a non-Spack repository (josephsnyder/spack_docs#1). I told my local instance of spackbot to listen on that repository.

I haven't gotten a retry to officially "work" via spackbot yet, since the branches there don't match any that would be on Spack's GitLab, but they do work when manually hitting the API endpoint of the development GitLab instance I run.

@josephsnyder
Author

I've added a few new parts: it now grabs all pipelines and won't try to restart anything if the 0th (most recent) pipeline is a success. It also tries to find the most recent pipeline whose status is failed, skipped, or canceled.

I can't quite test the processing, as I'm still receiving a 404 when my copy of spackbot tries to query the GitLab projects API. I am able to see the project from a browser, and I have put my PAT into the GITLAB_TOKEN entry in the .env file. Can you think of something that I might have missed?
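
A rough sketch of the selection logic described above, with hypothetical names (not the PR's actual implementation): given the branch's pipelines ordered newest-first, skip the retry if the latest pipeline succeeded, otherwise pick the most recent one that failed, was skipped, or was canceled.

    def find_pipeline_to_retry(pipelines):
        # Assumes pipelines are ordered newest-first, as GitLab returns them.
        if not pipelines or pipelines[0]["status"] == "success":
            return None
        for pipeline in pipelines:
            if pipeline["status"] in ("failed", "skipped", "canceled"):
                return pipeline
        return None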

Collaborator

@scottwittenburg left a comment


Your change to query all pipelines for the branch makes sense, thanks @josephsnyder! Just update the comment to reflect that, if you don't mind 😉

Also, maybe your attempts to hit the endpoint on the gitlab/spack project failed because you weren't a member of that project? 🤷 I invited you as a maintainer so you could try again.

In general, without deploying this change for real (or fake, as I did with #58) in the cluster, we probably can't hope to test everything. So I'm fine with you just testing the parts you can piecemeal, and then ironing out any issues once we deploy.

    async with session.post(url, headers=headers) as response:
        result = await response.json()
else:
    # Get a list of failed pipelines
Collaborator


Suggested change
-    # Get a list of failed pipelines
+    # Get a list of all pipelines

Utilize the "retry" API endpoint for pipelines to re-run the jobs that
had failed during the most recent failed run of a pipeline.

It does require a new comment to listen to and a call to acquire the ID
of the most recently failed pipeline.  Use the PR ref to capture the
pipelines for that specific instance.
@tldahlgren
Collaborator

Not sure/don't recall why this hasn't been merged, but if it is still active it needs the conflicts resolved.

@scottwittenburg
Collaborator

This is a year and a half old, and I don't really recall why we wanted it. Right now it seems kind of risky to expose this functionality, since we know that when you use the UI to retry jobs in GitLab, the job runs with the environment variables given to it when the job was created. This effectively means you can't retry jobs created more than 12 hours ago, or the AWS credentials will have expired. Maybe we hadn't implemented credential rotation yet when this was done? I'm not sure.

At any rate, I think we should close this unless someone has a good argument otherwise.
