HPC Episode #18

Open
wants to merge 41 commits into
base: main

Conversation

multimeric
Collaborator

@multimeric multimeric commented Jul 10, 2023

  • Added an optional HPC episode, which uses Slurm as an example
  • This has been written using the current plan writing system. Once/if Code Re-Use #19 is merged, some of this will need re-writing to remove redundancy

@github-actions

github-actions bot commented Jul 10, 2023

🆗 Pre-flight checks passed 😃

This pull request has been checked and contains no modified workflow files or spoofing.

Results of any additional workflows will appear here when they are done.

github-actions bot pushed a commit that referenced this pull request Jul 10, 2023
episodes/hpc.Rmd: 12 review threads, resolved (11 outdated)
multimeric and others added 3 commits July 10, 2023 15:58
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Jul 10, 2023
multimeric and others added 2 commits July 10, 2023 16:08
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Jul 10, 2023
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Jul 11, 2023
github-actions bot pushed a commit that referenced this pull request Jul 12, 2023
github-actions bot pushed a commit that referenced this pull request Jul 12, 2023

@edoyango edoyango left a comment

A couple comments after having gone through the preceding episodes.

episodes/hpc.Rmd: 2 review threads, resolved (1 outdated)
multimeric and others added 3 commits July 20, 2023 10:41
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
Co-authored-by: Edward Yang <94523015+edoyango@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Jul 10, 2024
@joelnitta
Collaborator

joelnitta commented Jul 10, 2024

@multimeric

I realized that I needed to update renv.lock in main, so I did that and cherry-picked the commits here (4eb5f06, 9489bea), and also made one more update to renv.lock (I think due to using CRAN instead of the carpentries r-universe, e012aaf).

Can you check if you can still build the HPC lesson locally? I can't because I don't have access to a computer with slurm.

(sorry - I realized cherry-pick may not have been the best approach here. Feel free to roll those back and re-base instead if you want)

@multimeric
Collaborator Author

I can do that, but can we also test if it's working on the CI? I'm not sure if it actually runs the lesson build for pull requests at the moment?

@joelnitta
Collaborator

Sure - I assume you mean implementing suggestion 3?

Add slurm to the CI server, e.g. https://github.com/koesterlab/setup-slurm-action

@multimeric
Collaborator Author

Oh, I already added that to .github/workflows/sandpaper-main.yaml, but I'm not sure if it will even run that workflow for a pull request. Also it doesn't like the fact that I've changed the content and the workflow in the same PR.

@joelnitta
Collaborator

joelnitta commented Jul 10, 2024

How about this: I just set up another repo and merged hpc into main over there, let's see how it goes.

GH action is running now... https://github.com/joelnitta/targets-hpc/actions/runs/9867906104

Hm, it seems to be hung up; I predict it will get killed by the time-out. I've actually seen this behavior before. I'm not sure what's causing it, but I don't think it is because of this PR...

@multimeric
Collaborator Author

No, I think it is a fault in my PR. It's failing at the second workflow which uses different memory requirements, which is likely the issue. I've reduced the memory requirements, but I now realise that it's guaranteed to fail at the GPU workflow since the mini cluster isn't going to have a GPU node.

@joelnitta
Collaborator

ah... so perhaps CI is not an option then?

@multimeric
Collaborator Author

Well not for the last workflow, but that's inside a challenge solution so even if I just paste in the pre-computed result there I don't think that's too bad.

@joelnitta
Collaborator

Sounds good, let's try that then. If you push the change here, I can try it again over on my other fork.

github-actions bot pushed a commit that referenced this pull request Dec 13, 2024
@joelnitta
Collaborator

@multimeric

markdown build seems to be hanging

[image]

@multimeric
Collaborator Author

Yeah, feel free to cancel it. I can't.

@joelnitta
Collaborator

canceled

@multimeric
Collaborator Author

I believe this failed because the lesson involves submitting a job that requires 2 GB of RAM, but the koesterlab/setup-slurm-action only provides a single node with 2000 MB of RAM and so it hangs forever while waiting for sufficient memory to become available. I'll fork the action to allow increased memory.
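
To make the arithmetic concrete, here is a minimal sketch of the mismatch, assuming the 2 GB request is interpreted with 1024-based units (so 2G means 2048 MB) while the node advertises 2000 MB. The helper names `request_to_mb` and `fits_on_node` are hypothetical and exist only for illustration; they are not part of the lesson, Slurm, or the action.

```python
# Hypothetical helpers for illustration only. The 1024-based unit handling
# is an assumption about how the memory request is interpreted.
UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def request_to_mb(request: str) -> float:
    """Convert a memory request such as '2G' or '2000M' to megabytes."""
    request = request.strip().upper()
    if request[-1] in UNIT_TO_MB:
        return float(request[:-1]) * UNIT_TO_MB[request[-1]]
    return float(request)  # a bare number is treated as megabytes


def fits_on_node(request: str, node_memory_mb: float) -> bool:
    """Return True if the request can ever be satisfied by the node."""
    return request_to_mb(request) <= node_memory_mb


NODE_MEMORY_MB = 2000  # the single node described in this comment

print(fits_on_node("2G", NODE_MEMORY_MB))     # False: 2048 MB > 2000 MB
print(fits_on_node("2000M", NODE_MEMORY_MB))  # True
```

Under that assumption the job can never be scheduled on the single node, which would be consistent with the build waiting indefinitely rather than failing.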

@multimeric
Collaborator Author

Actually can you give me the permission to cancel builds so that I can play around with this?

@multimeric
Collaborator Author

Okay, putting this on hold until koesterlab/setup-slurm-action#7 is merged. Technically we could use my fork of the action, but there's no hurry.

4 participants