How to set the total computing resources (cpus, memory) across submitted jobs #5702

nttg8100 opened this issue Jan 23, 2025 · 4 comments

@nttg8100

Use cases:
I have a SLURM HPC with limited computing resources and a few users. When I run a Nextflow pipeline, I would like to cap the total cpus and memory it uses. For example, suppose I have 10 jobs, each requiring 4 cpus and 8 GB of memory, but I want the pipeline to use no more than 8 cpus and 16 GB of memory overall. It should then submit only 2 jobs at a time, waiting for them to finish before submitting the next pair, four more times. However, Nextflow only limits the cpus and memory of each individual process, so I cannot control the total.

One possible workaround is to use queueSize and resourceLimits: I can set the queue size to 2 and cap each process at 4 cpus and 8 GB of memory. However, this wastes time when the pipeline has many processes with small requirements (1 cpu, 2 GB memory): four of them could run at once within my budget, but the queue size still allows only 2 to be submitted at a time. Is there an alternative solution that I may have missed in the Nextflow documentation?
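
For reference, a minimal sketch of this workaround as a nextflow.config (the numbers match the example above; adjust them to your cluster):

```groovy
// nextflow.config: sketch of the queueSize + resourceLimits workaround
process {
    executor       = 'slurm'
    // cap what any single task may request
    resourceLimits = [ cpus: 4, memory: 8.GB ]
}

executor {
    // never have more than 2 jobs queued or running at once
    queueSize = 2
}
```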

@bentsherman
Member

With the local executor you can control the total cpus/memory available to jobs with the executor.cpus and executor.memory config options. But they only work for the local executor. For an HPC scheduler like SLURM the standard approach would be to set up a queue with the limited cpus/memory available to it, then use the queue directive to send all jobs to that queue.
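
A rough sketch of both options as config; note the partition name 'small' is only a placeholder for whatever limited queue your SLURM admin sets up:

```groovy
// nextflow.config: sketch of the two approaches described above

// Option 1: local executor only. Total resources shared by all running tasks.
executor {
    $local {
        cpus   = 8
        memory = 16.GB
    }
}

// Option 2: SLURM. Enforce the limit on the scheduler side by sending
// every job to a capped partition ('small' is a placeholder name).
process {
    executor = 'slurm'
    queue    = 'small'
}
```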

@bentsherman
Member

Although that might not work if you want to run multiple pipelines each with their own cpus/memory limit.

Really I think the solution is to not submit the jobs to SLURM. Instead, submit each Nextflow run as a SLURM job with the desired cpus/memory constraint, then have each run use the local executor, so it only uses the one node.
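
For example, a sketch assuming the 8-cpu / 16 GB budget from above (the exact sbatch flags are an assumption and depend on your site):

```groovy
// nextflow.config: run the whole pipeline inside a single SLURM allocation.
// The run itself would be submitted with something like:
//   sbatch --cpus-per-task=8 --mem=16G --wrap 'nextflow run <pipeline>'
process.executor = 'local'

executor {
    // keep Nextflow within the limits of the enclosing allocation
    cpus   = 8
    memory = 16.GB
}
```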

@nttg8100
Author

nttg8100 commented Jan 23, 2025

Thank you for your quick response, that would work. However, during sequential steps the pipeline may not need all of the allocated resources, so I could end up holding a large allocation that sits mostly idle because the pipeline is stuck on a step that needs very little. I think it would be a good idea for Nextflow to support a total resource limit for a single pipeline run, so that a team with a small HPC can make better use of their cpus and memory.

@bentsherman
Member

That's a fair point. It could be interesting to extend executor.cpus and executor.memory to also work with grid executors like SLURM. It shouldn't be too hard to add, and it could be a nice way for users to constrain themselves when many users are running Nextflow pipelines on a shared cluster.
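
If that were added, the config might look like the sketch below; this is purely hypothetical, since today executor.cpus and executor.memory are only honoured by the local executor:

```groovy
// nextflow.config: HYPOTHETICAL extension of executor.cpus / executor.memory to SLURM.
// Not current Nextflow behaviour; illustrates the proposal in this comment.
process.executor = 'slurm'

executor {
    // proposed meaning: keep the sum of submitted jobs' requests
    // at or below 8 cpus and 16 GB at any time
    cpus   = 8
    memory = 16.GB
}
```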
