Pipeline fails when run with a lot of cores #763
Comments
Hi there, thanks for the report.
That should be impossible; if it were to happen, it would indeed be a bug. You could also use a more up-to-date Nextflow version in the future, but I somehow doubt that this is the cause.
Closing due to lack of information.
Sorry for the delay. I have attached the log file. I updated Nextflow to v24.04.3 for this run. I specified the number of cores using the `-c` option in Slurm's sbatch.
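For context, a minimal sketch of how cores would be requested this way. The script body and pipeline parameters are placeholders (the actual command was not included in the report); only `-c`/`--cpus-per-task` comes from the comment above.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=120   # equivalent to: sbatch -c 120
#SBATCH --time=48:00:00       # placeholder walltime

# Hypothetical invocation; input/outdir paths are illustrative only.
nextflow run nf-core/ampliseq \
    -profile singularity \
    --input samplesheet.csv \
    --outdir results
```

Note that `-c` sets the CPUs allocated to the Slurm job, while the CPUs each pipeline task actually uses are governed by the Nextflow process configuration.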
Thanks! The error message in that log file is:
I am not sure what causes this error. It seems pretty likely to me that it's a problem with the tmp dir (maybe it was full, or was deleted at that specific time point coincidentally). My hypothesis is that it was a coincidence that the job didn't finish when you specified 120 CPUs but succeeded with 20 CPUs (QIIME2_DIVERSITY_BETA runs with only 2 cores by default). Even if you had modified the CPUs, my alternative hypothesis is that by doing so the job was started the second time on a different node that had a working tmp dir. As a next step: there is a discussion in the QIIME2 forum with some troubleshooting that might be related, see here.
Changing the tmp dir to one that has 130 TB of free space produces the same error, except that /tmp now points to the new tmp dir.
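To make the tmp-dir change fully effective, it usually has to be exported through the environment variables each layer honors. A sketch, assuming a scratch path as a placeholder; the variable names are the standard ones for these tools, not something stated in this thread:

```shell
# Check free space on whatever tmp dir is currently in effect.
df -h "${TMPDIR:-/tmp}"

# Point tmp-dir variables at a larger scratch area (path is a placeholder).
export TMPDIR=/scratch/$USER/tmp       # honored by most POSIX tools, incl. QIIME 2
export SINGULARITY_TMPDIR="$TMPDIR"    # where Singularity stages containers
export NXF_TEMP="$TMPDIR"              # Nextflow's own temporary directory
mkdir -p "$TMPDIR"
```

If only one of these is set, a process running inside the container can still fall back to the node-local /tmp, which would be consistent with the error persisting after the change.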
Description of the bug
I'm analyzing a 16S dataset with ~1,200 samples spread across 3 runs (unfortunately this dataset is not public yet, so I can't provide a reproducible example). I've found a bug in the ampliseq pipeline where, if I run the pipeline with 120 cores, it fails (specifically at the diversity step), but if I run it with 20 cores, it finishes without any errors. I believe the issue is that the workflow gets ahead of itself due to the number of cores available and starts a QIIME 2 module before another prerequisite QIIME 2 module finishes.
Command used and terminal output
Relevant files
No response
System information
Nextflow v23.10.1
nf-core/ampliseq v2.10.0
singularity v4.1.3
slurm v23.02.1