Threads via variable. #361
Conversation
Rules `merge_gtc_to_bcf_batches` and `convert_bcf_to_plink_bed` now use `workflow.cores` instead of a hardcoded 44.
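A minimal sketch of the pattern, assuming the rule shape shown in the test logs further down (the `plink2` invocation here is abbreviated and illustrative, not the pipeline's exact command):

```python
rule convert_bcf_to_plink_bed:
    input:
        bcf="sample_level/samples.bcf",
    output:
        bed="sample_level/samples.bed",
        bim="sample_level/samples.bim",
        fam="sample_level/samples.fam",
    log:
        "sample_level/samples.log",
    # workflow.cores exposes the --cores value, which the cluster profiles
    # set via their `cores:` key; previously this was a hardcoded 44
    threads: workflow.cores
    shell:
        "plink2 --bcf {input.bcf} --make-bed "
        "--out sample_level/samples --threads {threads} &> {log}"
```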
ACK - will test.
… to `cores` in the cluster profiles.
LGTM
Testing confirms the logging information is now correct, accurately displaying the allocated threads based on `workflow.cores` inherited from the `generic_slurm` and `ccad2` cluster profiles. Good job!
```diff
@@ -17,7 +17,7 @@
 @dataclass
 class Ccad2Options(ClusterOptions):
-    queue: Set[str] = field(default_factory=lambda: {"defq", "bigmemq"})
+    queue: Set[str] = field(default_factory=lambda: {"defq", "bigmemq", "cgrq"})
```
The `cgrq` queue is not available to all users; access must be granted explicitly. If a user lacks access, it's unclear to me whether this would result in an error or in the queue simply being bypassed. For example, @carynwillis does not have access, so would the whole pipeline fail, or continue without using this queue?

To be clear, I do quite like this feature.

Maybe there is a way we could implement an option in the config with which you could specify whether you have access to additional partitions (e.g., `cgrq`) beyond the defaults; otherwise, just use the default partitions given in the cluster profiles. It might also be useful to make this generic enough that you could do the same with Biowulf.

What do you think?
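One possible shape for that option, as a purely hypothetical sketch (`with_extra_partitions` and the stubbed base class are assumptions for illustration; `Ccad2Options` and its defaults come from the diff above):

```python
from dataclasses import dataclass, field, replace
from typing import Set

# stand-in for the repo's existing ClusterOptions base class
@dataclass
class ClusterOptions:
    queue: Set[str] = field(default_factory=set)

@dataclass
class Ccad2Options(ClusterOptions):
    # defaults stay limited to the partitions every user can reach
    queue: Set[str] = field(default_factory=lambda: {"defq", "bigmemq"})

    def with_extra_partitions(self, extra: Set[str]) -> "Ccad2Options":
        # hypothetical: merge partitions the user has opted into via config
        # (e.g., {"cgrq"}), leaving the defaults untouched otherwise
        return replace(self, queue=self.queue | extra)
```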
```yaml
cores: &cores 8
max-threads: *cores
```
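For context on the syntax: `&cores` defines a YAML anchor and `*cores` is an alias that reuses its value, so `cores` and `max-threads` both resolve to 8 and stay in sync when the value is edited in one place.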
I like this idea. I am still seeing the same behavior as before, though. That is, there is a discrepancy between the number of threads shown as allocated in the job-specific log vs. the whole-pipeline log. Here is what I see.
- invocation: `cgr submit --ccad2`
- commit: `git rev-parse HEAD  # 070cd684b8`
- sacct:

```
$ sacct -j 2154090 --format=JobID,JobName,Partition,AllocCPUs,State,ExitCode
JobID         JobName    Partition  AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- --------
2154090      convert_b+  cgrq       8          COMPLETED  0:0
2154090.bat+ batch                  8          COMPLETED  0:0
```
- job-specific log:

```
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Select jobs to execute...

rule convert_bcf_to_plink_bed:
    input: sample_level/samples.bcf
    output: sample_level/samples.bed, sample_level/samples.bim, sample_level/samples.fam
    log: sample_level/samples.log
    jobid: 0
    benchmark: benchmarks/convert_bcf_to_plink_bed.500.tsv
    threads: 40
    resources: mem_mb=1059, disk_mb=18132, tmpdir=/tmp, time_hr=1
```
- whole-pipeline log:

```
[Fri Feb 21 13:15:28 2025]
rule convert_bcf_to_plink_bed:
    input: sample_level/samples.bcf
    output: sample_level/samples.bed, sample_level/samples.bim, sample_level/samples.fam
    log: sample_level/samples.log
    jobid: 2
    benchmark: benchmarks/convert_bcf_to_plink_bed.500.tsv
    threads: 8
    resources: tmpdir=/tmp, mem_mb=1059, time_hr=1

plink2 --allow-extra-chr 0 --keep-allele-order --double-id --bcf sample_level/samples.bcf --update-sex /tmp/tmpl2_fp7jj --output-chr 26 --split-par hg38 --make-pgen --out sample_level/bcf2plink --memory 1059 --threads 8 ;plink2 --pfile sample_level/bcf2plink --make-pgen --sort-vars --out sample_level/bcf2plink-sorted --threads 8 --memory 1059 ;plink2 --pfile sample_level/bcf2plink-sorted --make-bed --out sample_level/samples --threads 8 --memory 1059 ;rm sample_level/bcf2plink.{pgen,psam,pvar,log} sample_level/bcf2plink-sorted.{pgen,psam,pvar,log} /tmp/tmpl2_fp7jj
Submitted job 2 with external jobid '2154090'.
```
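(A possible explanation, offered as an assumption rather than something verified in this thread: the job-specific log comes from the Snakemake instance launched inside the Slurm job, and its "Provided cores: 40" suggests that instance is started with the node's visible core count rather than the job's 8-CPU allocation, which would let the rule scale up to 40 threads even though Slurm only allocated 8.)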
I just submitted another test with invocation `cgr submit --slurm` to see what the logs look like when using the `generic_slurm` cluster profile. Should be cores/threads 8.
I can confirm that when I use the invocation `cgr submit --slurm`, both the whole-pipeline log and the job-specific log show that the number of threads allocated is 8.
- Renamed the `max_threads` parameter to `cores` (925d5c6).
- Used `workflow.cores` to define threads in rules `convert_bcf_to_plink_bed` and `merge_gtc_to_bcf_batches` instead of the hardcoded 44 (acc7df7).

The `max_threads` parameter seems inaccessible at the rule level; `cores` does the trick. They both seem synonymous anyway per the Snakemake documentation.

Please let me know if you encounter any issues. Thanks.