
Threads via variable. #361

Merged: 5 commits merged into default on Feb 21, 2025
Conversation

rajwanir (Contributor) commented Nov 25, 2024

  1. Renames the max_threads parameter to cores (925d5c6).
  2. Uses workflow.cores to define threads in rules convert_bcf_to_plink_bed and merge_gtc_to_bcf_batches instead of hardcoded 44 (acc7df7).

The max_threads parameter seems inaccessible at the rule level, whereas cores does the trick. The two seem synonymous anyway, per the Snakemake documentation (a simplified sketch of the change is included below).

Please let me know if you encounter any issues. Thanks.
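
For reference, a minimal sketch of what the change looks like in a rule. This is a simplified illustration, not the exact rule body or plink2 invocation used in the workflow:

# Sketch only: threads now reads workflow.cores (set via the profile's
# `cores` value) instead of the hardcoded 44.
rule convert_bcf_to_plink_bed:
    input:
        "sample_level/samples.bcf",
    output:
        multiext("sample_level/samples", ".bed", ".bim", ".fam"),
    threads: workflow.cores
    shell:
        "plink2 --bcf {input} --make-bed --out sample_level/samples --threads {threads}"

The point is that the value set via cores in the cluster profile propagates into the rule's threads directive instead of being pinned in the Snakefile.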

rajwanir2 added 2 commits November 25, 2024 16:22
Rules `merge_gtc_to_bcf_batches` and `convert_bcf_to_plink_bed` adopt the use of `workflow.cores` instead of hardcoded 44.
jaamarks (Collaborator)

ACK - will test.

… to `cores` in the cluster profiles.
jaamarks (Collaborator) left a comment

LGTM

Testing confirms the logging information is now correct, accurately displaying the allocated threads based on workflow.cores inherited from the generic_slurm and ccad2 cluster profiles. Good job.

@@ -17,7 +17,7 @@

 @dataclass
 class Ccad2Options(ClusterOptions):
-    queue: Set[str] = field(default_factory=lambda: {"defq", "bigmemq"})
+    queue: Set[str] = field(default_factory=lambda: {"defq", "bigmemq", "cgrq"})
Collaborator

The cgrq queue is not available to all users; access must be granted explicitly. If a user lacks access, it's unclear to me whether this would result in an error or in the queue simply being bypassed. For example, @carynwillis does not have access, so would the whole pipeline fail, or continue without using this queue?


To be clear, I do quite like this feature.

Maybe there is a way we could implement an option in the config with which you could specify whether you have access to additional partitions (e.g., cgrq) beyond the defaults; otherwise, just use the default partitions given in the cluster profiles. It might also be useful to make this generic enough that you could do the same with Biowulf (rough sketch below).
What do you think?
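
For what it's worth, a rough sketch of the kind of option I mean. extra_queues and with_extra_queues are hypothetical names, not existing config options in the pipeline, and the class below is a simplified stand-in for the real Ccad2Options(ClusterOptions):

from dataclasses import dataclass, field
from typing import Set


@dataclass
class Ccad2Options:
    # Default partitions that every user can submit to.
    queue: Set[str] = field(default_factory=lambda: {"defq", "bigmemq"})

    def with_extra_queues(self, extra_queues: Set[str]) -> "Ccad2Options":
        # Merge in partitions the user has opted into via a hypothetical
        # config entry (e.g. {"cgrq"}), leaving the defaults otherwise untouched.
        return Ccad2Options(queue=self.queue | extra_queues)

The same pattern could be made generic across cluster profiles (ccad2, Biowulf) by keying the extra partitions per profile in the config.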

Comment on lines +13 to +14
cores: &cores 8
max-threads: *cores
Collaborator

I like this idea. I am still seeing the same behavior as before, though: there is a discrepancy between the number of threads shown as allocated in the job-specific log and in the whole-pipeline log. Here is what I see.

  • invocation: cgr submit --ccad2
  • commit: git rev-parse HEAD # 070cd684b8
  • sacct:
$ sacct -j 2154090 --format=JobID,JobName,Partition,AllocCPUs,State,ExitCode
JobID           JobName  Partition  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- --------
2154090      convert_b+       cgrq          8  COMPLETED      0:0
2154090.bat+      batch                     8  COMPLETED      0:0
  • job-specific log:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Select jobs to execute...

rule convert_bcf_to_plink_bed:
    input: sample_level/samples.bcf
    output: sample_level/samples.bed, sample_level/samples.bim, sample_level/samples.fam
    log: sample_level/samples.log
    jobid: 0
    benchmark: benchmarks/convert_bcf_to_plink_bed.500.tsv
    threads: 40
    resources: mem_mb=1059, disk_mb=18132, tmpdir=/tmp, time_hr=1
  • whole-pipeline log:
[Fri Feb 21 13:15:28 2025]
rule convert_bcf_to_plink_bed:                                                                                              
    input: sample_level/samples.bcf
    output: sample_level/samples.bed, sample_level/samples.bim, sample_level/samples.fam
    log: sample_level/samples.log                                                                                           
    jobid: 2
    benchmark: benchmarks/convert_bcf_to_plink_bed.500.tsv
    threads: 8
    resources: tmpdir=/tmp, mem_mb=1059, time_hr=1
 plink2 --allow-extra-chr 0 --keep-allele-order --double-id --bcf sample_level/samples.bcf --update-sex /tmp/tmpl2_fp7jj --output-chr 26 --split-par hg38 --make-pgen --out sample_level/bcf2plink  --memory 1059 --threads 8 ;plink2 --pfile sample_level/bcf2plink --make-pgen --sort-vars --out sample_level/bcf2plink-sorted --threads 8 --memory 1059  ;plink2 --pfile sample_level/bcf2plink-sorted --make-bed --out sample_level/samples --threads 8 --memory 1059 ;rm sample_level/bcf2plink.{pgen,psam,pvar,log} sample_level/bcf2plink-sorted.{pgen,psam,pvar,log} /tmp/tmpl2_fp7jj                            
Submitted job 2 with external jobid '2154090'.

Collaborator

I just submitted another test with the invocation cgr submit --slurm to see what the logs look like when using the generic_slurm cluster profile. Both should show cores/threads of 8.

Collaborator

I can confirm that when I use the invocation cgr submit --slurm, both the whole-pipeline log and the job-specific log show that the number of allocated threads is 8.

jaamarks merged commit e326cda into default on Feb 21, 2025
2 checks passed