Skip to content

Conversation

@tdayris
Copy link
Contributor

@tdayris tdayris commented Oct 21, 2025

This PR lets GATK meta-wrapper use pathvars syntax.

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

  • New Features

    • Paths are now centrally configurable via a new path template section with defaults and per-sample placeholders for references, indexes, intervals, known variants, and input alignments.
    • Workflow rules, outputs and logs now use templated placeholders so artifacts follow the configured layout.
  • Tests

    • Test configuration and expected paths updated to align with the new configurable path scheme.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 21, 2025

📝 Walkthrough

Walkthrough

Added a pathvars section and replaced hard-coded file paths in the GATK MuTect2 meta-wrapper with templated placeholders; updated test Snakefile/config and adjusted tests to the new results-prefixed layout.

Changes

Cohort / File(s) Summary
Metadata configuration
meta/bio/gatk_mutect2_calling/meta.yaml
Added pathvars with default and custom keys: per, genome_sequence, genome_sequence_index, genome_dict, bed_intervals, known_variants, known_variants_tbi, and bam_file.
Rule implementation (wrapper)
meta/bio/gatk_mutect2_calling/meta_wrapper.smk
Replaced hard-coded inputs/outputs/logs with parameterized placeholders (e.g., <genome_sequence>, <genome_dict>, <genome_sequence_index>, <bed_intervals>, <known_variants>, <results>/.../<per>..., <logs>/...). Updated rules: create_dict, samtools_index, picard_replace_read_groups, sambamba_index_picard_bam, mutect2_call, gatk_get_pileup_summaries, gatk_calculate_contamination, gatk_learn_read_orientation_model, filter_mutect_calls.
Test scaffolding
meta/bio/gatk_mutect2_calling/test/Snakefile, meta/bio/gatk_mutect2_calling/test/config.yaml
Snakefile: added configfile directive and reinstated use rule * from gatk_mutect2_calling with config:. config.yaml: added pathvars with concrete paths/templates for genome files, intervals, known variants, per, and bam_file.
Test validation update
test_wrappers.py
Adjusted expected test path from variant/Sample1.filtered.vcf.gz.tbi to results/variant/Sample1.filtered.vcf.gz.tbi.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Notes for review:

  • Verify meta.yaml pathvars syntax and comments match repository conventions.
  • Confirm all placeholder names in meta_wrapper.smk exactly match pathvars keys (e.g., <per>, <results>, <logs>, <genome_sequence>, etc.).
  • Check test config.yaml values align with expected templates for per-sample expansion and that test_wrappers.py path update covers all assertions.

Possibly related PRs

Suggested reviewers

  • johanneskoester
  • fgvieira

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: updating the GATK meta-wrapper to use pathvars syntax, which aligns with the substantial refactoring across multiple files.
Description check ✅ Passed The description explains the purpose (using pathvars syntax) and confirms all QC checklist items are completed, though it lacks technical detail on specific changes made.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccc92ad and e49fc2a.

📒 Files selected for processing (3)
  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk (6 hunks)
  • meta/bio/gatk_mutect2_calling/test/Snakefile (1 hunks)
  • test_wrappers.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • test_wrappers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • meta/bio/gatk_mutect2_calling/test/Snakefile
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4665
File: meta/bio/salmon_tximport/meta_wrapper.smk:91-125
Timestamp: 2025-10-31T14:56:28.280Z
Learning: In Snakemake wrappers, pathvars only expand to strings, not lists or other data structures. Sample lists should be provided via config["samples"] rather than through pathvars, following standard Snakemake configuration patterns.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:58-71
Timestamp: 2024-11-26T10:49:54.765Z
Learning: In test Snakefiles within the snakemake-wrappers repository, it is acceptable to use simplified paths and logging configurations that may differ from real-life pipelines.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:1-13
Timestamp: 2024-11-26T09:16:24.981Z
Learning: In test `Snakefile`s (e.g., `test/Snakefile`), it's acceptable to use fixed input and output file names instead of wildcards.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:1-23
Timestamp: 2024-11-26T15:01:13.202Z
Learning: The NGS-bits SampleAncestry wrapper in `bio/ngsbits/sampleancestry/` includes a test Snakefile, sample VCF files, and tests available in the `test/` directory.
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4285
File: bio/agat/wrapper.py:336-340
Timestamp: 2025-06-20T13:10:29.240Z
Learning: User tdayris prefers to fix identified code issues immediately rather than deferring them, as indicated by their comment "Now is better than later" when confirming code fixes.
📚 Learning: 2024-11-26T08:31:00.099Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.

Applied to files:

  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk
📚 Learning: 2024-11-26T09:16:39.570Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.

Applied to files:

  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk
📚 Learning: 2024-11-26T10:49:54.765Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:58-71
Timestamp: 2024-11-26T10:49:54.765Z
Learning: In test Snakefiles within the snakemake-wrappers repository, it is acceptable to use simplified paths and logging configurations that may differ from real-life pipelines.

Applied to files:

  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk
📚 Learning: 2024-11-15T18:31:15.447Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.

Applied to files:

  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.

Applied to files:

  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk
📚 Learning: 2024-11-26T10:49:04.406Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.

Applied to files:

  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: testing
  • GitHub Check: docs
  • GitHub Check: Summary
🔇 Additional comments (3)
meta/bio/gatk_mutect2_calling/meta_wrapper.smk (3)

61-81: Verify the {sample} placeholder in tumor sample parameter.

Line 77 uses {sample} in the --tumor-sample parameter, while <per> is used for sample-specific paths throughout the rule (lines 66, 67, 70, 71, 72, 79). Ensure that {sample} resolves correctly to provide the tumor sample name to GATK MuTect2.


1-158: LGTM: Pathvars migration is well-executed.

The conversion from hard-coded paths to pathvars placeholders is consistent and well-structured across all rules. The use of placeholders like <genome_sequence>, <results>, <logs>, and <per> enables flexible path configuration while maintaining the pipeline logic. The temp() usage for intermediate files (lines 92, 108, 124) is appropriate.


30-44: No issues found with {sample} placeholder usage.

After verification, the {sample} placeholders in params (lines 42 and 77) are correct. Here's why:

The pathvar per: "{sample}" in the configuration expands to the literal string "{sample}" when substituted into the output paths. After this substitution, paths like <results>/picard/<per>.bam become results/picard/{sample}.bam, which Snakemake recognizes as a wildcard pattern. This makes sample a valid wildcard for the entire rule, allowing {sample} to be used in params.

This pattern is consistent with other established meta_wrappers in the repository (e.g., bwa_mapping, calc_consensus_reads), where the same approach is used successfully.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bcf3ab and 5876525.

⛔ Files ignored due to path filters (1)
  • meta/bio/gatk_mutect2_calling/test/reference/known.vcf.gz is excluded by !**/*.gz
📒 Files selected for processing (5)
  • meta/bio/gatk_mutect2_calling/meta.yaml (1 hunks)
  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk (6 hunks)
  • meta/bio/gatk_mutect2_calling/test/Snakefile (1 hunks)
  • meta/bio/gatk_mutect2_calling/test/config.yaml (1 hunks)
  • test_wrappers.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • test_wrappers.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: testing
  • GitHub Check: docs
  • GitHub Check: Summary
🔇 Additional comments (4)
test_wrappers.py (1)

6562-6562: LGTM! Test path correctly updated to reflect new structure.

The path change from variant/Sample1.filtered.vcf.gz.tbi to results/variant/Sample1.filtered.vcf.gz.tbi correctly aligns with the new pathvars-based output structure defined in the meta-wrapper.

meta/bio/gatk_mutect2_calling/test/config.yaml (1)

1-9: LGTM! Configuration properly defines all required pathvars.

The pathvars configuration is complete and consistent with the documentation in meta.yaml. The per-sample templating using "{sample}" for the per pathvar and "mapped/{sample}.bam" for bam_file correctly enables flexible path resolution across the workflow.

meta/bio/gatk_mutect2_calling/test/Snakefile (1)

6-11: LGTM! Snakefile correctly adopts config-driven meta-wrapper pattern.

The addition of the configfile directive and the conversion to nested meta_wrapper block form are both correct. This enables the workflow to use the pathvars defined in config.yaml for flexible path resolution.

meta/bio/gatk_mutect2_calling/meta_wrapper.smk (1)

3-158: Verify hardcoded {sample} references in params are intentional.

The conversion to placeholders looks comprehensive and consistent. However, Lines 42 and 77 contain {sample} directly in the extra params strings:

  • Line 42: "--RGLB lib1 --RGPL illumina --RGPU {sample} --RGSM {sample}"
  • Line 77: " --tumor-sample {sample} "

This creates an implicit constraint that the per pathvar must be exactly "{sample}". If users configure per as "{sample_id}" or another pattern, these params would not expand correctly.

Consider either:

  1. Documenting this constraint in meta.yaml, or
  2. Making it more flexible by using the <per> placeholder in params where appropriate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant