-
Notifications
You must be signed in to change notification settings - Fork 1
Add VarScan2 somatic variant caller rule #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add VarScan2 somatic variant caller rule #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds VarScan2 somatic variant calling functionality to the Snakemake workflow for processing tumor/normal sample pairs. VarScan2 is a variant calling tool specifically designed for somatic mutation detection in cancer genomics.
- Implements a complete VarScan2 variant calling pipeline with chromosome chunking
- Adds configuration support for VarScan2 chromosome processing
- Integrates the new rule into the main workflow
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
workflow/rules/varscan2.smk | Complete VarScan2 rule implementation with somatic calling, sorting, concatenation, and preparation steps |
workflow/rules/rule_common.smk | Adds VARSCAN_CHRMS configuration variable for chromosome chunk processing |
workflow/Snakefile | Includes the new VarScan2 rules file in the main workflow |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
bcftools view -O b -o $bcf --threads {threads} $vcf && bcftools index --threads 4 $bcf; | ||
done; | ||
touch {output}; | ||
{latency_wait}; |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable latency_wait
is used in shell commands but is not defined in the rule parameters. This will cause the shell command to fail.
Copilot uses AI. Check for mistakes.
touch {output}; | ||
{latency_wait}; | ||
ls {output} >> {log} 2>&1; | ||
{latency_wait}; |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable latency_wait
is used in shell commands but is not defined in the rule parameters. This will cause the shell command to fail.
Copilot uses AI. Check for mistakes.
"varscan2 chunks can only be one contiguous range per chunk: e.g., 1-4 with the non numerical chromosomes assigned 23=X, 24=Y, 25=MT" | ||
) | ||
|
||
return ret_mod_chrm(ret_str) |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function ret_mod_chrm
is called but not defined in this file or imported. This will cause a NameError when the function is executed.
Copilot uses AI. Check for mistakes.
shell: | ||
""" | ||
( echo {output} ; | ||
mkdir -p $(dirname {output} ); |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using dirname
on multiple space-separated paths in {output}
will not work correctly. The shell will interpret this as dirname path1 path2 path3
which is invalid. Use a loop or process each output file individually.
mkdir -p $(dirname {output} ); | |
for f in {output}; do mkdir -p "$(dirname "$f")"; done ; |
Copilot uses AI. Check for mistakes.
""" | ||
bedtools sort -header -i {input.vcf} > {output.vcfsort} 2>> {log}; | ||
bgzip {output.vcfsort} >> {log} 2>&1; | ||
touch {output.vcfsort}; |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file {output.vcfsort}
is being touched after it was already compressed with bgzip on line 116, which will create an empty file with the same name as the compressed file, potentially causing confusion.
touch {output.vcfsort}; |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codex Review: Here are some suggestions.
Reply with @codex fix comments
to fix any unresolved comments.
About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".
def get_vscn_chrm(wildcards): | ||
pchr = "" # prefix handled already | ||
ret_str = "" | ||
sl = wildcards.vscnchrm.replace('chr', '').split("-") | ||
sl2 = wildcards.vscnchrm.replace('chr', '').split("~") | ||
|
||
if len(sl2) == 2: | ||
ret_str = pchr + wildcards.vscnchrm | ||
elif len(sl) == 1: | ||
ret_str = pchr + sl[0] | ||
elif len(sl) == 2: | ||
start = int(sl[0]) | ||
end = int(sl[1]) | ||
while start <= end: | ||
ret_str = str(ret_str) + " " + pchr + str(start) | ||
start = start + 1 | ||
else: | ||
raise Exception( | ||
"varscan2 chunks can only be one contiguous range per chunk: e.g., 1-4 with the non numerical chromosomes assigned 23=X, 24=Y, 25=MT" | ||
) | ||
|
||
return ret_mod_chrm(ret_str) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] Convert mitochondrial contig name for hg38 correctly
The chromosome helper get_vscn_chrm
always normalises the wildcard value 25
to "MT"
(via ret_mod_chrm
) before the shell block applies mito_code
. As a result, when the genome build is GRCh38 (cpre='chr'
, mito_code='M'
), the rule constructs chrMT
and later looks this contig up in the FASTA index, which fails because GRCh38 uses chrM
. Any configuration that requests the mitochondrial chunk will therefore error with samtools mpileup: invalid region chrMT
. The conversion to MT
must be deferred until after the genome-specific prefix is added, or the function must return "M"
when genome_build
is hg38.
Useful? React with 👍 / 👎.
Summary
Testing
snakemake --lint -s workflow/Snakefile
(fails: command not found)pip install snakemake
(fails: Cannot connect to proxy, 403)https://chatgpt.com/codex/tasks/task_e_68bf777652f4833187d5a5c4f5497bf2