Skip to content

Conversation

iamh2o
Copy link
Contributor

@iamh2o iamh2o commented Sep 9, 2025

Summary

  • add VarScan2 somatic calling rule for tumor/normal pairs
  • hook rule into workflow includes and shared chromosome config

Testing

  • snakemake --lint -s workflow/Snakefile (fails: command not found)
  • pip install snakemake (fails: Cannot connect to proxy, 403)

https://chatgpt.com/codex/tasks/task_e_68bf777652f4833187d5a5c4f5497bf2

@Copilot Copilot AI review requested due to automatic review settings September 9, 2025 04:32
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds VarScan2 somatic variant calling functionality to the Snakemake workflow for processing tumor/normal sample pairs. VarScan2 is a variant calling tool specifically designed for somatic mutation detection in cancer genomics.

  • Implements a complete VarScan2 variant calling pipeline with chromosome chunking
  • Adds configuration support for VarScan2 chromosome processing
  • Integrates the new rule into the main workflow

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
workflow/rules/varscan2.smk Complete VarScan2 rule implementation with somatic calling, sorting, concatenation, and preparation steps
workflow/rules/rule_common.smk Adds VARSCAN_CHRMS configuration variable for chromosome chunk processing
workflow/Snakefile Includes the new VarScan2 rules file in the main workflow

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

bcftools view -O b -o $bcf --threads {threads} $vcf && bcftools index --threads 4 $bcf;
done;
touch {output};
{latency_wait};
Copy link

Copilot AI Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable latency_wait is used in shell commands but is not defined in the rule parameters. This will cause the shell command to fail.

Copilot uses AI. Check for mistakes.

touch {output};
{latency_wait};
ls {output} >> {log} 2>&1;
{latency_wait};
Copy link

Copilot AI Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable latency_wait is used in shell commands but is not defined in the rule parameters. This will cause the shell command to fail.

Copilot uses AI. Check for mistakes.

"varscan2 chunks can only be one contiguous range per chunk: e.g., 1-4 with the non numerical chromosomes assigned 23=X, 24=Y, 25=MT"
)

return ret_mod_chrm(ret_str)
Copy link

Copilot AI Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function ret_mod_chrm is called but not defined in this file or imported. This will cause a NameError when the function is executed.

Copilot uses AI. Check for mistakes.

shell:
"""
( echo {output} ;
mkdir -p $(dirname {output} );
Copy link

Copilot AI Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using dirname on multiple space-separated paths in {output} will not work correctly. The shell will interpret this as dirname path1 path2 path3 which is invalid. Use a loop or process each output file individually.

Suggested change
mkdir -p $(dirname {output} );
for f in {output}; do mkdir -p "$(dirname "$f")"; done ;

Copilot uses AI. Check for mistakes.

"""
bedtools sort -header -i {input.vcf} > {output.vcfsort} 2>> {log};
bgzip {output.vcfsort} >> {log} 2>&1;
touch {output.vcfsort};
Copy link

Copilot AI Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file {output.vcfsort} is being touched after it was already compressed with bgzip on line 116, which will create an empty file with the same name as the compressed file, potentially causing confusion.

Suggested change
touch {output.vcfsort};

Copilot uses AI. Check for mistakes.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines +8 to +30
def get_vscn_chrm(wildcards):
pchr = "" # prefix handled already
ret_str = ""
sl = wildcards.vscnchrm.replace('chr', '').split("-")
sl2 = wildcards.vscnchrm.replace('chr', '').split("~")

if len(sl2) == 2:
ret_str = pchr + wildcards.vscnchrm
elif len(sl) == 1:
ret_str = pchr + sl[0]
elif len(sl) == 2:
start = int(sl[0])
end = int(sl[1])
while start <= end:
ret_str = str(ret_str) + " " + pchr + str(start)
start = start + 1
else:
raise Exception(
"varscan2 chunks can only be one contiguous range per chunk: e.g., 1-4 with the non numerical chromosomes assigned 23=X, 24=Y, 25=MT"
)

return ret_mod_chrm(ret_str)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Convert mitochondrial contig name for hg38 correctly

The chromosome helper get_vscn_chrm always normalises the wildcard value 25 to "MT" (via ret_mod_chrm) before the shell block applies mito_code. As a result, when the genome build is GRCh38 (cpre='chr', mito_code='M'), the rule constructs chrMT and later looks this contig up in the FASTA index, which fails because GRCh38 uses chrM. Any configuration that requests the mitochondrial chunk will therefore error with samtools mpileup: invalid region chrMT. The conversion to MT must be deferred until after the genome-specific prefix is added, or the function must return "M" when genome_build is hg38.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant