@iamh2o iamh2o commented Sep 9, 2025

Summary

  • add NeuSomatic somatic and ensemble Snakemake rules
  • expose chromosome config and defaults for NeuSomatic
  • hook rules into workflow and templates

Testing

  • pytest
  • snakemake --lint (fails: command not found)

https://chatgpt.com/codex/tasks/task_e_68bf7826c1148331b3681b12d14ba543

@Copilot Copilot AI review requested due to automatic review settings September 9, 2025 04:33

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds NeuSomatic variant calling capabilities to the workflow by implementing both standalone and ensemble mode Snakemake rules. NeuSomatic is a somatic variant caller that can operate independently or as an ensemble method combining results from multiple other callers.

  • Added two new Snakemake rules for NeuSomatic somatic variant calling: standalone and ensemble modes
  • Exposed chromosome configuration parameters for NeuSomatic across different genome builds
  • Integrated NeuSomatic rules into the main workflow and configuration templates

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| workflow/rules/rule_common.smk | Adds chromosome configuration parsing for NeuSomatic |
| workflow/rules/neusomatic.smk | Implements NeuSomatic standalone and ensemble variant calling rules |
| workflow/Snakefile | Includes the new NeuSomatic rules file in the workflow |
| config/day_profiles/slurm/templates/rule_config.yaml | Adds NeuSomatic configuration for SLURM environments |
| config/day_profiles/local/templates/rule_config.yaml | Adds NeuSomatic configuration for local environments |

"vardict": f"{base}/vardict/{wildcards.sample}.{wildcards.alnr}.vardict.vcf",
"varscan2": f"{base}/varscan2/{wildcards.sample}.{wildcards.alnr}.varscan2.vcf",
}
for caller in ["mutect2", "strelka2", "vardict", "varscan2"]:

Copilot AI Sep 9, 2025

The hardcoded list of callers is duplicated from the mapping dictionary keys. Consider extracting this to avoid duplication: for caller in mapping.keys():

Suggested change
- for caller in ["mutect2", "strelka2", "vardict", "varscan2"]:
+ for caller in mapping.keys():
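
A minimal sketch of the suggested change, assuming the loop only needs each caller's VCF path. The function name `caller_vcfs`, the `base` argument, and the example wildcard values are hypothetical, and only the two mapping entries visible in the diff excerpt are reproduced here.

```python
from types import SimpleNamespace


def caller_vcfs(wildcards, base):
    """Return caller VCF paths in the mapping's insertion order."""
    # Only the two entries visible in the excerpt; the real mapping has more.
    mapping = {
        "vardict": f"{base}/vardict/{wildcards.sample}.{wildcards.alnr}.vardict.vcf",
        "varscan2": f"{base}/varscan2/{wildcards.sample}.{wildcards.alnr}.varscan2.vcf",
    }
    # Iterating a dict yields its keys, so no separate caller list is needed.
    return [mapping[caller] for caller in mapping]


# Hypothetical wildcard values standing in for Snakemake's wildcards object.
wc = SimpleNamespace(sample="S1", alnr="bwa")
paths = caller_vcfs(wc, "/data")
```

Iterating `mapping` directly is equivalent to iterating `mapping.keys()`, and either form keeps the caller list from drifting out of sync with the path mapping.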

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Codex Review: Here are some suggestions.

Comment on lines +32 to +55
    output:
        vcf=MDIR + "{sample}/align/{alnr}/snv/neusomatic/{sample}.{alnr}.neusomatic.snv.vcf",
    log:
        MDIR + "{sample}/align/{alnr}/snv/neusomatic/log/{sample}.{alnr}.neusomatic.snv.log",
    threads: config['neusomatic']['threads']
    container:
        config['neusomatic']['container']
    resources:
        vcpu=config['neusomatic']['threads'],
        threads=config['neusomatic']['threads'],
        partition=config['neusomatic']['partition'],
        mem_mb=config['neusomatic']['mem_mb'],
    params:
        cluster_sample=ret_sample,
        numa=config['neusomatic']['numa'],
    shell:
        r"""
        set -euo pipefail
        {params.numa} neusomatic.py call \
            --output {output.vcf} \
            --tumor {input.tumor_cram} \
            --normal {input.normal_cram} \
            --ref {input.ref_fa} \
            --threads {threads} >> {log} 2>&1

[P1] Ensure output/log directories exist in neusomatic rule

The neusomatic rule writes its VCF and log under snv/neusomatic/... but never creates those directories before redirecting stdout/stderr. Snakemake does not auto-create parent directories for file outputs, so the shell command will fail with “No such file or directory” before neusomatic.py runs on a fresh sample. Other rules in this workflow explicitly mkdir -p $(dirname {output}) to avoid this. Consider creating the neusomatic and neusomatic/log directories prior to invoking the tool.
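
The shell behaviour this comment describes can be reproduced outside Snakemake with a small sketch. It assumes a POSIX `sh` is available; the helper `run_with_log` and the temporary paths are hypothetical. It does not involve Snakemake itself, which may already create the parent directories of declared outputs and logs depending on version and setup.

```python
import subprocess
import tempfile
from pathlib import Path


def run_with_log(cmd, log, make_dirs):
    """Run cmd via sh, appending stdout/stderr to log; return the exit code."""
    # Optionally create the log's parent directory first, as the review suggests.
    prelude = f'mkdir -p "$(dirname "{log}")" && ' if make_dirs else ""
    script = f'{prelude}{cmd} >> "{log}" 2>&1'
    # The shell's own "cannot create" message goes to stderr; silence it here.
    return subprocess.run(["sh", "-c", script], stderr=subprocess.DEVNULL).returncode


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "snv" / "neusomatic" / "log" / "demo.log"
    rc_missing = run_with_log("echo hello", log, make_dirs=False)  # directory absent
    rc_created = run_with_log("echo hello", log, make_dirs=True)   # directory created
    log_text = log.read_text()
```

Without the `mkdir -p` step the redirection fails before the command writes anything, so the first call returns a non-zero exit code; with it, the second call succeeds and the log holds the command's output.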

Comment on lines +70 to +95
    output:
        vcf=MDIR + "{sample}/align/{alnr}/snv/neusomatic/{sample}.{alnr}.neusomatic_ensemble.snv.vcf",
    log:
        MDIR + "{sample}/align/{alnr}/snv/neusomatic/log/{sample}.{alnr}.neusomatic_ensemble.snv.log",
    threads: config['neusomatic']['threads']
    container:
        config['neusomatic']['container']
    resources:
        vcpu=config['neusomatic']['threads'],
        threads=config['neusomatic']['threads'],
        partition=config['neusomatic']['partition'],
        mem_mb=config['neusomatic']['mem_mb'],
    params:
        cluster_sample=ret_sample,
        numa=config['neusomatic']['numa'],
        caller_vcfs=lambda wildcards: " ".join(get_neusom_ensemble_callers(wildcards)),
    shell:
        r"""
        set -euo pipefail
        {params.numa} neusomatic.py ensemble \
            --output {output.vcf} \
            --tumor {input.tumor_cram} \
            --normal {input.normal_cram} \
            --ref {input.ref_fa} \
            --callers {params.caller_vcfs} \
            --threads {threads} >> {log} 2>&1

[P1] Missing directory creation in neusomatic_ensemble rule

The ensemble rule also writes to snv/neusomatic/... without ensuring those directories exist. When this rule runs for the first time, the redirection >> {log} and --output {output.vcf} will fail if the directories were not created by another rule, causing the job to error out before the ensemble step starts. Add a mkdir -p for the VCF and log parent paths to prevent this runtime failure.
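
One way to picture the requested `mkdir -p` step is a helper that derives the parent directories from the VCF and log paths. This is only a sketch: `mkdir_prelude` is a hypothetical name, and the concrete paths below (including the `results/` prefix standing in for `MDIR`) are made-up examples, not values from the workflow.

```python
import posixpath
import shlex


def mkdir_prelude(*paths):
    """Build a `mkdir -p` command covering the parent directory of each path."""
    # dict.fromkeys drops duplicate directories while keeping first-seen order.
    parents = dict.fromkeys(posixpath.dirname(p) for p in paths)
    return "mkdir -p " + " ".join(shlex.quote(d) for d in parents)


# Hypothetical concrete paths; in the rule these would be {output.vcf} and {log}.
vcf = "results/S1/align/bwa/snv/neusomatic/S1.bwa.neusomatic_ensemble.snv.vcf"
log = "results/S1/align/bwa/snv/neusomatic/log/S1.bwa.neusomatic_ensemble.snv.log"
prelude = mkdir_prelude(vcf, log)
```

Inside the rule's shell block, the same effect would typically come from a line such as `mkdir -p "$(dirname {output.vcf})" "$(dirname {log})"` placed before the `neusomatic.py` call.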
