Add NeuSomatic variant calling rules #34
base: main
Pull Request Overview
This PR adds NeuSomatic variant calling capabilities to the workflow by implementing both standalone and ensemble mode Snakemake rules. NeuSomatic is a somatic variant caller that can operate independently or as an ensemble method combining results from multiple other callers.
- Added two new Snakemake rules for NeuSomatic somatic variant calling: standalone and ensemble modes
- Exposed chromosome configuration parameters for NeuSomatic across different genome builds
- Integrated NeuSomatic rules into the main workflow and configuration templates
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
workflow/rules/rule_common.smk | Adds chromosome configuration parsing for NeuSomatic |
workflow/rules/neusomatic.smk | Implements NeuSomatic standalone and ensemble variant calling rules |
workflow/Snakefile | Includes the new NeuSomatic rules file in the workflow |
config/day_profiles/slurm/templates/rule_config.yaml | Adds NeuSomatic configuration for SLURM environments |
config/day_profiles/local/templates/rule_config.yaml | Adds NeuSomatic configuration for local environments |
    "vardict": f"{base}/vardict/{wildcards.sample}.{wildcards.alnr}.vardict.vcf",
    "varscan2": f"{base}/varscan2/{wildcards.sample}.{wildcards.alnr}.varscan2.vcf",
}
for caller in ["mutect2", "strelka2", "vardict", "varscan2"]:
Copilot (AI), Sep 9, 2025
The hardcoded list of callers duplicates the keys of the mapping dictionary. Consider iterating over the dictionary instead to avoid the duplication:
Suggested change
- for caller in ["mutect2", "strelka2", "vardict", "varscan2"]:
+ for caller in mapping.keys():
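A minimal sketch of what the suggestion buys, assuming a helper shaped like get_neusom_ensemble_callers. The function name `ensemble_caller_vcfs`, the `base` value and the sample/aligner names are hypothetical. Only the vardict and varscan2 entries are visible in the diff, so the mutect2 and strelka2 entries are left out rather than guessed. The loop body of the real helper is not shown, so this version just collects paths.

```python
from types import SimpleNamespace


def ensemble_caller_vcfs(wildcards, base):
    """Hypothetical stand-in for get_neusom_ensemble_callers.

    Only the two mapping entries visible in the diff are reproduced here.
    """
    mapping = {
        "vardict": f"{base}/vardict/{wildcards.sample}.{wildcards.alnr}.vardict.vcf",
        "varscan2": f"{base}/varscan2/{wildcards.sample}.{wildcards.alnr}.varscan2.vcf",
    }
    # Iterate over the dict keys (the Copilot suggestion) so the caller list
    # has a single source of truth; dicts keep insertion order in Python 3.7+.
    return [mapping[caller] for caller in mapping.keys()]


# Illustrative wildcards object; Snakemake passes its own Wildcards instance.
wc = SimpleNamespace(sample="S1", alnr="bwa")
print(ensemble_caller_vcfs(wc, "results/S1/align/bwa/snv"))
```

With this shape, adding a caller to the dictionary automatically adds it to the loop, which is the drift the review comment is guarding against.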
Codex Review: Here are some suggestions.
Reply with @codex fix comments to fix any unresolved comments.
About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".
    output:
        vcf=MDIR + "{sample}/align/{alnr}/snv/neusomatic/{sample}.{alnr}.neusomatic.snv.vcf",
    log:
        MDIR + "{sample}/align/{alnr}/snv/neusomatic/log/{sample}.{alnr}.neusomatic.snv.log",
    threads: config['neusomatic']['threads']
    container:
        config['neusomatic']['container']
    resources:
        vcpu=config['neusomatic']['threads'],
        threads=config['neusomatic']['threads'],
        partition=config['neusomatic']['partition'],
        mem_mb=config['neusomatic']['mem_mb'],
    params:
        cluster_sample=ret_sample,
        numa=config['neusomatic']['numa'],
    shell:
        r"""
        set -euo pipefail
        {params.numa} neusomatic.py call \
            --output {output.vcf} \
            --tumor {input.tumor_cram} \
            --normal {input.normal_cram} \
            --ref {input.ref_fa} \
            --threads {threads} >> {log} 2>&1
[P1] Ensure output/log directories exist in the neusomatic rule
The neusomatic rule writes its VCF and log under snv/neusomatic/... but never creates those directories before redirecting stdout/stderr. Snakemake does not auto-create parent directories for file outputs, so the shell command will fail with “No such file or directory” before neusomatic.py runs on a fresh sample. Other rules in this workflow explicitly run mkdir -p $(dirname {output}) to avoid this. Consider creating the neusomatic and neusomatic/log directories before invoking the tool.
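The requested fix amounts to running `mkdir -p "$(dirname {output.vcf})" "$(dirname {log})"` before the tool starts. The sketch below is a Python equivalent, for example for use inside a Snakemake `run:` block. The helper name `ensure_parent_dirs` and the example paths are hypothetical. Snakemake normally creates the parent directories of a job's declared output and log files before running it, so treat this as a defensive measure rather than a guaranteed fix.

```python
import tempfile
from pathlib import Path


def ensure_parent_dirs(*paths):
    """Create the parent directory of each path, like `mkdir -p "$(dirname p)"`.

    Hypothetical helper; returns the parent directories it ensured.
    """
    parents = []
    for p in paths:
        parent = Path(p).parent
        # parents=True builds intermediate directories; exist_ok=True makes
        # repeated calls harmless when the directory is already there.
        parent.mkdir(parents=True, exist_ok=True)
        parents.append(parent)
    return parents


# Illustrative run against a throwaway directory standing in for MDIR.
with tempfile.TemporaryDirectory() as mdir:
    vcf = f"{mdir}/S1/align/bwa/snv/neusomatic/S1.bwa.neusomatic.snv.vcf"
    log = f"{mdir}/S1/align/bwa/snv/neusomatic/log/S1.bwa.neusomatic.snv.log"
    for d in ensure_parent_dirs(vcf, log):
        print(d.is_dir())
```

Because `exist_ok=True` is set, calling the helper from both NeuSomatic rules is safe even when they share directories.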
    output:
        vcf=MDIR + "{sample}/align/{alnr}/snv/neusomatic/{sample}.{alnr}.neusomatic_ensemble.snv.vcf",
    log:
        MDIR + "{sample}/align/{alnr}/snv/neusomatic/log/{sample}.{alnr}.neusomatic_ensemble.snv.log",
    threads: config['neusomatic']['threads']
    container:
        config['neusomatic']['container']
    resources:
        vcpu=config['neusomatic']['threads'],
        threads=config['neusomatic']['threads'],
        partition=config['neusomatic']['partition'],
        mem_mb=config['neusomatic']['mem_mb'],
    params:
        cluster_sample=ret_sample,
        numa=config['neusomatic']['numa'],
        caller_vcfs=lambda wildcards: " ".join(get_neusom_ensemble_callers(wildcards)),
    shell:
        r"""
        set -euo pipefail
        {params.numa} neusomatic.py ensemble \
            --output {output.vcf} \
            --tumor {input.tumor_cram} \
            --normal {input.normal_cram} \
            --ref {input.ref_fa} \
            --callers {params.caller_vcfs} \
            --threads {threads} >> {log} 2>&1
[P1] Missing directory creation in the neusomatic_ensemble rule
The ensemble rule also writes to snv/neusomatic/... without ensuring those directories exist. When this rule runs for the first time, the redirection >> {log} and --output {output.vcf} will fail if the directories were not created by another rule, causing the job to error out before the ensemble step starts. Add a mkdir -p for the VCF and log parent paths to prevent this runtime failure.
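The two rules write into the same pair of directories. In the visible diff, nothing forces the standalone rule to run before the ensemble rule, so each rule should create the directories itself. The sketch below expands the output and log templates copied from the diff to confirm that the directories are shared. The `results/` stand-in for MDIR and the sample and aligner names are hypothetical, and `required_dirs` is an illustrative name.

```python
from pathlib import PurePosixPath

# Output and log templates copied from the two rules in this PR. "{mdir}"
# stands in for MDIR, whose real value is not shown in the diff.
TEMPLATES = {
    "neusomatic": (
        "{mdir}{sample}/align/{alnr}/snv/neusomatic/{sample}.{alnr}.neusomatic.snv.vcf",
        "{mdir}{sample}/align/{alnr}/snv/neusomatic/log/{sample}.{alnr}.neusomatic.snv.log",
    ),
    "neusomatic_ensemble": (
        "{mdir}{sample}/align/{alnr}/snv/neusomatic/{sample}.{alnr}.neusomatic_ensemble.snv.vcf",
        "{mdir}{sample}/align/{alnr}/snv/neusomatic/log/{sample}.{alnr}.neusomatic_ensemble.snv.log",
    ),
}


def required_dirs(rule, mdir, sample, alnr):
    """Return the parent directories that a rule's VCF and log need."""
    return {
        str(PurePosixPath(t.format(mdir=mdir, sample=sample, alnr=alnr)).parent)
        for t in TEMPLATES[rule]
    }


standalone = required_dirs("neusomatic", "results/", "S1", "bwa")
ensemble = required_dirs("neusomatic_ensemble", "results/", "S1", "bwa")
print(sorted(standalone))
print(standalone == ensemble)  # both rules need the same two directories
```

Because the sets are identical, the same `mkdir -p` line (or helper call) can go at the top of both shell blocks.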
Summary
Testing
- pytest
- snakemake --lint (fails: command not found)

https://chatgpt.com/codex/tasks/task_e_68bf7826c1148331b3681b12d14ba543