Replace dipcall (#566)
* wip

* Replace dipcall

* Replace dipcall

* description

* Update CHANGELOG

* ignore bam md5

* CHANGELOG

* Update main.nf

* Update workflows/nallo.nf

Co-authored-by: Anders Jemt <jemten@users.noreply.github.com>

---------

Co-authored-by: Anders Jemt <jemten@users.noreply.github.com>
fellen31 and jemten authored Feb 18, 2025
1 parent 9d926b1 commit 6b1d91f
Showing 31 changed files with 687 additions and 622 deletions.
9 changes: 5 additions & 4 deletions .github/workflows/ci.yml
@@ -38,13 +38,14 @@ jobs:
 - "samplesheet"
 - "samplesheet_multisample_bam"
 - "samplesheet_multisample_ont_bam"
-- "SHORT_VARIANT_CALLING"
-- "SNV_ANNOTATION"
-- "CALL_SVS"
+- "ALIGN_ASSEMBLIES"
 - "ANNOTATE_SVS"
-- "RANK_VARIANTS"
 - "CALL_REPEAT_EXPANSIONS"
+- "CALL_SVS"
 - "METHYLATION"
+- "RANK_VARIANTS"
+- "SHORT_VARIANT_CALLING"
+- "SNV_ANNOTATION"
 profile:
 - "docker"
 steps:
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -26,13 +26,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#557](https://github.com/genomic-medicine-sweden/nallo/pull/557) - Updated Severus to version 1.3
 - [#558](https://github.com/genomic-medicine-sweden/nallo/pull/558) - Changed VEP to single-threaded by default, because of https://github.com/Ensembl/ensembl-vep/issues/1759
 - [#560](https://github.com/genomic-medicine-sweden/nallo/pull/560) - Updated template to nf-core/tools version 3.2.0
+- [#566](https://github.com/genomic-medicine-sweden/nallo/pull/566) - Replaced dipcall with `ALIGN_ASSEMBLIES`, mostly mimicking the alignment part of dipcall, while omitting the variant calling. Updated docs and output files.

 ### `Removed`

 ### `Fixed`

 - [#546](https://github.com/genomic-medicine-sweden/nallo/pull/546) - Fixed output filename mismatches between the documentation and the pipeline
 - [#556](https://github.com/genomic-medicine-sweden/nallo/pull/556) - Fixed an issue where the pipeline could not run with `--skip_snv_annotation`
+- [#566](https://github.com/genomic-medicine-sweden/nallo/pull/566) - Fixed wrong minimap2 mapping preset for genome assemblies

 ### Parameters

@@ -59,6 +61,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 | bcftools merge | | 1.20 |
 | merge_json | | 1.0 |
 | severus | 1.1 | 1.3 |
+| dipcall | 0.3 | |
+| tagbam | | 0.1.0 |

 > [!NOTE]
 > Version has been updated if both old and new version information is present.
6 changes: 3 additions & 3 deletions assets/software_references.yml
@@ -47,9 +47,6 @@ tool:
 deepvariant:
 citation: "DeepVariant (Poplin et al. 2018)"
 bibliography: "Poplin R, Chang PC, Alexander D, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983-987. doi:10.1038/nbt.4235"
-dipcall:
-citation: "dipcall (Li et al. 2018)"
-bibliography: "Li H, Bloom JM, Farjoun Y, Fleharty M, Gauthier L, Neale B, MacArthur D (2018) A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat Methods, 15:595-597. [PMID:30013044]"
 echtvar:
 citation: "Echtvar (Pedersen & de Ridder 2023)"
 bibliography: "Brent S Pedersen, Jeroen de Ridder, Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels, Nucleic Acids Research, Volume 51, Issue 1, 11 January 2023, Page e3, https://doi.org/10.1093/nar/gkac931"
@@ -131,6 +128,9 @@ tool:
 tabix:
 citation: "Tabix (Li 2011)"
 bibliography: "Li H. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics. 2011;27(5):718-719. doi:10.1093/bioinformatics/btq671"
+tagbam:
+citation: "Felix Lenner (2025)"
+bibliography: ""
 trgt:
 citation: "TRGT (Dolzhenko et al. 2024)"
 bibliography: "Dolzhenko, E., English, A., Dashnow, H. et al. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-02057-3"
51 changes: 51 additions & 0 deletions conf/modules/align_assemblies.config
@@ -0,0 +1,51 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
----------------------------------------------------------------------------------------
*/

process {
withName: '.*:ALIGN_ASSEMBLIES:.*' {
publishDir = [
enabled: false
]
}
withName: '.*:ALIGN_ASSEMBLIES:MINIMAP2_INDEX' {
ext.prefix = { "${meta.id}_assembly_index" }
ext.args = '-x asm5'
}
withName: '.*:ALIGN_ASSEMBLIES:MINIMAP2_ALIGN' {
ext.prefix = { "${meta.id}_aligned_assembly_haplotype_${meta.haplotype}" }
}
withName: '.*:ALIGN_ASSEMBLIES:SAMTOOLS_VIEW' {
ext.prefix = { "${meta.id}_aligned_assembly_haplotype_${meta.haplotype}_filtered" }
// Mimic default settings from samflt in dipcall.aux.js
ext.args = [
'--excl-flags SECONDARY',
'--min-MQ 5',
'--min-qlen 50000'
].join(' ')
}
withName: '.*:ALIGN_ASSEMBLIES:TAGBAM' {
ext.prefix = { "${meta.id}_aligned_assembly_haplotype_${meta.haplotype}_filtered_tagged" }
ext.args = { [
'--tag HP',
"--value ${meta.haplotype}"
].join(' ') }
}
withName: '.*:ALIGN_ASSEMBLIES:SAMTOOLS_MERGE' {
ext.prefix = { "${meta.id}_aligned_assembly" }
ext.args = '--write-index'
publishDir = [
path: { "${params.outdir}/assembly/sample/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}
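
The filter thresholds in this file are plain `ext.args` strings, so a user-supplied config should be able to override them in the usual nf-core way. A minimal sketch, assuming a hypothetical file `custom.config` passed with Nextflow's `-c` option and a selector identical to the one used above. The 100 kb threshold is an illustrative value, not a recommendation from this commit:

```groovy
// custom.config (hypothetical file name), loaded with: nextflow run ... -c custom.config
process {
    withName: '.*:ALIGN_ASSEMBLIES:SAMTOOLS_VIEW' {
        // Same samtools view flags as the pipeline default above,
        // but with a stricter (illustrative) minimum query length.
        ext.args = [
            '--excl-flags SECONDARY',
            '--min-MQ 5',
            '--min-qlen 100000'
        ].join(' ')
    }
}
```

Because `ext.args` is replaced as a whole rather than merged, the override repeats the flags it wants to keep.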
37 changes: 0 additions & 37 deletions conf/modules/assembly_variant_calling.config

This file was deleted.

7 changes: 4 additions & 3 deletions conf/modules/genome_assembly.config
@@ -52,11 +52,12 @@ process {

 withName: '.*:ASSEMBLY:GFASTATS.*' {
 ext.args = '--discover-paths'
-ext.prefix = { "${assembly.baseName}" }
+ext.prefix = { "${meta.id}_haplotype_${meta.haplotype}" }
+
 publishDir = [
-path: { "${params.outdir}/assembly_haplotypes/gfastats/${meta.id}" },
+path: { "${params.outdir}/assembly/stats/${meta.id}" },
 mode: params.publish_dir_mode,
-saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+saveAs: { filename -> filename.equals('versions.yml') || filename.endsWith('.fasta.gz') ? null : filename }
 ]
 }
 }
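
The updated `saveAs` closure suppresses two kinds of files, `versions.yml` and the `.fasta.gz` haplotypes, which appears to leave only the summary statistics published under `assembly/stats/`. A small Groovy sketch of the same predicate, runnable outside Nextflow. The file names are hypothetical and only illustrate the naming produced by the new `ext.prefix`:

```groovy
// Same predicate as the updated publishDir saveAs closure.
// Returning null tells Nextflow not to publish the file.
def saveAs = { String filename ->
    filename.equals('versions.yml') || filename.endsWith('.fasta.gz') ? null : filename
}

// Hypothetical file names, for illustration only
assert saveAs('versions.yml') == null
assert saveAs('sample1_haplotype_1.fasta.gz') == null
assert saveAs('sample1_haplotype_1.assembly_summary') == 'sample1_haplotype_1.assembly_summary'
```

The ternary binds more loosely than `||`, so both conditions are evaluated together before the `null`/`filename` choice is made.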
4 changes: 0 additions & 4 deletions docs/CITATIONS.md
@@ -32,10 +32,6 @@

 > Poplin R, Chang PC, Alexander D, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983-987. doi:10.1038/nbt.4235
-- [dipcall](https://www.nature.com/articles/s41592-018-0054-7)
-
-> Li H, Bloom JM, Farjoun Y, Fleharty M, Gauthier L, Neale B, MacArthur D (2018) A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat Methods, 15:595-597. [PMID:30013044]
 - [echtvar](https://academic.oup.com/nar/article/51/1/e3/6775383)

 > Brent S Pedersen, Jeroen de Ridder, Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels, Nucleic Acids Research, Volume 51, Issue 1, 11 January 2023, Page e3, https://doi.org/10.1093/nar/gkac931
19 changes: 8 additions & 11 deletions docs/output.md
@@ -20,17 +20,14 @@ This document describes the pipeline output files and the tools used to generate

 ## Assembly

-[Hifiasm](https://github.com/chhylp123/hifiasm) is used to assemble genomes. The assembled haplotypes are then converted to fasta files using [gfastats](https://github.com/vgl-hub/gfastats). A deconstructed version of [dipcall](https://github.com/lh3/dipcall) is used to map the assembled haplotypes back to the reference genome.
-
-| Path | Description |
-| ------------------------------------------------------------ | ---------------------------------------------------- |
-| `assembly_haplotypes/gfastats/{sample}/*hap1.p_ctg.fasta.gz` | Assembled haplotype 1 |
-| `assembly_haplotypes/gfastats/{sample}/*hap2.p_ctg.fasta.gz` | Assembled haplotype 2 |
-| `assembly_haplotypes/gfastats/{sample}/*.assembly_summary` | Summary statistics |
-| `assembly_variant_calling/dipcall/{sample}/*hap1.bam` | Assembled haplotype 1 mapped to the reference genome |
-| `assembly_variant_calling/dipcall/{sample}/*hap1.bai` | Index of the corresponding BAM file for haplotype 1 |
-| `assembly_variant_calling/dipcall/{sample}/*hap2.bam` | Assembled haplotype 2 mapped to the reference genome |
-| `assembly_variant_calling/dipcall/{sample}/*hap2.bai` | Index of the corresponding BAM file for haplotype 2 |
+[Hifiasm](https://github.com/chhylp123/hifiasm) is used to assemble genomes. The assembled haplotypes are then aligned to the reference genome with [minimap2](https://github.com/lh3/minimap2), tagged with `HP:1` for the "paternal" haplotype, and `HP:2` for the "maternal" haplotype, before being merged together into one file with [samtools](https://github.com/samtools/samtools). [gfastats](https://github.com/vgl-hub/gfastats) is used to convert the assembly to fasta format before alignment, and also outputs summary stats per haplotype.
+
+| Path | Description |
+| --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
+| `assembly/sample/{sample}/{sample}_aligned_assembly.bam` | Both assembled haplotypes mapped to the reference genome, merged and haplotagged (`HP:1`/`HP:2`). |
+| `assembly/sample/{sample}/{sample}_aligned_assembly.bam.bai` | Index of aligned assembly. |
+| `assembly/stats/{sample}/{sample}_haplotype_1.assembly_summary` | Summary statistics for haplotype 1/paternal haplotype |
+| `assembly/stats/{sample}/{sample}_haplotype_2.assembly_summary` | Summary statistics for haplotype 2/maternal haplotype |

 ## Methylation pileups

10 changes: 1 addition & 9 deletions docs/usage.md
@@ -147,15 +147,7 @@ Turned off with `--skip_qc`.

 ### Assembly

-This subworkflow contains both genome assembly and assembly variant calling. The assembly variant calling needs the sex of samples. For samples with unknown sex this is inferred with the help of the aligned reads. Therefore it depends on the alignment subworkflow. It requires a BED file with PARs.
-
-| Parameter | Description |
-| ------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
-| `par_regions` | A BED file with PAR regions (e.g. [GRCh38_PAR.bed](https://storage.googleapis.com/deepvariant/case-study-testdata/GRCh38_PAR.bed)) |
-
-!!!warning
-
-Make sure chrY PAR is hard masked (masked with the letter N) in the reference genome you are using.
+This subworkflow contains both genome assembly and alignment of assemblies to the reference genome. The genome assembly step assembles the genome into two haplotypes and converts them to fasta. The align assemblies subworkflow then maps the assembled haplotypes to the reference genome, haplotags and merges them, and requires no additional files except the reference genome.

 Turned off with `--skip_genome_assembly`.

5 changes: 5 additions & 0 deletions modules.json
@@ -290,6 +290,11 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
 },
+"tagbam": {
+"branch": "master",
+"git_sha": "84a45189310c8381dfea09f3cca03ac4104809a4",
+"installed_by": ["modules"]
+},
"trgt/genotype": {
"branch": "master",
"git_sha": "484afd16770cf3c466a6c385e33746c877656663",
7 changes: 0 additions & 7 deletions modules/local/dipcall/enviroment.yml

This file was deleted.

103 changes: 0 additions & 103 deletions modules/local/dipcall/main.nf

This file was deleted.

7 changes: 7 additions & 0 deletions modules/nf-core/tagbam/environment.yml
