Adding new nf-core subworkflow freyja for weighted variant analysis #375

Merged
merged 20 commits into from
Aug 16, 2023
Changes from 9 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Enhancements & fixes

- [[#299](https://github.com/nf-core/viralrecon/issues/299)] - Add the freyja pipeline as a subworkflow

## [[2.6.0](https://github.com/nf-core/viralrecon/releases/tag/2.6.0)] - 2023-03-23

### Credits
4 changes: 3 additions & 1 deletion README.md
@@ -51,7 +51,8 @@ A number of improvements were made to the pipeline recently, mainly with regard
- Consensus assessment report ([`QUAST`](http://quast.sourceforge.net/quast))
- Lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
- Clade assignment, mutation calling and sequence quality checks ([`Nextclade`](https://github.com/nextstrain/nextclade))
9. Create variants long format table collating per-sample information for individual variants ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html)), functional effect prediction ([`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html)) and lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
9. Recover relative lineage abundances from mixed SARS-CoV-2 samples ([`Freyja`](https://github.com/andersen-lab/Freyja))
10. Create variants long format table collating per-sample information for individual variants ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html)), functional effect prediction ([`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html)) and lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
6. _De novo_ assembly
1. Primer trimming ([`Cutadapt`](https://cutadapt.readthedocs.io/en/stable/guide.html); _amplicon data only_)
2. Choice of multiple assembly tools ([`SPAdes`](http://cab.spbu.ru/software/spades/) _||_ [`Unicycler`](https://github.com/rrwick/Unicycler) _||_ [`minia`](https://github.com/GATB/minia))
@@ -78,6 +79,7 @@ A number of improvements were made to the pipeline recently, mainly with regard
- Lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
- Clade assignment, mutation calling and sequence quality checks ([`Nextclade`](https://github.com/nextstrain/nextclade))
- Individual variant screenshots with annotation tracks ([`ASCIIGenome`](https://asciigenome.readthedocs.io/en/latest/))
- Recover relative lineage abundances from mixed SARS-CoV-2 samples ([`Freyja`](https://github.com/andersen-lab/Freyja))
- Create variants long format table collating per-sample information for individual variants ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html)), functional effect prediction ([`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html)) and lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
8. Present QC, visualisation and custom reporting for sequencing, raw reads, alignment and variant calling results ([`MultiQC`](http://multiqc.info/))

33 changes: 33 additions & 0 deletions conf/modules_illumina.config
@@ -191,6 +191,39 @@ if (!params.skip_variants) {
    }
}

if (!params.skip_freyja) {
    process {
        withName: 'FREYJA_VARIANTS' {
            publishDir = [
                path: { "${params.outdir}/variants/freyja/variants" },
                mode: params.publish_dir_mode,
                pattern: "*.{tsv,csv}"
            ]
        }
        withName: 'FREYJA_DEMIX' {
            publishDir = [
                path: { "${params.outdir}/variants/freyja/demix" },
                mode: params.publish_dir_mode,
                pattern: "*.{tsv,csv}"
            ]
        }
        withName: 'FREYJA_BOOT' {
            ext.args = '--boxplot PDF'
            publishDir = [
                path: { "${params.outdir}/variants/freyja/bootstrap" },
                mode: params.publish_dir_mode,
                pattern: "*.{tsv,csv,pdf}"
            ]
        }
        withName: 'FREYJA_UPDATE' {
            publishDir = [
                path: { "${params.outdir}/variants/freyja/" },
                mode: params.publish_dir_mode,
            ]
        }
    }
}

if (!params.skip_ivar_trim && params.protocol == 'amplicon') {
    process {
        withName: 'IVAR_TRIM' {
33 changes: 33 additions & 0 deletions conf/modules_nanopore.config
@@ -229,6 +229,39 @@ if (!params.skip_nextclade) {
    }
}

if (!params.skip_freyja) {
    process {
        withName: 'FREYJA_VARIANTS' {
            publishDir = [
                path: { "${params.outdir}/${params.artic_minion_caller}/freyja/variants" },
                mode: params.publish_dir_mode,
                pattern: "*.{tsv,csv}"
            ]
        }
        withName: 'FREYJA_DEMIX' {
            publishDir = [
                path: { "${params.outdir}/${params.artic_minion_caller}/freyja/demix" },
                mode: params.publish_dir_mode,
                pattern: "*.{tsv,csv}"
            ]
        }
        withName: 'FREYJA_BOOT' {
            ext.args = '--boxplot PDF'
            publishDir = [
                path: { "${params.outdir}/${params.artic_minion_caller}/freyja/bootstrap" },
                mode: params.publish_dir_mode,
                pattern: "*.{tsv,csv,pdf}"
            ]
        }
        withName: 'FREYJA_UPDATE' {
            publishDir = [
                path: { "${params.outdir}/${params.artic_minion_caller}/freyja/" },
                mode: params.publish_dir_mode,
            ]
        }
    }
}

if (!params.skip_variants_quast) {
    process {
        withName: 'QUAST' {
1 change: 1 addition & 0 deletions conf/test.config
@@ -32,6 +32,7 @@ params {

// Variant calling options
variant_caller = 'ivar'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
1 change: 1 addition & 0 deletions conf/test_full.config
@@ -26,6 +26,7 @@ params {

// Variant calling options
variant_caller = 'ivar'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
3 changes: 3 additions & 0 deletions conf/test_full_nanopore.config
@@ -25,6 +25,9 @@ params {
genome = 'MN908947.3'
primer_set_version = 3

// variant calling options
freyja_repeats = 10

// Other parameters
artic_minion_medaka_model = 's3://ngi-igenomes/test-data/viralrecon/20210205_1526_X4_FAP51364_21fa8135/r941_min_high_g360_model.hdf5'
}
1 change: 1 addition & 0 deletions conf/test_full_sispa.config
@@ -24,6 +24,7 @@ params {

// Variant calling options
variant_caller = 'bcftools'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
3 changes: 3 additions & 0 deletions conf/test_nanopore.config
@@ -30,6 +30,9 @@ params {
genome = 'MN908947.3'
primer_set_version = 3

// variant calling options
freyja_repeats = 10

// Other parameters
artic_minion_medaka_model = 's3://ngi-igenomes/test-data/viralrecon/minion_test/r941_min_high_g360_model.hdf5'
}
1 change: 1 addition & 0 deletions conf/test_sispa.config
@@ -30,6 +30,7 @@ params {

// Variant calling options
variant_caller = 'bcftools'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
47 changes: 47 additions & 0 deletions docs/output.md
@@ -21,6 +21,7 @@ The directories listed below will be created in the results directory after the
- [Pangolin](#nanopore-pangolin) - Lineage analysis
- [Nextclade](#nanopore-nextclade) - Clade assignment, mutation calling and sequence quality checks
- [ASCIIGenome](#nanopore-asciigenome) - Individual variant screenshots with annotation tracks
- [Freyja](#nanopore-freyja) - Recover relative lineage abundances from mixed SARS-CoV-2 samples
- [Variants long table](#nanopore-variants-long-table) - Collate per-sample information for individual variants, functional effect prediction and lineage analysis
- [Workflow reporting](#nanopore-workflow-reporting)
- [MultiQC](#nanopore-multiqc) - Present QC, visualisation and custom reporting for sequencing, raw reads, alignment and variant calling results
@@ -241,6 +242,30 @@ Phylogenetic Assignment of Named Global Outbreak LINeages ([Pangolin](https://gi

[Nextclade](https://github.com/nextstrain/nextclade) performs viral genome clade assignment, mutation calling and sequence quality checks for the consensus sequences generated in this pipeline. Similar to Pangolin, it has been used extensively during the COVID-19 pandemic. A [web application](https://clades.nextstrain.org/) also exists that allows users to upload genome sequences via a web browser.

### Nanopore: Freyja

<details markdown="1">
<summary>Output files</summary>

- `<CALLER>/freyja/demix`
  - `*.tsv`: Analysis results including the lineages present, their corresponding abundances, and summarization by constellation
- `<CALLER>/freyja/freyja_db`
  - `.json`: Dataset containing lineage metadata that corresponds to the barcodes
  - `.yml`: Dataset containing the lineage topology
  - `.csv`: Dataset containing lineage-defining barcodes
- `<CALLER>/freyja/variants`
  - `*.variants.tsv`: Analysis results including identified variants in a gff-like format
  - `*.depth.tsv`: Analysis results including the depth of the identified variants
- `<CALLER>/freyja/bootstrap`
  - `*lineages.csv`: Analysis results including the lineages present and their corresponding abundances, with variation identified through bootstrapping
  - `*summarized.csv`: Analysis results including the lineages present, summarized by constellation, and their corresponding abundances, with variation identified through bootstrapping

**NB:** The value of `<CALLER>` in the output directory name above is determined by the `--artic_minion_caller` parameter (Default: 'nanopolish').

</details>

[Freyja](https://github.com/andersen-lab/Freyja) is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the [UShER](https://usher-wiki.readthedocs.io/en/latest/#) global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.
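
The constrained de-mixing problem described above can be written compactly. This is a sketch, not necessarily Freyja's exact objective. The symbols $A$ (a barcode matrix with one row per mutation and one column per lineage), $b$ (the observed mutation frequencies in the sample) and $x$ (the lineage abundances), as well as the choice of an absolute-error norm, are assumptions introduced here for illustration. Only the unit-sum and non-negativity constraints come from the description above.

$$
\hat{x} = \arg\min_{x} \lVert A x - b \rVert_1 \quad \text{subject to} \quad \sum_i x_i = 1, \quad x_i \ge 0 \ \text{for all } i
$$

Under this reading, each entry $\hat{x}_i$ is the estimated relative abundance of lineage $i$ in the mixed sample.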

### Nanopore: ASCIIGenome

<details markdown="1">
@@ -697,6 +722,28 @@ Phylogenetic Assignment of Named Global Outbreak LINeages ([Pangolin](https://gi

[Nextclade](https://github.com/nextstrain/nextclade) performs viral genome clade assignment, mutation calling and sequence quality checks for the consensus sequences generated in this pipeline. Similar to Pangolin, it has been used extensively during the COVID-19 pandemic. A [web application](https://clades.nextstrain.org/) also exists that allows users to upload genome sequences via a web browser.

### Freyja

<details markdown="1">
<summary>Output files</summary>

- `variants/freyja/demix`
  - `*.tsv`: Analysis results including the lineages present, their corresponding abundances, and summarization by constellation
- `variants/freyja/freyja_db`
  - `.json`: Dataset containing lineage metadata that corresponds to the barcodes
  - `.yml`: Dataset containing the lineage topology
  - `.csv`: Dataset containing lineage-defining barcodes
- `variants/freyja/variants`
  - `*.variants.tsv`: Analysis results including identified variants in a gff-like format
  - `*.depth.tsv`: Analysis results including the depth of the identified variants
- `variants/freyja/bootstrap`
  - `*lineages.csv`: Analysis results including the lineages present and their corresponding abundances, with variation identified through bootstrapping
  - `*summarized.csv`: Analysis results including the lineages present, summarized by constellation, and their corresponding abundances, with variation identified through bootstrapping

</details>

[Freyja](https://github.com/andersen-lab/Freyja) is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the [UShER](https://usher-wiki.readthedocs.io/en/latest/#) global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.

### Variants long table

<details markdown="1">
4 changes: 4 additions & 0 deletions docs/usage.md
@@ -416,6 +416,10 @@ If the `--save_reference` parameter is provided then the Nextclade dataset gener

> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must ensure to keep the `work/` directory otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch.

#### Freyja

[Freyja](https://github.com/andersen-lab/Freyja) relies on a dataset of barcodes built from lineage-defining mutations (provided by [UShER](https://usher-wiki.readthedocs.io/en/latest/#)). By default, the most recent barcodes are downloaded. However, when analyses spanning a long time window need to be compared, it can be useful to keep the barcodes constant (or to rerun all Freyja analyses with the most recent dataset). To do this, specify the barcodes and lineages using the parameters `freyja_barcodes` and `freyja_lineages`, respectively.
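
As a minimal sketch of pinning the barcode dataset, the two parameters can be set in a custom config file supplied with `-c`. The file name and both paths are hypothetical placeholders. Mapping `freyja_lineages` to the lineage metadata file is an assumption, so check the parameter documentation for the expected file types.

```groovy
// freyja_pinned.config (hypothetical file name), supplied with `-c freyja_pinned.config`
params {
    // Placeholder paths: point these at the dataset files kept from an
    // earlier run so that every analysis uses the same barcodes.
    freyja_barcodes = '/path/to/freyja_db/barcodes.csv'

    // Assumed to be the lineage metadata file that matches the barcodes above.
    freyja_lineages = '/path/to/freyja_db/lineage_metadata.json'
}
```

Keeping both files together, and recording the date they were downloaded, makes it easier to tell later which barcode dataset a given set of results was produced with.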

### nf-core/configs

In most cases, you will only need to create a custom config as a one-off but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.
25 changes: 25 additions & 0 deletions modules.json
@@ -120,6 +120,26 @@
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"freyja/boot": {
"branch": "master",
"git_sha": "281c744ed84352c24697f0916c7744853ce83927",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"freyja/demix": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"freyja/update": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"freyja/variants": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"gunzip": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
@@ -279,6 +299,11 @@
"git_sha": "b4b7f89e7fd6d2293f0c176213f710e0bcdaf19e",
"installed_by": ["bam_sort_stats_samtools", "bam_markduplicates_picard"]
},
"bam_variant_demix_boot_freyja": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["subworkflows"]
},
"fastq_align_bowtie2": {
"branch": "master",
"git_sha": "ac75f79157ecc64283a2b3a559f1ba90bc0f2259",
56 changes: 56 additions & 0 deletions modules/nf-core/freyja/boot/main.nf
