Merge pull request nf-core#375 from Joon-Klaps/freyja
Adding new nf-core subworkflow freyja for weighted variant analysis
drpatelh committed Aug 16, 2023
2 parents 75d3a6c + 8c74f8f commit 23e3441
Showing 27 changed files with 902 additions and 27 deletions.
38 changes: 34 additions & 4 deletions CHANGELOG.md
@@ -7,20 +7,50 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Credits

Special thanks to the following for their code contributions to the release:

- [Adam Talbot](https://github.com/adamrtalbot)
- [Joon Klaps](https://github.com/Joon-Klaps)

### Software dependencies
Thank you to everyone else that has contributed by reporting bugs, enhancements or in any other way, shape or form.

### Enhancements & fixes

- [[#387](https://github.com/nf-core/viralrecon/pull/387/files)] - Software closes gracefully when encountering an error
- [[#299](https://github.com/nf-core/viralrecon/issues/299)] - Add the freyja pipeline as a subworkflow
- [[PR #387](https://github.com/nf-core/viralrecon/pull/387)] - Software closes gracefully when encountering an error

## [[2.6.0](https://github.com/nf-core/viralrecon/releases/tag/2.6.0)] - 2023-03-23
### Parameters

### Credits
| Old parameter | New parameter       |
| ------------- | ------------------- |
|               | `--skip_freyja`     |
|               | `--freyja_repeats`  |
|               | `--freyja_db_name`  |
|               | `--freyja_barcodes` |
|               | `--freyja_lineages` |

> **NB:** Parameter has been **updated** if both old and new parameter information is present.
>
> **NB:** Parameter has been **added** if just the new parameter information is present.
>
> **NB:** Parameter has been **removed** if new parameter information isn't present.

### Software dependencies

Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.

| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| `freyja` | | 1.3.12 |

> **NB:** Dependency has been **updated** if both old and new version information is present.
>
> **NB:** Dependency has been **added** if just the new version information is present.
>
> **NB:** Dependency has been **removed** if new version information isn't present.
## [[2.6.0](https://github.com/nf-core/viralrecon/releases/tag/2.6.0)] - 2023-03-23

### Credits

Special thanks to the following for their code contributions to the release:

- [Friederike Hanssen](https://github.com/FriederikeHanssen)
4 changes: 3 additions & 1 deletion README.md
@@ -51,7 +51,8 @@ A number of improvements were made to the pipeline recently, mainly with regard
- Consensus assessment report ([`QUAST`](http://quast.sourceforge.net/quast))
- Lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
- Clade assignment, mutation calling and sequence quality checks ([`Nextclade`](https://github.com/nextstrain/nextclade))
9. Create variants long format table collating per-sample information for individual variants ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html)), functional effect prediction ([`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html)) and lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
9. Relative lineage abundance analysis from mixed SARS-CoV-2 samples ([`Freyja`](https://github.com/andersen-lab/Freyja))
10. Create variants long format table collating per-sample information for individual variants ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html)), functional effect prediction ([`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html)) and lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
6. _De novo_ assembly
1. Primer trimming ([`Cutadapt`](https://cutadapt.readthedocs.io/en/stable/guide.html); _amplicon data only_)
2. Choice of multiple assembly tools ([`SPAdes`](http://cab.spbu.ru/software/spades/) _||_ [`Unicycler`](https://github.com/rrwick/Unicycler) _||_ [`minia`](https://github.com/GATB/minia))
@@ -78,6 +79,7 @@ A number of improvements were made to the pipeline recently, mainly with regard
- Lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
- Clade assignment, mutation calling and sequence quality checks ([`Nextclade`](https://github.com/nextstrain/nextclade))
- Individual variant screenshots with annotation tracks ([`ASCIIGenome`](https://asciigenome.readthedocs.io/en/latest/))
- Recover relative lineage abundances from mixed SARS-CoV-2 samples ([`Freyja`](https://github.com/andersen-lab/Freyja))
- Create variants long format table collating per-sample information for individual variants ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html)), functional effect prediction ([`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html)) and lineage analysis ([`Pangolin`](https://github.com/cov-lineages/pangolin))
8. Present QC, visualisation and custom reporting for sequencing, raw reads, alignment and variant calling results ([`MultiQC`](http://multiqc.info/))

33 changes: 33 additions & 0 deletions conf/modules_illumina.config
@@ -191,6 +191,39 @@ if (!params.skip_variants) {
}
}

if (!params.skip_freyja) {
process {
withName: 'FREYJA_VARIANTS' {
publishDir = [
path: { "${params.outdir}/variants/freyja/variants" },
mode: params.publish_dir_mode,
pattern: "*.{tsv,csv}"
]
}
withName: 'FREYJA_DEMIX' {
publishDir = [
path: { "${params.outdir}/variants/freyja/demix" },
mode: params.publish_dir_mode,
pattern: "*.{tsv,csv}"
]
}
withName: 'FREYJA_BOOT' {
ext.args = '--boxplot pdf'
publishDir = [
path: { "${params.outdir}/variants/freyja/bootstrap" },
mode: params.publish_dir_mode,
pattern: "*.{tsv,csv,pdf}"
]
}
withName: 'FREYJA_UPDATE' {
publishDir = [
path: { "${params.outdir}/variants/freyja/" },
mode: params.publish_dir_mode,
]
}
}
}

if (!params.skip_ivar_trim && params.protocol == 'amplicon') {
process {
withName: 'IVAR_TRIM' {
36 changes: 36 additions & 0 deletions conf/modules_nanopore.config
@@ -229,6 +229,42 @@ if (!params.skip_nextclade) {
}
}

if (!params.skip_freyja) {
process {
withName: 'FREYJA_VARIANTS' {
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/freyja/variants" },
mode: params.publish_dir_mode,
pattern: "*.{tsv,csv}"
]
}

withName: 'FREYJA_DEMIX' {
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/freyja/demix" },
mode: params.publish_dir_mode,
pattern: "*.{tsv,csv}"
]
}

withName: 'FREYJA_BOOT' {
ext.args = '--boxplot pdf'
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/freyja/bootstrap" },
mode: params.publish_dir_mode,
pattern: "*.{tsv,csv,pdf}"
]
}

withName: 'FREYJA_UPDATE' {
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/freyja/" },
mode: params.publish_dir_mode,
]
}
}
}

if (!params.skip_variants_quast) {
process {
withName: 'QUAST' {
1 change: 1 addition & 0 deletions conf/test.config
@@ -32,6 +32,7 @@ params {

// Variant calling options
variant_caller = 'ivar'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
1 change: 1 addition & 0 deletions conf/test_full.config
@@ -26,6 +26,7 @@ params {

// Variant calling options
variant_caller = 'ivar'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
3 changes: 3 additions & 0 deletions conf/test_full_nanopore.config
@@ -25,6 +25,9 @@ params {
genome = 'MN908947.3'
primer_set_version = 3

// variant calling options
freyja_repeats = 10

// Other parameters
artic_minion_medaka_model = 's3://ngi-igenomes/test-data/viralrecon/20210205_1526_X4_FAP51364_21fa8135/r941_min_high_g360_model.hdf5'
}
1 change: 1 addition & 0 deletions conf/test_full_sispa.config
@@ -24,6 +24,7 @@ params {

// Variant calling options
variant_caller = 'bcftools'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
3 changes: 3 additions & 0 deletions conf/test_nanopore.config
@@ -30,6 +30,9 @@ params {
genome = 'MN908947.3'
primer_set_version = 3

// variant calling options
freyja_repeats = 10

// Other parameters
artic_minion_medaka_model = 's3://ngi-igenomes/test-data/viralrecon/minion_test/r941_min_high_g360_model.hdf5'
}
1 change: 1 addition & 0 deletions conf/test_sispa.config
@@ -30,6 +30,7 @@ params {

// Variant calling options
variant_caller = 'bcftools'
freyja_repeats = 10

// Assembly options
assemblers = 'spades,unicycler,minia'
48 changes: 48 additions & 0 deletions docs/output.md
@@ -20,6 +20,7 @@ The directories listed below will be created in the results directory after the
- [QUAST](#nanopore-quast) - Consensus assessment report
- [Pangolin](#nanopore-pangolin) - Lineage analysis
- [Nextclade](#nanopore-nextclade) - Clade assignment, mutation calling and sequence quality checks
- [Freyja](#nanopore-freyja) - Relative lineage abundance analysis from mixed SARS-CoV-2 samples (typically wastewater)
- [ASCIIGenome](#nanopore-asciigenome) - Individual variant screenshots with annotation tracks
- [Variants long table](#nanopore-variants-long-table) - Collate per-sample information for individual variants, functional effect prediction and lineage analysis
- [Workflow reporting](#nanopore-workflow-reporting)
@@ -241,6 +242,30 @@ Phylogenetic Assignment of Named Global Outbreak LINeages ([Pangolin](https://gi

[Nextclade](https://github.com/nextstrain/nextclade) performs viral genome clade assignment, mutation calling and sequence quality checks for the consensus sequences generated in this pipeline. Similar to Pangolin, it has been used extensively during the COVID-19 pandemic. A [web application](https://clades.nextstrain.org/) also exists that allows users to upload genome sequences via a web browser.

### Nanopore: Freyja

<details markdown="1">
<summary>Output files</summary>

- `<CALLER>/freyja/demix`
- `*.tsv`: Analysis results including the lineages present, their corresponding abundances, and summarization by constellation
- `<CALLER>/freyja/freyja_db`
- `.json`: dataset containing lineage metadata that correspond to barcodes.
- `.yml`: dataset containing the lineage topology.
- `.csv`: dataset containing lineage defining barcodes.
- `<CALLER>/freyja/variants`
- `*.variants.tsv`: Analysis results including identified variants in a gff-like format
- `*.depth.tsv`: Analysis results including the depth of the identified variants
- `<CALLER>/freyja/bootstrap`
- `*lineages.csv`: Analysis results including lineages present and their corresponding abundances with variation identified through bootstrapping
- `*summarized.csv`: Analysis results including lineages present but summarized by constellation and their corresponding abundances with variation identified through bootstrapping

**NB:** The value of `<CALLER>` in the output directory name above is determined by the `--artic_minion_caller` parameter (Default: 'nanopolish').

</details>

[Freyja](https://github.com/andersen-lab/Freyja) is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the [UShER](https://usher-wiki.readthedocs.io/en/latest/#) global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.
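
The constrained de-mixing problem described above can be illustrated with a small, self-contained sketch. This is not Freyja's implementation: the least-squares objective, the projected-gradient solver, the function names (`project_to_simplex`, `demix`) and the toy barcode matrix are all assumptions chosen for illustration, and Freyja's own objective and solver may differ.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:  # true for a prefix of the sorted values; keep the last t
            theta = t
    return [max(x - theta, 0.0) for x in v]


def demix(barcodes, observed, steps=5000, lr=0.05):
    """Estimate lineage abundances by projected gradient descent.

    barcodes[l][m] is 1 if (hypothetical) lineage l carries mutation m, else 0.
    observed[m] is the observed frequency of mutation m in the mixed sample.
    Minimises 0.5 * ||barcodes^T x - observed||^2 subject to x >= 0, sum(x) == 1.
    """
    n_lin, n_mut = len(barcodes), len(observed)
    x = [1.0 / n_lin] * n_lin  # start from a uniform mixture
    for _ in range(steps):
        predicted = [sum(x[l] * barcodes[l][m] for l in range(n_lin)) for m in range(n_mut)]
        residual = [p - o for p, o in zip(predicted, observed)]
        grad = [sum(barcodes[l][m] * residual[m] for m in range(n_mut)) for l in range(n_lin)]
        # Gradient step, then project back onto the unit-sum, non-negative set.
        x = project_to_simplex([xi - lr * g for xi, g in zip(x, grad)])
    return x


# Toy example (made-up barcodes): three lineages, five mutations.
barcodes = [
    [1, 1, 0, 0, 0],  # lineage A
    [0, 0, 1, 1, 0],  # lineage B
    [1, 0, 0, 0, 1],  # lineage C
]
# Mutation frequencies produced by a 70% A / 30% B mixture.
observed = [0.7, 0.7, 0.3, 0.3, 0.0]

abundances = demix(barcodes, observed)
print([round(a, 3) for a in abundances])
```

The projection step is what enforces both constraints at once, so every iterate is a valid set of relative abundances. The fixed step size `lr` is only safe for small toy matrices like this one.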

### Nanopore: ASCIIGenome

<details markdown="1">
@@ -321,6 +346,7 @@ An example MultiQC report generated from a full-sized dataset can be viewed on t
- [mosdepth](#mosdepth) - Whole-genome and amplicon coverage metrics
- [iVar variants](#ivar-variants) _||_ [BCFTools call](#bcftools-call) - Variant calling
- [SnpEff and SnpSift](#snpeff-and-snpsift) - Genetic variant annotation and functional effect prediction
- [Freyja](#freyja) - Relative lineage abundance analysis from mixed SARS-CoV-2 samples (typically wastewater)
- [ASCIIGenome](#asciigenome) - Individual variant screenshots with annotation tracks
- [iVar consensus](#ivar-consensus) _||_ [BCFTools and BEDTools](#bcftools-and-bedtools) - Consensus sequence generation
- [QUAST](#quast) - Consensus assessment report
@@ -593,6 +619,28 @@ iVar outputs a tsv format which is not compatible with downstream analysis such

![MultiQC - SnpEff annotation counts](images/mqc_snpeff_plot.png)

### Freyja

<details markdown="1">
<summary>Output files</summary>

- `variants/freyja/demix`
- `*.tsv`: Analysis results including the lineages present, their corresponding abundances, and summarization by constellation
- `variants/freyja/freyja_db`
- `.json`: dataset containing lineage metadata that correspond to barcodes.
- `.yml`: dataset containing the lineage topology.
- `.csv`: dataset containing lineage defining barcodes.
- `variants/freyja/variants`
- `*.variants.tsv`: Analysis results including identified variants in a gff-like format
- `*.depth.tsv`: Analysis results including the depth of the identified variants
- `variants/freyja/bootstrap`
- `*lineages.csv`: Analysis results including lineages present and their corresponding abundances with variation identified through bootstrapping
- `*summarized.csv`: Analysis results including lineages present but summarized by constellation and their corresponding abundances with variation identified through bootstrapping

</details>

[Freyja](https://github.com/andersen-lab/Freyja) is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the [UShER](https://usher-wiki.readthedocs.io/en/latest/#) global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.
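
The bootstrap outputs listed above report how much the abundance estimates vary under resampling. As a rough illustration of the idea only, here is a generic percentile bootstrap in plain Python. It is not Freyja's own procedure: `bootstrap_interval`, `fraction` and the toy read flags are hypothetical, and treating `repeats` as the counterpart of the pipeline's `freyja_repeats` parameter is an assumption.

```python
import random


def bootstrap_interval(values, statistic, repeats=200, lower=0.025, upper=0.975, seed=0):
    """Percentile bootstrap interval for statistic(values)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = sorted(
        statistic(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(repeats)
    )
    low = estimates[int(lower * (repeats - 1))]
    high = estimates[int(upper * (repeats - 1))]
    return low, high


def fraction(flags):
    """Share of reads that carry the mutation."""
    return sum(flags) / len(flags)


# Toy data (hypothetical): 1 = read carries a lineage-defining mutation, 0 = it does not.
reads = [1] * 70 + [0] * 30
low, high = bootstrap_interval(reads, fraction)
print(f"point estimate {fraction(reads):.2f}, interval [{low:.2f}, {high:.2f}]")
```

More repeats give a smoother interval at the cost of runtime, which is presumably why the test profiles in this commit set `freyja_repeats = 10`.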

### ASCIIGenome

<details markdown="1">
4 changes: 4 additions & 0 deletions docs/usage.md
@@ -416,6 +416,10 @@ If the `--save_reference` parameter is provided then the Nextclade dataset gener

> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must ensure to keep the `work/` directory otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch.
#### Freyja

[Freyja](https://github.com/andersen-lab/Freyja) relies on a dataset of barcodes that use lineage defining mutations (see [UShER](https://usher-wiki.readthedocs.io/en/latest/#)). By default the most recent barcodes will be downloaded and used. However, if analyses need to be compared across multiple datasets, it might be of interest to re-use the same barcodes, or to rerun all Freyja analyses with the most recent dataset. To do this, specify the barcodes and lineages using the `--freyja_barcodes` and `--freyja_lineages` parameters, respectively.

### nf-core/configs

In most cases, you will only need to create a custom config as a one-off but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.
25 changes: 25 additions & 0 deletions modules.json
@@ -120,6 +120,26 @@
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"freyja/boot": {
"branch": "master",
"git_sha": "281c744ed84352c24697f0916c7744853ce83927",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"freyja/demix": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"freyja/update": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"freyja/variants": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["bam_variant_demix_boot_freyja"]
},
"gunzip": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
@@ -279,6 +299,11 @@
"git_sha": "b4b7f89e7fd6d2293f0c176213f710e0bcdaf19e",
"installed_by": ["bam_sort_stats_samtools", "bam_markduplicates_picard"]
},
"bam_variant_demix_boot_freyja": {
"branch": "master",
"git_sha": "4bb5cb441e89811385d18a90809ecf36c9daafd8",
"installed_by": ["subworkflows"]
},
"fastq_align_bowtie2": {
"branch": "master",
"git_sha": "ac75f79157ecc64283a2b3a559f1ba90bc0f2259",
56 changes: 56 additions & 0 deletions modules/nf-core/freyja/boot/main.nf
