6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -41,6 +41,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#873](https://github.com/nf-core/mag/pull/873) - Document usage of `longread_percentidentity` and `shortread_percentidentity` and set the value of `longread_percentidentity` in the `test_full` profile to 85 (by @prototaxites)
- [#875](https://github.com/nf-core/mag/pull/875) - Add binner COMEBin (by @d4straub)

- [#931](https://github.com/nf-core/mag/pull/931) - Added ALE (Assembly Likelihood Estimator) for probabilistic assembly quality control (by @PetcuBogdan)
  - ALE provides per-contig quality scores for short-read assemblies (SPAdes, MEGAHIT)
  - Runs automatically when binning is enabled (default behavior)
  - Output: `Assembly/[assembler]/QC/[sample]/ALE/`
  - Can be disabled with `--skip_ale` parameter

### `Changed`

- [#878](https://github.com/nf-core/mag/pull/878) - Refine test_full config with optimised resource usage for AWS release megatests (by @jfy133)
8 changes: 8 additions & 0 deletions assets/multiqc_config.yml
@@ -21,6 +21,7 @@ run_modules:
  - bowtie2
  - busco
  - quast
  - ale
  - prokka
  - porechop
  - filtlong
@@ -55,6 +56,13 @@ top_modules:
      info: "Mapping statistics of reads mapped against host genome and subsequently removed."
      path_filters:
        - "*_host_removed.bowtie2.log"
  - "ale":
      name: "ALE: Assembly Likelihood Evaluation"
      info: "Log-likelihood evaluation of assemblies using mapped reads (ALE module)."
      path_filters:
        - "*_ALE/*.ale"
        - "*_ALE/*.txt"
        - "*_ALE/*.log"
  - "quast":
      name: "QUAST: assembly"
      info: "Assembly statistics of raw assemblies."
10 changes: 10 additions & 0 deletions conf/modules.config
@@ -407,6 +407,16 @@ process {
        publishDir = [path: { "${params.outdir}/Assembly/${meta.assembler}/QC/${meta.id}" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }]
    }

    withName: 'NFCORE_MAG:MAG:ALE' {
        publishDir = [
            path: { "${params.outdir}/Assembly/${meta.assembler?.toUpperCase() ?: 'UNKNOWN'}/QC/${meta.id}/ALE" },

Review comment (Contributor):

Suggested change
- path: { "${params.outdir}/Assembly/${meta.assembler?.toUpperCase() ?: 'UNKNOWN'}/QC/${meta.id}/ALE" },
+ path: { "${params.outdir}/Assembly/${meta.assembler}/QC/${meta.id}/ALE" },

If there is ever a case where `meta.assembler` isn't set, that's a bug. And it will be put in a directory called `null` in that case, so this is unnecessary.

            mode: params.publish_dir_mode,
            pattern: "*.{ale,txt,log}",
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
        ext.prefix = { "${meta.id}" }

Review comment (Contributor):

This should be something like this?

Suggested change
- ext.prefix = { "${meta.id}" }
+ ext.prefix = { "${meta.id}-${meta.assembler}" }

    }

    withName: 'QUAST_BINS|QUAST_BINS_SUMMARY' {
        publishDir = [
            path: { "${params.outdir}/GenomeBinning/QC" },
15 changes: 15 additions & 0 deletions docs/output.md
@@ -300,6 +300,21 @@ SPAdesHybrid is a part of the [SPAdes](http://cab.spbu.ru/software/spades/) soft

</details>

### Assembly Quality Control with ALE

[ALE (Assembly Likelihood Estimator)](https://github.com/sc932/ALE) is a probabilistic framework that evaluates assembly quality by computing the likelihood of the sequencing reads given an assembly. ALE provides per-contig quality scores and identifies potentially problematic regions in assemblies by analyzing read mapping patterns and insert size distributions. It is particularly useful for comparing assemblies and identifying misassemblies or low-confidence regions.

ALE is run on short-read assemblies (SPAdes, SPAdes hybrid, and MEGAHIT) when binning or ancient DNA analysis is enabled.

Review comment (Contributor):

If this is just an assembly quality tool, it should run even when binning is off, no?

<details markdown="1">
<summary>Output files</summary>

- `Assembly/[assembler]/QC/[sample/group]/ALE/`
  - `[sample]_ALEoutput.txt`: Per-contig ALE scores and quality metrics, including likelihood estimates for each contig
  - `[sample].log`: ALE processing log file containing diagnostic information and runtime details

</details>

## Gene prediction

Protein-coding genes are predicted for each assembly.
14 changes: 14 additions & 0 deletions docs/usage.md
@@ -454,6 +454,20 @@ This can also remove 'nonsense' bins of e.g. a single or a collection of very sh

Note that in this context, it is recommended to also set `--min_length_unbinned_contigs` to a suitably high value that corresponds to a reasonable bin size if the `-bin_*_length` parameters are used, so you have useful 'singular' contigs in the unbinned output.

## A note on assembly quality control with ALE

The pipeline uses [ALE (Assembly Likelihood Estimator)](https://github.com/sc932/ALE) to perform probabilistic quality assessment of short-read assemblies generated by MEGAHIT and SPAdes.

ALE evaluates assembly quality by computing the likelihood that the assembly could have generated the observed sequencing reads. Unlike traditional assembly QC tools that rely on reference genomes or marker genes, ALE provides a reference-free quality assessment that is particularly useful for novel organisms or complex metagenomes where references may not be available.

ALE runs automatically when binning is enabled (default behavior), short reads are provided, and assemblies are generated with MEGAHIT or SPAdes. The tool generates quality assessment files in `Assembly/[assembler]/QC/[sample]/ALE/` containing per-assembly likelihood metrics (`[sample]_ALEoutput.txt`).

ALE scores are log-likelihoods where higher (less negative) values indicate better assembly quality. These scores reflect how well the assembly explains the observed sequencing reads and can help identify assemblies that may have structural issues or errors that could affect downstream binning and annotation.
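
As a rough illustration of how such scores might be compared, the sketch below picks the assembly with the highest (least negative) log-likelihood. The assembly labels and score values are invented for illustration and are not real ALE output, and the helper name `best_assembly` is hypothetical. It also assumes both assemblies were scored against the same read set, since likelihoods computed from different reads are not directly comparable.

```python
def best_assembly(scores):
    """Return the name of the assembly with the highest (least negative) score."""
    return max(scores, key=scores.get)


# Hypothetical total log-likelihood scores; these numbers are invented for
# illustration and do not come from a real ALE run.
scores = {"MEGAHIT": -1250.0, "SPAdes": -1100.0}

best = best_assembly(scores)

# The difference of two log-likelihoods is the log of their likelihood ratio,
# so a positive value here favours the first assembly in the subtraction.
log_ratio = scores["SPAdes"] - scores["MEGAHIT"]

print(f"best: {best}, log-likelihood difference: {log_ratio}")
```

Only the relative ordering and the differences between scores carry meaning here; the absolute magnitude of a log-likelihood depends on how many reads were scored.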

If you wish to skip the ALE quality assessment step (for example, to speed up the pipeline when working with well-characterized samples), you can disable it with `--skip_ale`.

Note that ALE only works with short-read assemblies (MEGAHIT, SPAdes). Long-read assemblies (Flye, MetaMDBG) are not supported by ALE, and hybrid assemblies (SPAdesHybrid) use only the short-read component for ALE scoring. For more information about ALE and how to interpret the results, see the [ALE GitHub repository](https://github.com/sc932/ALE) and the [publication](https://doi.org/10.1093/bioinformatics/bts723).

## A note on GTDB having too many files or using too many inodes

The GTDB is very large both in size and by the number of files it contains.
5 changes: 5 additions & 0 deletions modules.json
@@ -10,6 +10,11 @@
        "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
        "installed_by": ["modules"]
    },
    "ale": {
        "branch": "master",
        "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
        "installed_by": ["modules"]
    },
    "bbmap/bbnorm": {
        "branch": "master",
        "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
9 changes: 9 additions & 0 deletions modules/nf-core/ale/environment.yml
49 changes: 49 additions & 0 deletions modules/nf-core/ale/main.nf
58 changes: 58 additions & 0 deletions modules/nf-core/ale/meta.yml
108 changes: 108 additions & 0 deletions modules/nf-core/ale/tests/main.nf.test

Some generated files are not rendered by default.