@PetcuBogdan PetcuBogdan commented Nov 16, 2025

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Description

This PR adds ALE (Assembly Likelihood Estimator) for assembly quality control in nf-core/mag.
ALE is a probabilistic framework that evaluates assembly quality by computing the likelihood of sequencing reads given an assembly. It provides per-contig quality scores useful for identifying misassemblies, comparing assemblies, and validating quality before binning.

Changes made

Workflow:

  • Added ALE analysis for short-read assemblies (SPAdes, MEGAHIT)
  • Runs when binning or ancient DNA analysis is enabled
  • Reuses existing BAM files from binning preparation

References

@jfy133
Member

jfy133 commented Nov 16, 2025

@nf-core-bot fix linting

@PetcuBogdan
Author

Please let me know what I can improve. Thank you!

Collaborator

@d4straub d4straub left a comment

CI tests fail with various errors. https://github.com/nf-core/mag/actions/runs/19404507607/job/55538896302?pr=931 fails with:

    > ERROR ~ Error executing process > 'NFCORE_MAG:MAG:ALE (minigut)'
    > 
    > Caused by:
    >   Process `NFCORE_MAG:MAG:ALE (minigut)` terminated with an error exit status (134)
    > 
    > 
    > Command executed:
    > 
    >   ALE \
    >        \
    >       SPAdesHybrid-minigut-minigut.bam \
    >       SPAdesHybrid-minigut.scaffolds.fa \
    >       minigut_ALEoutput.txt
    >   
    >   cat <<-END_VERSIONS > versions.yml
    >   "NFCORE_MAG:MAG:ALE":
    >       ale: 20180904
    >   END_VERSIONS
    > 
    > Command exit status:
    >   134
    > 
    > Command output:
    >   BAM file: SPAdesHybrid-minigut-minigut.bam
    >   Assembly fasta file: SPAdesHybrid-minigut.scaffolds.fa
    >   ALE Output file: minigut_ALEoutput.txt
    >   Reading in assembly...
    >   Reading in the map and computing statistics...
    >   Insert length and std not given, will be calculated from input map.
    >   Found FR sample avg insert length to be 383.864169 from 28344 mapped reads
    >   Found FR sample insert length std to be 69.336488
    >   Found NOT_PROPER_FR sample avg insert length to be 892.122675 from 66297 mapped reads
    >   Found NOT_PROPER_FR sample insert length std to be 221.969163
    >   There were 99620 total reads, 99620 paired (97898 properly mated), 763 proper singles, 959 improper reads (818 chimeric). (83 reads were unmapped)
    >   Saved library parameters to minigut_ALEoutput.txt.param
    >   Computing read placements and depths
    > 
    > Command error:
    >   WARNING: The following read and its mate do not agree on the contigs and/or positions of their mappings:read1: NC_006347.1_4981 81: 0 0 106315 105875	read2: NC_006347.1_4981 161: 0 0 105578 106537	l: 1.000000 li: 1.000000, s1: 106315, s2: 105875, e1: 106441, e2: -1, c1: 0, c2: 0, NC_006347.1_4981, NOT_PROPER_FR, 0, b1: 34e7c540, b2: 0
    >   ALE: ALElike.c:1892: validateAlignmentMates: Assertion `thisAlignment->start2 == thisReadMate->core.pos' failed.
    >   BAM file: SPAdesHybrid-minigut-minigut.bam
    >   Assembly fasta file: SPAdesHybrid-minigut.scaffolds.fa
    >   ALE Output file: minigut_ALEoutput.txt
    >   Reading in assembly...
    >   Reading in the map and computing statistics...
    >   Insert length and std not given, will be calculated from input map.
    >   Found FR sample avg insert length to be 383.864169 from 28344 mapped reads
    >   Found FR sample insert length std to be 69.336488
    >   Found NOT_PROPER_FR sample avg insert length to be 892.122675 from 66297 mapped reads
    >   Found NOT_PROPER_FR sample insert length std to be 221.969163
    >   There were 99620 total reads, 99620 paired (97898 properly mated), 763 proper singles, 959 improper reads (818 chimeric). (83 reads were unmapped)
    >   Saved library parameters to minigut_ALEoutput.txt.param
    >   Computing read placements and depths
    >   .command.sh: line 6:    34 Aborted                 (core dumped) ALE SPAdesHybrid-minigut-minigut.bam SPAdesHybrid-minigut.scaffolds.fa minigut_ALEoutput.txt
    > 
    > Work dir:
    >   /home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/work/f4/88d2098735b2dd029b42b1a840ced7
    > 
    > Container:
    >   quay.io/biocontainers/ale:20180904--py27ha92aebf_0
    > 
    > Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
    > 
    >  -- Check '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for details
    > ERROR ~ Could not find which method load() to invoke from this list:
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.InputStream)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.Reader)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.lang.String)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.File)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.nio.file.Path)
    > 
    >  -- Check script '/home/runner/_work/mag/mag/subworkflows/nf-core/utils_nfcore_pipeline/main.nf' at line: 82 or see '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for more details
    > ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
    > 
    >  -- Check '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for details
    > -[nf-core/mag] Pipeline completed with errors-
    > WARN: Killing running tasks (1)
    FAILED (481.488s)

Additionally, test https://github.com/nf-core/mag/actions/runs/19404507607/job/55538896283?pr=931 indicates that ALE is run but output files are not published to the results folder:

    2     {                                 2     {                            
    3         "ADJUST_MAXBIN2_EXT": {       3         "ADJUST_MAXBIN2_EXT": {  
    4             "coreutils": 9.5          4             "coreutils": 9.5     
                                        +   5         },                       
                                        +   6         "ALE": {                 
                                        +   7             "ale": 20180904      
    5         },                            8         },                       
    6         "BIN_SUMMARY": {              9         "BIN_SUMMARY": {         
    7             "pandas": "1.4.3",       10             "pandas": "1.4.3",  

1. Enable binning: --skip_binning false
2. Enable ancient DNA: --ancient_dna true
3. Disable ALE: --skip_ale true
Collaborator

But --skip_ale true will not run ALE?
Maybe rather

Suggested change
3. Disable ALE: --skip_ale true
To avoid that warning disable ALE: --skip_ale

Also, using --skip_ale is equivalent to --skip_ale true.

if(!params.skip_ale) {
if ( !params.skip_binning || params.ancient_dna) {
ch_shortread_assemblies_for_ale = ch_assemblies.filter { meta, assembly ->
meta.assembler?.toUpperCase() in ['SPADES', 'SPADESHYBRID', 'MEGAHIT']
Collaborator

I am not a particular fan of defining assembler names here, because in case another short read assembler is added in the future, one has to remember to add that to the list here. Probably there is a better solution?
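
One possible direction, sketched in plain Groovy under loud assumptions: each assembly is tagged once, where it is produced, with a hypothetical boolean `meta.short_read`, and the ALE branch filters on that flag rather than on assembler names. The flag name and the `LongReadAssembler` entry are made up for illustration. In the pipeline the list would be a channel and `findAll` would be the `filter` operator.

```groovy
// Hypothetical stand-in for ch_assemblies: a list of [meta, assembly] pairs.
// `short_read` is an assumed meta key, not one known to exist in nf-core/mag.
def assemblies = [
    [[id: 'minigut', assembler: 'SPAdes',            short_read: true ], 'spades.fa'],
    [[id: 'minigut', assembler: 'MEGAHIT',           short_read: true ], 'megahit.fa'],
    [[id: 'minigut', assembler: 'LongReadAssembler', short_read: false], 'long.fa'],
]

// Filter on the flag; no assembler names are hard-coded at this point.
def shortReadAssembliesForAle = assemblies.findAll { entry ->
    def meta = entry[0]
    meta.short_read
}

assert shortReadAssembliesForAle.collect { it[0].assembler } == ['SPAdes', 'MEGAHIT']
```

With this shape, a newly added short-read assembler would only need to set the flag where its output is created, and the ALE filter would stay untouched.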

PetcuBogdan and others added 2 commits November 18, 2025 20:39
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
Contributor

@prototaxites prototaxites left a comment

Hi @PetcuBogdan, a few thoughts from me!


withName: 'NFCORE_MAG:MAG:ALE' {
publishDir = [
path: { "${params.outdir}/Assembly/${meta.assembler?.toUpperCase() ?: 'UNKNOWN'}/QC/${meta.id}/ALE" },
Contributor

Suggested change
path: { "${params.outdir}/Assembly/${meta.assembler?.toUpperCase() ?: 'UNKNOWN'}/QC/${meta.id}/ALE" },
path: { "${params.outdir}/Assembly/${meta.assembler}/QC/${meta.id}/ALE" },

If there is ever a case where meta.assembler isn't set, that's a bug. And it will be put in a directory called null in that case, so this is unnecessary.
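
As a quick illustration of the point about the `null` directory, here is a minimal plain-Groovy sketch. The `meta` map and `outdir` value are hypothetical stand-ins for the pipeline's `meta` and `params.outdir`. It only shows how Groovy string interpolation renders a missing map key.

```groovy
// Hypothetical meta map with no `assembler` key set.
def meta = [id: 'minigut']
// Stand-in for params.outdir.
def outdir = 'results'

// A missing key evaluates to null, which interpolates as the text "null".
def path = "${outdir}/Assembly/${meta.assembler}/QC/${meta.id}/ALE".toString()

assert path == 'results/Assembly/null/QC/minigut/ALE'
```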

pattern: "*.{ale,txt,log}",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
ext.prefix = { "${meta.id}" }
Contributor

This should be something like this?

Suggested change
ext.prefix = { "${meta.id}" }
ext.prefix = { "${meta.id}-${meta.assembler}" }


[ALE (Assembly Likelihood Estimator)](https://github.com/sc932/ALE) is a probabilistic framework that evaluates assembly quality by computing the likelihood of the sequencing reads given an assembly. ALE provides per-contig quality scores and identifies potentially problematic regions in assemblies by analyzing read mapping patterns and insert size distributions. It is particularly useful for comparing assemblies and identifying misassemblies or low-confidence regions.

ALE is run on short-read assemblies (SPAdes, SPAdes hybrid, and MEGAHIT) when binning or ancient DNA analysis is enabled.
Contributor

If this is just an assembly quality tool, it should run even when binning is off, no?

Comment on lines +529 to +540
log.warn """
[nf-core/mag] ALE (Assembly Likelihood Estimator) Warnings
ALE is enabled (--skip_ale false) but cannot run because:
- Binning is disabled (--skip_binning true)
- Ancient DNA mode is not enabled (--ancient_dna false)
To run ALE, choose one of the following options:
1. Enable binning: --skip_binning false
2. Enable ancient DNA: --ancient_dna true
3. Disable ALE: --skip_ale true
Contributor

IMO we don't need to have a long explainer about /why/ ALE doesn't run? It should either not run, silently, or print out a warning on a single line.
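
A single-line variant might look roughly like the sketch below. It is plain Groovy so that it runs on its own. The helper name `aleSkipWarning` and the message wording are hypothetical, and in the pipeline the returned text would presumably be passed to `log.warn` instead of `println`.

```groovy
// Hypothetical helper: returns a one-line warning, or null when nothing needs saying.
def aleSkipWarning(Map params) {
    // ALE was requested, but neither binning nor ancient DNA mode provides BAM files.
    if (!params.skip_ale && params.skip_binning && !params.ancient_dna) {
        return '[nf-core/mag] ALE skipped: enable binning or --ancient_dna, or pass --skip_ale to silence this warning.'
    }
    return null
}

def msg = aleSkipWarning([skip_ale: false, skip_binning: true, ancient_dna: false])
if (msg) {
    println "WARN: ${msg}"   // in the pipeline: log.warn(msg)
}
```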

*/

if(!params.skip_ale) {
if ( !params.skip_binning || params.ancient_dna) {
Contributor

We should just pass assemblies into this and not worry about whether binning is enabled, no?

Contributor

And since this is an assembly QC tool, should this logically be next to the QUAST section in this file?

Author

Hi thanks for the feedback!

Since ALE requires BAM files for the likelihood calculation, it needs access to the mapped reads produced in BINNING_PREPARATION.
Should I move BINNING_PREPARATION earlier in the workflow, or do you have another suggestion?

Contributor

Oh, of course 🤦‍♂️ In that case, ignore this comment entirely!
