-
Notifications
You must be signed in to change notification settings - Fork 142
Ale module #931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Ale module #931
Conversation
Release v5.1.0
Release: v5.2.0 Puce Pangolin
|
@nf-core-bot fix linting |
|
Please let me know what I can improve. Thank you! |
d4straub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ci tests fail with various errors. https://github.com/nf-core/mag/actions/runs/19404507607/job/55538896302?pr=931 fails with
> ERROR ~ Error executing process > 'NFCORE_MAG:MAG:ALE (minigut)'
>
> Caused by:
> Process `NFCORE_MAG:MAG:ALE (minigut)` terminated with an error exit status (134)
>
>
> Command executed:
>
> ALE \
> \
> SPAdesHybrid-minigut-minigut.bam \
> SPAdesHybrid-minigut.scaffolds.fa \
> minigut_ALEoutput.txt
>
> cat <<-END_VERSIONS > versions.yml
> "NFCORE_MAG:MAG:ALE":
> ale: 20180904
> END_VERSIONS
>
> Command exit status:
> 134
>
> Command output:
> BAM file: SPAdesHybrid-minigut-minigut.bam
> Assembly fasta file: SPAdesHybrid-minigut.scaffolds.fa
> ALE Output file: minigut_ALEoutput.txt
> Reading in assembly...
> Reading in the map and computing statistics...
> Insert length and std not given, will be calculated from input map.
> Found FR sample avg insert length to be 383.864169 from 28344 mapped reads
> Found FR sample insert length std to be 69.336488
> Found NOT_PROPER_FR sample avg insert length to be 892.122675 from 66297 mapped reads
> Found NOT_PROPER_FR sample insert length std to be 221.969163
> There were 99620 total reads, 99620 paired (97898 properly mated), 763 proper singles, 959 improper reads (818 chimeric). (83 reads were unmapped)
> Saved library parameters to minigut_ALEoutput.txt.param
> Computing read placements and depths
>
> Command error:
> WARNING: The following read and its mate do not agree on the contigs and/or positions of their mappings:read1: NC_006347.1_4981 81: 0 0 106315 105875 read2: NC_006347.1_4981 161: 0 0 105578 106537 l: 1.000000 li: 1.000000, s1: 106315, s2: 105875, e1: 106441, e2: -1, c1: 0, c2: 0, NC_006347.1_4981, NOT_PROPER_FR, 0, b1: 34e7c540, b2: 0
> ALE: ALElike.c:1892: validateAlignmentMates: Assertion `thisAlignment->start2 == thisReadMate->core.pos' failed.
> BAM file: SPAdesHybrid-minigut-minigut.bam
> Assembly fasta file: SPAdesHybrid-minigut.scaffolds.fa
> ALE Output file: minigut_ALEoutput.txt
> Reading in assembly...
> Reading in the map and computing statistics...
> Insert length and std not given, will be calculated from input map.
> Found FR sample avg insert length to be 383.864169 from 28344 mapped reads
> Found FR sample insert length std to be 69.336488
> Found NOT_PROPER_FR sample avg insert length to be 892.122675 from 66297 mapped reads
> Found NOT_PROPER_FR sample insert length std to be 221.969163
> There were 99620 total reads, 99620 paired (97898 properly mated), 763 proper singles, 959 improper reads (818 chimeric). (83 reads were unmapped)
> Saved library parameters to minigut_ALEoutput.txt.param
> Computing read placements and depths
> .command.sh: line 6: 34 Aborted (core dumped) ALE SPAdesHybrid-minigut-minigut.bam SPAdesHybrid-minigut.scaffolds.fa minigut_ALEoutput.txt
>
> Work dir:
> /home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/work/f4/88d2098735b2dd029b42b1a840ced7
>
> Container:
> quay.io/biocontainers/ale:20180904--py27ha92aebf_0
>
> Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
>
> -- Check '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for details
> ERROR ~ Could not find which method load() to invoke from this list:
> public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.InputStream)
> public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.Reader)
> public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.lang.String)
> public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.File)
> public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.nio.file.Path)
>
> -- Check script '/home/runner/_work/mag/mag/subworkflows/nf-core/utils_nfcore_pipeline/main.nf' at line: 82 or see '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for more details
> ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
>
> -- Check '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for details
> -[nf-core/mag] Pipeline completed with errors-
> WARN: Killing running tasks (1)
FAILED (481.488s)
Additionally, test https://github.com/nf-core/mag/actions/runs/19404507607/job/55538896283?pr=931 indicates that ALE is run but output files are not published to the results folder:
2 { 2 {
3 "ADJUST_MAXBIN2_EXT": { 3 "ADJUST_MAXBIN2_EXT": {
4 "coreutils": 9.5 4 "coreutils": 9.5
+ 5 },
+ 6 "ALE": {
+ 7 "ale": 20180904
5 }, 8 },
6 "BIN_SUMMARY": { 9 "BIN_SUMMARY": {
7 "pandas": "1.4.3", 10 "pandas": "1.4.3",
| 1. Enable binning: --skip_binning false | ||
| 2. Enable ancient DNA: --ancient_dna true | ||
| 3. Disable ALE: --skip_ale true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But --skip_ale true will not run ALE?
Maybe rather
| 3. Disable ALE: --skip_ale true | |
| To avoid that warning disable ALE: --skip_ale |
also using --skip_ale is eqivalent to --skip_ale true.
| if(!params.skip_ale) { | ||
| if ( !params.skip_binning || params.ancient_dna) { | ||
| ch_shortread_assemblies_for_ale = ch_assemblies.filter { meta, assembly -> | ||
| meta.assembler?.toUpperCase() in ['SPADES', 'SPADESHYBRID', 'MEGAHIT'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a particular fan of defining assembler names here, because in case another short read assembler is added in the future, one has to remember to add that to the list here. Probably there is a better solution?
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
prototaxites
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @PetcuBogdan, a few thoughts from me!
|
|
||
| withName: 'NFCORE_MAG:MAG:ALE' { | ||
| publishDir = [ | ||
| path: { "${params.outdir}/Assembly/${meta.assembler?.toUpperCase() ?: 'UNKNOWN'}/QC/${meta.id}/ALE" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| path: { "${params.outdir}/Assembly/${meta.assembler?.toUpperCase() ?: 'UNKNOWN'}/QC/${meta.id}/ALE" }, | |
| path: { "${params.outdir}/Assembly/${meta.assembler}/QC/${meta.id}/ALE" }, |
If there is ever a case where meta.assembler isn't set, that's a bug. And it will be put in a directory called null in that case, so this is unnecessary.
| pattern: "*.{ale,txt,log}", | ||
| saveAs: { filename -> filename.equals('versions.yml') ? null : filename } | ||
| ] | ||
| ext.prefix = { "${meta.id}" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be something like this?
| ext.prefix = { "${meta.id}" } | |
| ext.prefix = { "${meta.id}-${meta.assembler}" } |
|
|
||
| [ALE (Assembly Likelihood Estimator)](https://github.com/sc932/ALE) is a probabilistic framework that evaluates assembly quality by computing the likelihood of the sequencing reads given an assembly. ALE provides per-contig quality scores and identifies potentially problematic regions in assemblies by analyzing read mapping patterns and insert size distributions. It is particularly useful for comparing assemblies and identifying misassemblies or low-confidence regions. | ||
|
|
||
| ALE is run on short-read assemblies (SPAdes, SPAdes hybrid, and MEGAHIT) when binning or ancient DNA analysis is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is just an assembly quality tool, it should run even when binning is off, no?
| log.warn """ | ||
| [nf-core/mag] ALE (Assembly Likelihood Estimator) Warnings | ||
| ALE is enabled (--skip_ale false) but cannot run because: | ||
| - Binning is disabled (--skip_binning true) | ||
| - Ancient DNA mode is not enabled (--ancient_dna false) | ||
| To run ALE, choose one of the following options: | ||
| 1. Enable binning: --skip_binning false | ||
| 2. Enable ancient DNA: --ancient_dna true | ||
| 3. Disable ALE: --skip_ale true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we don't need to have a long explainer about /why/ ALE doesn't run? It should either not run, silently, or print out a warning on a single line.
| */ | ||
|
|
||
| if(!params.skip_ale) { | ||
| if ( !params.skip_binning || params.ancient_dna) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just pass assemblies into this and not worry about whether binning is enabled, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And since this is an assembly QC tool, should this logically be next to the QUAST section in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi thanks for the feedback!
Since ALE requires BAM files for the likelihood calculation, it needs access to the mapped reads produced in BINNING_PREPARATION.
Should I move BINNING_PREPARATION earlier in the workflow, or do you have another suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, of course 🤦♂️ In that case, ignore this comment entirely!
PR checklist
nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).Description
This PR adds ALE (Assembly Likelihood Estimator) for assembly quality control in nf-core/mag.
ALE is a probabilistic framework that evaluates assembly quality by computing the likelihood of sequencing reads given an assembly. It provides per-contig quality scores useful for identifying misassemblies, comparing assemblies, and validating quality before binning.
Changes made
Workflow:
References