The format is based on Keep a Changelog
and this project adheres to Semantic Versioning.
- #707 - Fix missing space resulting in malformed args for MEGAHIT (reported by @d4straub, fix by @jfy133)
- #674 - Added `--longread_adaptertrimming_tool`, where the user can choose between porechop_abi (default) and porechop (added by @muabnezor)
- #674 - Changed to porechop-abi as default adapter trimming tool for long reads. User can still use porechop if preferred (added by @muabnezor)
- #666 - Update SPAdes to version 4.0.0, replace both METASPADES and MEGAHIT with official nf-core modules (requested by @elsherbini, fix by @jfy133)
- #666 - Update URLs to GTDB database downloads due to server move (reported by @Jokendo-collab, fix by @jfy133)
- #695 - Updated to nf-core 3.0.2 `TEMPLATE` (by @jfy133)
- #695 - Switch to more stable Zenodo link for CheckM data (by @jfy133)
- #674 - Make longread preprocessing a subworkflow (added by @muabnezor)
- #674 - Add porechop and filtlong logs to multiqc (added by @muabnezor)
- #674 - Change local filtlong module to the official nf-core/filtlong module (added by @muabnezor)
- #690 - MaxBin2 now using the abundance information from different samples rather than an average (reported by @uel3 and fixed by @d4straub)
- #698 - Updated prodigal module to not pick up input symlinks for compression causing pigz errors (reported by @zackhenny, fix by @jfy133)

| Tool         | Previous version | New version |
| ------------ | ---------------- | ----------- |
| Porechop_ABI |                  | 0.5.0       |
| Filtlong     | 0.2.0            | 0.2.1       |
| SPAdes       | 3.15.3           | 4.0.0       |

- #665 - Add support for supplying pre-made bowtie host reference index (requested by @simone-pignotti, added by @jfy133)
- #670 - Added `--gtdbtk_pplacer_useram` to run GTDBTk in memory mode rather than write to disk (requested by @harper357, fixed by @jfy133)
- #664 - Update GTDBTk to latest version, with updated column names, update GTDB to release 220 (by @dialvarezs)
- #676 - Added exit code 12 to valid SPAdes retry codes, due to OOM errors from spades-hammer (reported by @bawee, fix by @jfy133)
- #667 - Fix pipeline crashing if only CONCOCT selected during binning (reported and fixed by @jfy133)
- #670 - Re-add missing GTDBTk parameters into GTDBTk module (reported by @harper357, fixed by @jfy133)
- #672 - Fix GTDB-Tk per-sample TSV files not being published in output directory (reported by @jhayer, fix by @jfy133)

| Tool   | Previous version | New version |
| ------ | ---------------- | ----------- |
| GTDBTk | 2.3.2            | 2.4.0       |

- #670 - Deprecated `--gtdbtk_pplacer_scratch` due to unintuitive usage (reported by @harper357, fixed by @jfy133)
- #648 - Fix sample ID/assembly ID check failure when no IDs match (reported by @zackhenny, fix by @prototaxites)
- #646 - GTDB-Tk directory input now creates a value channel so it runs for all entries to the process and not just the first (reported by @amizeranschi, fix by @prototaxites).
- #639 - Fix pipeline failure when a sample produces only a single bin (fix by @d-callan)
- #651 - Replace base container for bash only modules to reduce number of containers in pipeline (reported and fixed by @harper357)
- #652 - Fix documentation typo in using user-defined assembly parameters (reported and fixed by @amizeranschi)
- #653 - Fix overwriting of per-bin 'raw' GUNC RUN output files (multi-bin summary tables not affected) (reported by @zackhenny and fixed by @jfy133)
- #633 - Changed BUSCO to use offline mode when the database is specified by the user (reported by @ChristophKnapp and many others, fix by @jfy133)
- #632 - Use default NanoLyse log of just removed reads rather than custom (by @jfy133)
- #630 - Fix CONCOCT empty bins killing the pipeline, and allow for true multithreading again (removing OPENBLAS loop) (reported by @maxibor, fix by @maxibor and @jfy133)

| Tool     | Previous version | New version |
| -------- | ---------------- | ----------- |
| Porechop | 0.2.3_seqan2.1.1 | 0.2.4       |
| NanoPlot | 1.26.3           | 1.41.6      |
| NanoLyse | 1.1.0            | 1.2.0       |

- #625 - Updated link to geNomad database for downloading (reported by @amizeranschi, fix by @jfy133)
- #618 - Fix CENTRIFUGE mkfifo failures by using work directory /tmp (reported by @skrakau, fix by @jfy133)

| Tool       | Previous version | New version |
| ---------- | ---------------- | ----------- |
| Centrifuge | 1.0.4_beta       | 1.0.4.1     |

- #615 - Add new logo (by @jfy133)
- #599 - Update to nf-core v2.13.1 `TEMPLATE` (by @jfy133)
- #614 - Update to nf-core v2.14.1 `TEMPLATE` (by @jfy133)
- #606 - Prevent pipeline crash when a premade mashdb is given to, or no alignments are found with, GTDB-TK_CLASSIFYWF (reported by @cedwardson4, fix by @jfy133)
- #599 - Direct reads input (`--input 'sample_{R1,R2}.fastq.gz'`) is no longer supported; all input must come via samplesheets (by @jfy133)
- #581 - Added explicit licence text to headers of all custom scripts (reported by @FriederikeHanssen and @maxibor, fix by @jfy133)
- #602 - Co-binning when using aDNA mode now enabled (added by @maxibor)
- #583 - Fix GTDB database input when directory supplied (fix by @jfy133)
- #575 - Deactivated MetaSPAdes, Centrifuge, and GTDB in test_full profile due to some container incompatibilities in nf-core megatest AWS configurations (by @jfy133)
- #574 - Fix wrong channel going to BIN_SUMMARY (fix by @maxibor)
- #562 - Add CAT summary into the global bin_summary (by @maxibor)
- #565 - Add warning of empty GTDB-TK results if no contigs pass completeness filter (by @jfy133 and @maxibor)
- #563 - Update to nf-core v2.12 `TEMPLATE` (by @CarsonJM)
- #566 - More logical ordering of MultiQC sections (assembly and bin sections go together respectively) (fix by @jfy133)
- #548 - Fixes to the following (reported by @maxibor, @PPpissar, @muniheart, @llborcard, fix by @maxibor):
  - GTDB-TK execution
  - CAT/QUAST/DEPTH bin summary file name collisions
  - BUSCO database parsing
  - Correct CAT name files
- #558 - Fix bug in run merging when dealing with single end data (reported by @roberta-davidson, fix by @jfy133)
- #489 - Fix file name collision clashes for CHECKM, CAT, GTDBTK, and QUAST (reported by @tillenglert and @maxibor, fix by @maxibor)
- #533 - Fix glob pattern for publishing MetaBAT2 bins in results (reported by @patriciatran, fix by @jfy133)
- #535 - Fix input validation pattern to again allow direct FASTQ input (reported by @lennijusten, @emnilsson, fix by @jfy133, @d4straub, @mahesh-panchal, @nvnieuwk)

| Tool | Previous version | New version |
| ---- | ---------------- | ----------- |
| CAT  | 4.6              | 5.2.3       |

- #536 - Replace custom function with native Nextflow for checking file extension (reported by @d4straub, fix by @jfy133)
- #504 - New parameters `--busco_db`, `--kraken2_db`, and `--centrifuge_db` now support directory input of a pre-uncompressed database archive directory (by @gregorysprenger).
- #511 - Update to nf-core 2.10 `TEMPLATE` (by @jfy133)
- #504 - `--save_busco_reference` is now replaced by `--save_busco_db` (by @gregorysprenger).
- #514 - Fix missing CONCOCT files in downstream output (reported by @maxibor, fix by @jfy133)
- #515 - Fix overwriting of GUNC output directories when running with domain classification (reported by @maxibor, fix by @jfy133)
- #516 - Fix edge-case bug where MEGAHIT re-uses previous work directory on resume and fails (reported by @husensofteng, fix by @prototaxites)
- #520 - Fix missing Tiara output files (fix by @jfy133)
- #522 - Fix 'nulls' in depth plot PNG files (fix by @jfy133)
- #504 - `--busco_reference`, `--busco_download_path`, `--save_busco_reference` parameters have been deprecated and replaced with new parameters (by @gregorysprenger).
- #497 - Adds support for pointing at a local db for krona, using the parameter `--krona_db` (by @willros).
- #395 - Adds support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
- #422 - Adds support for normalization of read depth with BBNorm (added by @erikrikarddaniel and @fabianegli)
- #439 - Adds ability to enter the pipeline at the binning stage by providing a CSV of pre-computed assemblies (by @prototaxites)
- #459 - Adds ability to skip damage correction step in the ancient DNA workflow and just run pyDamage (by @jfy133)
- #364 - Adds geNomad nf-core modules for identifying viruses in assemblies (by @PhilPalmer and @CarsonJM)
- #481 - Adds MetaEuk for annotation of eukaryotic MAGs, and MMSeqs2 to enable downloading databases for MetaEuk (by @prototaxites)
- #437 - `--gtdb_db` also now supports directory input of a pre-uncompressed GTDB archive directory (reported by @alneberg, fix by @jfy133)
- #494 - Adds support for saving the BAM files from Bowtie2 mapping of input reads back to assembly (fix by @jfy133)
- #428 #467 - Update to nf-core 2.8, 2.9 `TEMPLATE` (by @jfy133)
- #429 - Replaced hardcoded CheckM database auto-download URL to a parameter (reported by @erikrikarddaniel, fix by @jfy133)
- #441 - Deactivated CONCOCT in AWS 'full test' due to very long runtime (fix by @jfy133).
- #442 - Remove warning when BUSCO finds no genes in bins, as this can be expected in some datasets (reported by @Lumimar, fix by @jfy133).
- #444 - Moved BUSCO bash code to script (by @jfy133)
- #477 - `--gtdb` parameter is split into `--skip_gtdbtk` and `--gtdb_db` to allow finer control over GTDB database retrieval (fix by @jfy133)
- #500 - Temporarily disabled downstream processing of both refined and raw bins due to bug (by @jfy133)
- #496 - Fix help text for parameters `--bowtie2_mode`, `spades_options` and `megahit_options` (by @willros)
- #400 - Fix duplicated Zenodo badge in README (by @jfy133)
- #406 - Fix CheckM database always downloading, regardless if CheckM is selected (by @jfy133)
- #419 - Fix bug with busco_clean parameter, where it is always activated (by @prototaxites)
- #426 - Fixed typo in help text for parameters `--host_genome` and `--host_fasta` (by @tillenglert)
- #434 - Fix location of samplesheet for AWS full tests (reported by @Lfulcrum, fix by @jfy133)
- #438 - Fixed version inconsistency between conda and containers for GTDBTK_CLASSIFYWF (by @jfy133)
- #439 - Fix bug in assembly input (by @prototaxites)
- #447 - Remove `default: None` from parameter schema (by @drpatelh)
- #449 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133)
- #470 - Fix binning preparation from running even when binning was requested to be skipped (reported by @prototaxites, fix by @jfy133)
- #480 - Improved `-resume` reliability through better meta map preservation (reported by @prototaxites, fix by @jfy133)
- #493 - Update `METABAT2` nf-core module so that it reduces the number of unnecessary file moves, enabling virtual filesystems (fix by @adamrtalbot)
- #500 - Fix MaxBin2 bins not being saved in results directory properly (reported by @Perugolate, fix by @jfy133)

| Tool     | Previous version | New version |
| -------- | ---------------- | ----------- |
| BCFtools | 1.16             | 1.17        |
| SAMtools | 1.16.1           | 1.17        |
| fastp    | 0.23.2           | 0.23.4      |
| MultiQC  | 1.14             | 1.15        |

- #461 - Fix full-size AWS test profile paths (by @jfy133)
- #461 - Fix pyDamage results being overwritten (reported by @alexhbnr, fix by @jfy133)
- #458 - Correct the major issue in ancient DNA workflow of binning refinement being performed on uncorrected contigs instead of aDNA consensus recalled contigs (issue #449)
- #451 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133, and integrated by @maxibor in #458)
- #350 - Adds support for CheckM as alternative bin completeness and QC tool (added by @jfy133 and @skrakau)
- #353 - Added the busco_clean parameter to optionally clean each BUSCO directory after a successful run (by @prototaxites)
- #361 - Added the skip_clipping parameter to skip read preprocessing with fastp or adapterremoval. Running the pipeline with skip_clipping, keep_phix and without specifying a host genome or fasta file skips the FASTQC_TRIMMED process (by @prototaxites)
- #365 - Added CONCOCT as an additional (optional) binning tool (by @jfy133)
- #366 - Added CAT_SUMMARISE process and cat_official_taxonomy parameter (by @prototaxites)
- #372 - Allow CAT_DB to take an extracted database as well as a tar.gz file (by @prototaxites).
- #380 - Added support for saving processed reads (clipped, host removed etc.) to results directory (by @jfy133)
- #394 - Added GUNC for additional chimeric bin/contamination QC (added by @jfy133)
- #340,#368,#373 - Update to nf-core 2.7.2 `TEMPLATE` (by @jfy133, @d4straub, @skrakau)
- #373 - Removed parameter `--enable_conda`. Updated local modules to new conda syntax and updated nf-core modules (by @skrakau)
- #385 - CAT also now runs on unbinned contigs as well as binned contigs (added by @jfy133)
- #399 - Removed undocumented BUSCO_PLOT process (previously generated `*.busco_figure.png` plots unsuitable for metagenomics) (by @skrakau).
- #416 - Use GTDBTK_CLASSIFYWF nf-core module instead of local module (added by @alxndrdiaz)
- #345 - Bowtie2 mode changed to global alignment for ancient DNA mode (`--very-sensitive` mode) to prevent soft clipping at the end of reads when running in local mode. (by @maxibor)
- #349 - Add a warning that the pipeline will reset the minimum contig size to 1500 specifically for the MetaBAT2 process, if a user supplies a value below this threshold. (by @jfy133)
- #352 - Escape the case in the BUSCO module that BUSCO can just detect a root lineage but is not able to find any marker genes (by @alexhbnr)
- #355 - Include error code 21 for retrying with higher memory for SPAdes and hybridSPAdes (by @mglubber)

| Tool      | Previous version | New version |
| --------- | ---------------- | ----------- |
| BUSCO     | 5.1.0            | 5.4.3       |
| BCFtools  | 1.14             | 1.16        |
| Freebayes | 1.3.5            | 1.3.6       |
| SAMtools  | 1.15             | 1.16.1      |

- #328 - Fix too many symbolic links issue in local convert_depths module (reported by @ChristophKnapp and fixed by @apeltzer, @jfy133)
- #329 - Each sample now gets its own result directory for PyDamage analysis and filter (reported and fixed by @maxibor)
- #263 - Restructure binning subworkflow in preparation for aDNA workflow and extended binning
- #247 - Add ancient DNA subworkflow
- #263 - Add MaxBin2 as second contig binning tool
- #285 - Add AdapterRemoval2 as an alternative read trimmer
- #291 - Add DAS Tool for bin refinement
- #319 - Activate pipeline-specific institutional nf-core/configs
- #269,#283,#289,#302 - Update to nf-core 2.4 `TEMPLATE`
- #286 - Cite our publication instead of the preprint
- #291, #299 - Add extra results folder `GenomeBinning/depths/contigs` for `[assembler]-[sample/group]-depth.txt.gz`, and `GenomeBinning/depths/bins` for `bin_depths_summary.tsv` and `[assembler]-[binner]-[sample/group]-binDepths.heatmap.png`
- #315 - Replace base container for standard shell tools to fix problems with running on Google Cloud
- #290 - Fix caching of binning input
- #305 - Add missing Bowtie2 version for process `BOWTIE2_PHIX_REMOVAL_ALIGN` to `software_versions.yml`
- #307 - Fix retrieval of GTDB-Tk version (note about newer version caused error in `CUSTOM_DUMPSOFTWAREVERSIONS`)
- #309 - Fix publishing of BUSCO `busco_downloads/` folder, i.e. publish only when `--save_busco_reference` is specified
- #321 - Fix parameter processing in `BOWTIE2_REMOVAL_ALIGN` (which was erroneously for `BOWTIE2_PHIX_REMOVAL_ALIGN`)

| Tool    | Previous version | New version |
| ------- | ---------------- | ----------- |
| fastp   | 0.20.1           | 0.23.2      |
| MultiQC | 1.11             | 1.12        |

- #240 - Add prodigal to predict protein-coding genes for assemblies.
- #241 - Add parameter `--skip_prodigal`.
- #244 - Add pipeline preprint information.
- #245 - Add Prokka to annotate binned genomes.
- #249 - Update workflow overview figure.
- #258 - Updated MultiQC 1.9 to 1.11.
- #260 - Updated SPAdes 3.13.1 -> 3.15.3, MEGAHIT 1.2.7 -> 1.2.7
- #256 - Fix `--skip_busco`.
- #236 - Fix large assemblies (> 4 billion nucleotides in length).
- #254 - Fix MetaBAT2 error with nextflow version 21.10.x (21.04.03 is the latest functional version for nf-core/mag 2.1.0).
- #255 - Update gtdbtk conda channel.
- #258 - FastP results are now in MultiQC.
- #212, #214 - Add bin abundance estimation based on median sequencing depths of corresponding contigs (results are written to `results/GenomeBinning/bin_depths_summary.tsv` and `results/GenomeBinning/bin_summary.tsv`) #197.
- #214 - Add generation of (clustered) heat maps with bin abundances across samples (using centered log-ratios)
- #217 - Publish genes predicted with Prodigal within BUSCO run (written to `results/GenomeBinning/QC/BUSCO/[assembler]-[bin]_prodigal.gff`).
- #218 - Update to nf-core 2.0.1 `TEMPLATE` (DSL2)
- #226 - Fix handling of `BUSCO` output when run in auto lineage selection mode and selected specific lineage is the same as the generic one.
- #179 - Add BUSCO automated lineage selection functionality (new default). The parameter `--busco_auto_lineage_prok` can be used to only consider prokaryotes and the parameter `--busco_download_path` to run BUSCO in `offline` mode.
- #178 - Add taxonomic bin classification with `GTDB-Tk` `v1.5.0` (for bins filtered based on `BUSCO` QC metrics).
- #196 - Add process for CAT database creation as an alternative to using pre-built databases.
- #162 - Switch to DSL2
- #162 - Changed `--input` file format from `TSV` to `CSV` format, requires header now
- #162 - Update `README.md`, `docs/usage.md` and `docs/output.md`
- #162 - Update `FastP` from version `0.20.0` to `0.20.1`
- #162 - Update `Bowtie2` from version `2.3.5` to `2.4.2`
- #162 - Update `FastQC` from version `0.11.8` to `0.11.9`
- #172 - Compressed discarded MetaBAT2 output files
- #176 - Update CAT DB link
- #179 - Update `BUSCO` from version `4.1.4` to `5.1.0`
- #179 - By default BUSCO now performs automated lineage selection instead of using the bacteria_odb10 lineage as reference. Specific lineage datasets can still be provided via `--busco_reference`.
- #178 - Change output file: `results/GenomeBinning/QC/quast_and_busco_summary.tsv` -> `results/GenomeBinning/bin_summary.tsv`, contains GTDB-Tk results as well.
- #191 - Update to nf-core 1.14 `TEMPLATE`
- #193 - Compress CAT output files #180
- #198 - Requires nextflow version `>= 21.04.0`
- #200 - Small changes in GitHub Actions tests
- #203 - Renamed `fastp` params and improved description in documentation: `--mean_quality` -> `--fastp_qualified_quality`, `--trimming_quality` -> `--fastp_cut_mean_quality`
- #175 - Fix bug in retrieving the `--max_unbinned_contigs` longest unbinned sequences that are longer than `--min_length_unbinned_contigs` (`split_fasta.py`)
- #175 - Improved runtime of `split_fasta.py` in `METABAT2` process (important for large assemblies, e.g. when computing co-assemblies)
- #194 - Allow different folder structures for Kraken2 databases containing `*.k2d` files #187
- #195 - Fix documentation regarding required compression of input FastQ files #160
- #196 - Add process for CAT database creation as solution for problem caused by incompatible `DIAMOND` version used for pre-built `CAT database` and `CAT classification` #90, #188
- #146 - Add `--coassemble_group` parameter to allow group-wise co-assembly
- #146 - Add `--binning_map_mode` parameter allowing different mapping strategies to compute co-abundances used for binning (`all`, `group`, `own`)
- #149 - Add two new parameters to allow custom SPAdes and MEGAHIT options (`--spades_options` and `--megahit_options`)
- #141 - Update to nf-core 1.12.1 `TEMPLATE`
- #143 - Manifest file has to be handed over via `--input` parameter now
- #143 - Changed format of manifest input file: requires a '.tsv' suffix and additionally contains group ID
- #143 - TSV `--input` file allows now also entries containing only short reads
- #145 - When using TSV input files, uses sample IDs now for `FastQC` instead of basenames of original read files. Allows non-unique file basenames.
- #143 - Change parameter: `--manifest` -> `--input`
- #135 - Update to nf-core 1.12 `TEMPLATE`
- #133 - Fixed processing of `--input` parameter #131
- #121 - Add full-size test
- #124 - Add workflow overview figure to `README`
- #123 - Update to new nf-core 1.11 `TEMPLATE`
- #118 - Fix `seaborn` to `v0.10.1` to avoid `nanoplot` error
- #120 - Fix link to CAT database in help message
- #124 - Fix description of `CAT` process in `output.md`
- #35 - Add social preview image
- #49 - Add host read removal with `Bowtie 2` and a corresponding custom section to `MultiQC`
- #49 - Add separate `MultiQC` section for `FastQC` after preprocessing
- #65 - Add `MetaBAT2` RNG seed parameter `--metabat_rng_seed` and set the default to 1 which ensures reproducible binning results
- #65 - Add parameters `--megahit_fix_cpu_1`, `--spades_fix_cpus` and `--spadeshybrid_fix_cpus` to ensure reproducible results from assembly tools
- #66 - Export `depth.txt.gz` into result folder
- #67 - Compress assembly files
- #82 - Add `nextflow_schema.json`
- #104 - Add parameter `--save_busco_reference`
- #56 - Update `MetaBAT2` from `v2.13` to `v2.15`
- #46 - Update `MultiQC` from `v1.7` to `v1.9`
- #88 - Update to new nf-core 1.10.2 `TEMPLATE`
- #88 - `--reads` is now removed, use `--input` instead
- #101 - Prevented PhiX alignments from being stored in work directory #97
- #104, #111 - Update `BUSCO` from `v3.0.2` to `v4.1.4`
- #29 - Fix `MetaBAT2` binning discards unbinned contigs #27
- #31, #36, #76, #107 - Fix links in README
- #47 - Fix missing `MultiQC` when `--skip_quast` or `--skip_busco` was specified
- #49, #89 - Added missing parameters to summary
- #50 - Fix missing channels when `--keep_phix` is specified
- #54 - Updated links to `minikraken db`
- #54 - Fixed `Kraken2` db preparation: allow different names for compressed archive file and contained folder as for some minikraken dbs
- #55 - Fixed channel joining for multiple samples causing `MetaBAT2` error #32
- #57 - Fix number of threads used by `MetaBAT2` program `jgi_summarize_bam_contig_depths`
- #70 - Fix `SPAdes` memory conversion issue #61
- #71 - No more ignoring errors in `SPAdes` assembly
- #72 - No more ignoring of `BUSCO` errors
- #73, #75 - Improved output documentation
- #96 - Fix missing bin names in `MultiQC` BUSCO section #78
- #104 - Fix `BUSCO` errors causing missing summary output #77
- #29 - Change deprecated parameters: `--singleEnd` -> `--single_end`, `--igenomesIgnore` -> `--igenomes_ignore`
Initial release of nf-core/mag, created with the nf-core template.
- short and long reads QC (fastp, porechop, filtlong, fastqc)
- Lambda and PhiX detection and filtering (bowtie2, nanolyse)
- Taxonomic classification of reads (centrifuge, kraken2)
- Short read and hybrid assembly (megahit, metaspades)
- metagenome binning (metabat2)
- QC of bins (busco, quast)
- annotation (cat/bat)