Releases: CDCgov/phoenix
v2.2.0
v2.2.0 (01/06/2026)
COMMAND CHANGE:
- Due to the deprecation of `-entry` since Nextflow `v24.10.0`, we switched to the use of `--mode` to run specific workflows (`PHOENIX`, `CDC_PHOENIX`, etc.). This parameter is case insensitive.
Implemented Enhancements:
- Creation of `--mode UPDATE_PHOENIX` to take in a phoenix directory (runs all samples in dir) or a samplesheet (with format "sample,dir") and update MLST and AR calls. Files will be overwritten in place, and a "${samplename}_updater_log.tsv" file will be created the first time this is run and updated every time it is run thereafter. This file will contain a record of what was updated and when.
- `--create_ncbi_sheet` now creates separate excel sheets for each BioProject (if there is more than one in your run) to make upload to NCBI easier.
- Updating the big 5 genes to be highlighted, particularly OXA genes, has become too big of a lift to hard code, so the BLDB database was added as a reference. The process is described in the wiki.
- To reduce the space needed to save phx output, `*.kraken2_trimd.classifiedreads.txt` and `*.kraken2_wtasmbld.classifiedreads.txt` were removed from phx output. If you need or want these files you can get them from the workdir for the process(es) `KRAKEN2_TRIMD` and `KRAKEN2_ASMBLD`. Alternatively, you can create your own config file and add back in the publishing of the files.
- Improved linking of taxonomy across modules. Use of NCBI TaxID in ANI, Assembly_Ratio and GC_Content allows for more standardized comparisons across tools.
- Expanded available taxonomy for MLST SRST2 to match the expansion of the pubMLST and other MLST databases.
- MLST profile output now merges novel alleles into a single profile instead of showing 2 separate lines/profiles (e.g. if MLST and SRST2 both find a novel allele at the rpoB locus, the output shows rpoB(12*,33~)).
- Code base reductions:
- Condensing GENERATE_PIPELINE_STATS modules and subworkflows.
- GRiPHin module was rewritten to be nextflowly (i.e. module runs off input files rather than a directory). Thanks to Savannah Linen (@ztb2), Andreea Stoica (@astoicame) and Les Kallestad (@lekalle) for their help with this.
- Removed DETERMINE_TAXA_ID_FAILURE, CREATE_SUMMARY_LINE_FAILURE and GENERATE_PIPELINE_STATS_FAILURE_EXQC modules, by condensing them into DETERMINE_TAXA_ID, CREATE_SUMMARY_LINE and GENERATE_PIPELINE_STATS_FAILURE respectively.
- Haemophilus influenzae and Bordetella pertussis added as possible taxa to pass to AMRFinder with `--organism`. Burkholderia mallei was moved out of the Burkholderia pseudomallei complex set and is now just run as Burkholderia mallei.
Summary File Changes:
- For spades failures, lack of reads after trimming or corruption, we simplified the warnings produced in `GRiPHin.py` by suppressing other warnings, as the root cause is the aforementioned failures. Similarly, if the reason for the Auto QC Failure is "Assembly file not found" then only that is reported rather than listing files with unknowns.
- New columns added to GRiPHin summary files: `PHX_Version`, `Final_Taxa_ID` and `ShigaPass_Organism`.
- For alignment across GRiPHin summary files and `Phoenix_Summary.tsv`, the columns `Final_Taxa_ID` and `ShigaPass_Organism` were added to the latter. Additionally, the `Species` column was changed to `FastANI_Organism`, `Taxa_Confidence` to `FastANI_%ID`, and `Taxa_Coverage` to `FastANI_%Coverage`.
- AMRFinder files
Terra.bio Output Updates:
- Columns are now reported based on `*_GRiPHin_Summary.tsv`, except for the columns `BETA_LACTAM_RESISTANCE_GENES`, `OTHER_AR_GENES`, `AMRFINDER_POINT_MUTATIONS`, `HYPERVIRULENCE_GENES` and `PLASMID_INCOMPATIBILITY_REPLICONS`, which still come from the `Phoenix_Summary.tsv` file.
- `MLST1_NCBI` and `MLST2_NCBI` columns were added, which are a combination of the MLSTs and MLST_SCHEMEs columns. These new columns are formatted for uploading to NCBI following ARLN guidance.
- `SHIGAPASS_TAXA` is the output of Shigapass if it was run. The `TAXA_SOURCE` column will state if Shigapass was used for the final taxa call.
- `FINAL_TAXA_ID` is the final taxa call for the isolate.
- `N50` is the N50 from `Quast`.
- `WARNINGS_COUNT` was changed to `WARNINGS` and it is a printout of the warnings, rather than just a count.
- AMRFinderPlus genes are now reported in the columns `AMRFINDERPLUS_AMR_CLASSES`, `AMRFINDERPLUS_AMR_CORE_GENES`, `AMRFINDERPLUS_AMR_PLUS_GENES`, `AMRFINDERPLUS_AMR_SUBCLASSES`, `AMRFINDERPLUS_STRESS_GENES` and `AMRFINDERPLUS_VIRULENCE_GENES`.
- To reduce the space needed to save phx output, `*.kraken2_trimd.classifiedreads.txt` and `*.kraken2_wtasmbld.classifiedreads.txt` are no longer output from PHX. `*.kraken2_asmbld.classifiedreads.txt` was added as an output as taxids are in that file, which is different from the `*.kraken2_wtasmbld.classifiedreads.txt`. These files aren't really needed except for edge cases such as questions about conflicting results or investigating suspected contamination.
- Due to deprecation of the "when" block in Nextflow, `when` statements in modules were removed and `.filter{}` is used throughout instead. Thanks to Savannah Linen (@ztb2), Andreea Stoica (@astoicame) and Les Kallestad (@lekalle) for their help implementing this.
Fixed Bugs:
- Taxonomy Fixes:
- BAD BUG!! `sort_and_prep_dist.sh` was not evaluating scientific notation, so in some cases exact matches were not being reported. For context, when reviewing our dataset of 71,670 samples, 2,837 (~4%) had scientific notation in their mash distances, and 31 (0.04%) have a different species if you sort with the scientific notation compared to what PHX originally reported. All of the 31 would be considered in the same complex, e.g. E. hormaechei/E. cloacae, E. coli/Shigella, K. michiganensis/K. oxytoca. Thus, the impact on previously reported taxa isn't expected to be large, but this fix may resolve differences reported between MALDI/WGS.
- Shigapass was added to distinguish correctly between E. coli/Shigella. If FastANI determines the species to be either E. coli or Shigella, Shigapass will now run to confirm the call. In GRiPHin there is a new `Final_Taxa_ID` column that has the final determined call. The column `Taxa_source` will still say `ANI_REFSEQ` if the FastANI call was kept and now will have `Shigapass` if the FastANI call was determined to be wrong by Shigapass and was thus overwritten. This was added to all entry points.
- Changes were made to allow `-resume` to work better.
- More robust checks in PHoeNIx to pull in only sample_names correctly.
- Fixed error that caused the column "No_AR_Genes_Found" to not appear in the GRiPHin report.
- Fix for `--coverage` being converted to a string when run on Seqera Cloud. Thanks to @DOH-JDJ0303 for the PR.
- Changes to how genes are highlighted in the GRiPHin_Summary:
- Beta-Lactamase DataBase (BLDB) is now used as an input to determine which genes to highlight rather than hard coding. The big-5 genes that have their function labelled as ESBL/IR/IR ESBL were removed from being highlighted as part of the `big 5` genes as they are not thought to have carbapenemase activity.
- Full details on highlighting methods can be found in the wiki.
- The column `Kraken_ID_Raw_Reads_%` in the GRiPHin summary files (xlsx and tsv) was changed to `Kraken_ID_Trimmed_Reads_%` to accurately reflect what that column has been reporting... whoopsie.
- Fixed bug where passing samples were not entering the BBDuk step due to forward/reverse being in the file name rather than R1/R2.
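
The scientific-notation issue described under Taxonomy Fixes can be illustrated with a small Python sketch. This is not the code from `sort_and_prep_dist.sh`; the distances and reference names are made up, and the sketch only shows why ordering mash distances as text misplaces a value such as `5e-05`.

```python
# Hypothetical mash distances (stored as text) paired with made-up reference names.
hits = [
    ("0.000238", "ref_A"),
    ("5e-05", "ref_B"),  # 0.00005, numerically the closest hit
    ("0.0101", "ref_C"),
]

# Ordering the raw text compares characters, so "5e-05" lands last.
as_text = [name for _, name in sorted(hits, key=lambda h: h[0])]

# Converting to float first interprets the exponent, so ref_B comes first.
as_number = [name for _, name in sorted(hits, key=lambda h: float(h[0]))]

print("text order:   ", as_text)
print("numeric order:", as_number)
```

With a text ordering the closest hit is reported last, which is the kind of mismatch the fix addresses.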
Container Updates:
- Containers updated to include developers bug fixes:
- amrfinderplus: v3.12.8 to v4.2.5
- busco: v5.4.7--pyhdfd78af_0 to v6.0.0
- bbtools: v39.01 to v39.13
- spades: v3.15.5 to v4.2.0
- quast: v5.0.2 to v5.3.0
- sra-tools: v3.1.1 to v3.2.0--h4304569_0
- entrez-direct: v16.2--he881be0_1 to v24.0--he881be0_0
- MLST: v2.23.0_07282023 to v2.25.0_12312025
- phx_base: python upgraded from 3.7.12 to 3.12.3, base image updated from jammy to 24.04.
Database Updates:
- Curated AR gene database was updated on 2025-12-08 (yyyy-mm-dd) to include the new AMRFinder database:
- AMRFinderPlus database
- Version 2025-12-03.1
- ResFinder
- Notably, NDM-58 and 60 were added. See history.txt file for more details (for this new version changes from 2024-12-13 to 2025-09-09 are included).
- [ARG-ANNOT](http://backup.mediterranee-infection.com/...
v2.1.1
v2.1.1 (03/25/2024)
Implemented Enhancements:
- The following OXA genes were added to be highlighted as blaOXA-48 like in the griphin summary: "blaOXA-1167","blaOXA-1181","blaOXA-1200","blaOXA-1201","blaOXA-1205","blaOXA-1207","blaOXA-1211","blaOXA-1212","blaOXA-1213".
Fixed Bugs:
- Fix for issue #130, identified when an isolate was incorrectly assigned to the cronobacter scheme when it should have been ecloacae. This is an extension of a larger scoring problem with MLST-2.23.0.
- Fixed #142 where names with multiple instances of "R2" in their name couldn't be parsed properly and didn't move past the corruption check step. commit `7fc0ac3c026b7c12608be4dd1d3682675e31d0fe`
- Fixed an issue in FASTANI. The file for checking 80%+ identity could not be found because the DB_Version was not set when "No Mash Hits found" occurs.
- Updated amrfinderplus container from BLAST v2.14.0 --> v2.15.0 to fix #144.
Container Updates:
- Containers updated to include developers bug fixes:
- amrfinderplus: v3.11.26 to v3.12.8 which has changes on how AR genes are called.
Database Updates:
- Curated AR gene database was updated on 2024-02-29 (yyyy-mm-dd) to include the new AMRFinder database:
- AMRFinderPlus database
- Version 2024-01-31.1
- ARG-ANNOT and ResFinder haven't changed since last version release.
v2.1.0
v2.1.0 (02/11/2024)
Implemented Enhancements:
- Added handling for "unknown" assemblers in the scaffolds entry point so genomes can be downloaded from NCBI and run through PHoeNIx.
- For entry points CDC_PHOENIX or PHOENIX you can now use the argument `--create_ncbi_sheet` to generate partially filled out excel sheets for uploading to NCBI. You will still need to fill in some lab/sample specific information and review for accuracy, but this should speed up the process. As a reminder, please do not submit raw sequencing data to the CDC HAI-Seq BioProject (531911) that is auto populated in these sheets unless you are a state public health laboratory, a CDC partner or have been directed to do so by DHQP. The BioProject accession IDs in these files are specifically designated for domestic HAI bacterial pathogen sequencing data, including from the Antimicrobial Resistance Laboratory Network (AR Lab Network), state public health labs, surveillance programs, and outbreaks. For inquiries about the appropriate BioProject location for your data, please contact HAISeq@cdc.gov.
- New Terra workflow for combining `Phoenix_Summary.tsv`, `GRiPHin_Summary.tsv` and `GRiPHin_Summary.xlsx` of multiple runs into one file. This workflow will also combine the NCBI excel sheets created when using `--create_ncbi_sheet`.
- `software_versions.yml` now contains versions for all custom scripts used in the pipeline to streamline its validation process and align it with CLIA requirements, ensuring smoother compliance.
- MultiQC now contains graphs and data from BBDuk, FastP, Quast and Kraken. BUSCO is also part of MultiQC if the entry point runs it (i.e. CDC_* entries).
- AMRFinder+ species that are screened for point mutations were updated with Enterobacter asburiae, Vibrio vulnificus and Vibrio parahaemolyticus.
- A check was added to ensure only SRR numbers are passed to `-entry` `CDC_SRA` and `SRA`.
- After extensive QC cut-off review, additional warnings and minimum QC cut-offs were added:
- Minimum PASS/FAIL:
- > 500 scaffolds
- FAIry (file integrity check) - see Fixed Bugs section below for details.
- Warnings:
- 200-500 scaffolds -> high, but not enough for failure
- Taxa Quality Checks:
- FastANI Coverage <90% and Match <95%
- For entries BUSCO <97%
- Contamination Checks:
- <70% of reads/weighted scaffolds assigned to top genus hit.
- Added weighted scaffold to kraken <30% unclassified check (was just on reads before)
- Added weighted scaffold to kraken only 1 genera >25% of assigned check (was just reads before)
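
As a rough illustration of how the scaffold and FastANI cut-offs listed above could be applied, here is a minimal Python sketch. It is not PHoeNIx code: the function name, argument names and message strings are hypothetical, the 200-500 range is assumed to be inclusive, and the two FastANI thresholds are assumed to be checked independently.

```python
def qc_flags(n_scaffolds, ani_coverage, ani_match):
    """Return QC messages for one isolate using the cut-offs listed above."""
    flags = []
    # More than 500 scaffolds is a minimum QC failure; 200-500 only warns.
    if n_scaffolds > 500:
        flags.append("FAIL: >500 scaffolds")
    elif n_scaffolds >= 200:
        flags.append("WARNING: 200-500 scaffolds")
    # Taxa quality checks on the FastANI result (percent values).
    if ani_coverage < 90:
        flags.append("WARNING: FastANI coverage <90%")
    if ani_match < 95:
        flags.append("WARNING: FastANI match <95%")
    return flags


print(qc_flags(n_scaffolds=150, ani_coverage=95.0, ani_match=99.0))
print(qc_flags(n_scaffolds=350, ani_coverage=88.0, ani_match=99.0))
print(qc_flags(n_scaffolds=501, ani_coverage=95.0, ani_match=99.0))
```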
Output File Changes:
- The default outdir phx produces was changed. If the user doesn't pass `--outdir`, the default was changed from `results` to `phx_output`. This was changed in response to feedback from the compliance program, to avoid confusion regarding the difference between public health results (i.e. summary) and diagnostic results (i.e. report).
- The `phx_output/FAIry` folder will contain a `*_summaryline_failure.tsv` file for any isolate where file corruption was detected.
- `*.tax` file had the NCBI assigned taxID added after the `:` for easy lookup.
Fixed Bugs:
- Updated `tower.yml` file to reflect file name changes in v2.0.2. This will enable nf-tower reports to properly show up. commit e1b2b91
- `GRiPHin_Summary.xlsx` was highlighting coverage outside 40-100x despite the `--coverage` setting; changes were made to respect the `--coverage` flag.
- Added a fix to handle when auto select by the mlst script chooses the wrong taxonomy. PHoeNIx will force a rerun in cases where the taxonomy is known but initial mlst is run against the incorrect scheme. Known instances found so far include: E. coli (Pasteur) being incorrectly identified as Aeromonas and E. coli (Pasteur) being identified as Klebsiella. The scoring in the MLST program was updated and can now cause lower count perfect hits (e.g. 6 of 6 Aeromonas genes at 100%) to be scored higher than novel correct hits (e.g. 7 of 8 at 100%, 1 novel gene).
- Corrected instance where, in some cases when an mlst scheme could not be determined, a proper out file was not created.
- Fixed issue with MLST where certain characters in the filename would cause an array index out of bounds error.
- Fixed issue where samples that failed SPAdes did not have the `--coverage` parameter respected when generating the synopsis file.
- Fixed `-entry CDC_SCAFFOLDS` providing incorrect headers (missing `BUSCO` and `BUSCO_DB`).
- Updated FAIry (file integrity check) to catch additional file integrity errors.
- FAIry detects and reports when:
- Corrupt fastq files prevent the completion of gzip and zcat; a synopsis file is generated when needed.
- R1/R2 fastqs do not have an equal number of reads in the files.
- There are no reads or scaffolds left after filtering and read trimming steps, respectively.
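
The first two FAIry checks above can be sketched in Python as follows. This is an illustration only, not FAIry itself: the function names are hypothetical, and it assumes standard four-line FASTQ records in gzip-compressed files.

```python
import gzip
import zlib


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def paired_counts_match(r1_path, r2_path):
    """True when both files decompress cleanly and hold the same number of reads."""
    try:
        return count_fastq_reads(r1_path) == count_fastq_reads(r2_path)
    except (OSError, EOFError, zlib.error):
        # Raised when a file is truncated, corrupt, or not gzip at all.
        return False
```

A caller would pass the forward and reverse read files for one sample and treat a `False` result as a reason to stop before trimming.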
Container Updates:
- Containers are now called with their sha256 to streamline PHoeNIx's validation process and align it with CLIA requirements.
- Containers updated to include developers bug fixes:
- fastp: v0.23.2 to v0.23.4 bug fixes.
- fastqc: v0.11.9 to v0.12.1 bug fixes.
- kraken2: v2.1.2 to v2.1.3 which has improvements on efficiency and bug fixes.
- fastani: v1.33 to v1.34 bug fixes. Specifically, it fixed multi-threading output bugs. Output and interface of FastANI remains same as before.
- amrfinderplus: v3.11.11 to v3.11.26 which has improvements on efficiency and bug fixes.
- SRAtools v3.0.3 to 3.0.9 updates and bug fixes.
- Container for SRA entry steps `SRATOOLS_FASTERQDUMP` and `SRATOOLS_PREFETCH` was switched to a quay.io/biocontainers container to address issues with the old container and ICA. commit 68815e3
- The srst2 container version stays the same, but it is now in a custom container built from commit `73f885f55c748644412ccbaacecf12a771d0cae9` as there has been a bug fix for a rounding penalty to integer without a new release. In addition, a fix was added to address issues related to handling grepping of '(' and ')'. Hosting updated container on quay.io.
Database Updates:
- MLST database was pulled from PubMLST and updated on Jan 24th, 2024.
- The Plasmid Replicons database was updated to include an update to the Enterobacteriales.fsa database.
- Curated AR gene database was updated on 2024-01-24 (yyyy-mm-dd) which includes:
- AMRFinderPlus database
- Version 2023-11-15.1
- ARG-ANNOT hasn't changed since the last time the database was created and contains updates since version NT v6 July 2019
- ResFinder
- Includes until 2024-01-28 commit 97d1fe0cd0a119172037f6bdb29f8a1c7c6e6019
v2.0.2
v2.0.2 (08/03/2023)
Implemented Enhancements:
- Added handling for `-entry` `SCAFFOLDS` and `CDC_SCAFFOLDS` to accept assemblies from Trycycler and Flye.
- Added tsv version of GRiPHin_Summary.xlsx
Output File Changes:
- GRiPHin_samplesheet.csv changed to Directory_samplesheet.csv
- In response to feedback from compliance program, "report" is being replaced by "summary" in file names to avoid confusion regarding the difference between public health results (i.e. summary) and diagnostic results (i.e. report).
- GRiPHin_Report.xlsx changed to GRiPHin_Summary.xlsx
- Phoenix_Output_Report.tsv changed to Phoenix_Summary.tsv
- quast/${samplename}_report.txt changed to quast/${samplename}_summary.tsv
- kraken2_trimd/${samplename}.trimd_summary.txt changed to kraken2_asmbld/${samplename}.kraken2_trimd.top_kraken_hit.txt
- kraken2_asmbld/${samplename}.asmbld_summary.txt changed to kraken2_asmbld/${samplename}.kraken2_asmbld.top_kraken_hit.txt
- kraken2_asmbld_weighted/${samplename}.wtasmbld_summary.txt changed to kraken2_asmbld/${samplename}.kraken2_wtasmbld.top_kraken_hit.txt
- kraken2_trimd/${samplename}.kraken2_trimd.report.txt changed to kraken2_trimd/${samplename}.kraken2_trimd.summary.txt
- kraken2_asmbld/${samplename}.kraken2_asmbld.report.txt changed to kraken2_asmbld/${samplename}.kraken2_asmbld.summary.txt
- kraken2_asmbld_weighted/${samplename}.kraken2_wtasmbld.report.txt changed to kraken2_asmbld_weighted/${samplename}.kraken2_wtasmbld.summary.txt
Fixed Bugs:
- For MLST when final alleles were assigned, PHX called 100% match despite 1 allele not being a match.
- MLST step not using the custom database. A custom MLST container was added with this database included.
Container Updates:
- MLST version remains the same, but a custom database was added so that it no longer uses the database included in the software. Now hosted on quay.io.
- Bumped up base container (v2.0.2) to have openpyxl module.
v2.0.1
v2.0.0
Implemented Enhancements:
- Entry point for scaffolds added using either `-entry SCAFFOLDS` or `-entry CDC_SCAFFOLDS` that runs everything post SPAdes step. New input parameters `--indir` and `--scaffold_ext` added for functionality of this entry point commit f12da60.
- Supports scaffold files from shovill, spades and unicycler.
- Entry point for sra added using either `-entry SRA` or `-entry CDC_SRA`. These entry points will pull samples from SRA based on what is passed to `--input_sra`, which is a file with one SRR number per line commit a86ad3f.
- Check now performed on input samplesheets to confirm the same sample id, forward read and reverse read aren't used multiple times in the samplesheet commit fd6127f.
- Changed many modules to `process_single` rather than `process_low` to reduce resource requirements for these steps.
- Updates to run PHX on nf-tower with an AWS back-end. Also, updated `tower.yml` file to have working reports.
- AMRFinder+ was updated to v3.11.11, which allows point mutation calling for Burkholderia cepacia species complex, Burkholderia pseudomallei species complex, Serratia marcescens and Staphylococcus_pseudintermedius.
- Argument `--coverage` added. Can be passed to increase the coverage cut-off that will cause a sample to fail minimum QC standards (default is 30x).
- Public Kraken2 database is required rather than requesting from sharefile. For PHoeNIx >=2.0.0 you will need to download the public Standard-8 version kraken2 database created on or after March 14th, 2023 from Ben Langmead's github page. You CANNOT use an older version of the public kraken databases on Ben Langmead's github page. We thank @BenLangmead and @jenniferlu717 for taking the time to include an extra file in public kraken databases created after March 14th, 2023 to allow them to work in PHoeNIx!
- For PHoeNIx <=1.1.1 you will need to download the public Standard-8 version kraken2 database created on May 17, 2021 from Ben Langmead's github page. The download link is https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz.
- The kraken database can be passed as an uncompressed folder or just in its downloaded `.tar.gz` form.
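
The samplesheet check listed among the enhancements above (the same sample id, forward read and reverse read must not be used more than once) could look something like the following Python sketch. The column names `sample`, `fastq_1` and `fastq_2`, the function name and the example rows are assumptions for illustration, not taken from the pipeline.

```python
import csv
import io
from collections import Counter


def find_duplicates(samplesheet_text):
    """Return, per column, the values that appear more than once."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    duplicates = {}
    for column in ("sample", "fastq_1", "fastq_2"):  # assumed column names
        counts = Counter(row[column] for row in rows)
        repeated = sorted(value for value, n in counts.items() if n > 1)
        if repeated:
            duplicates[column] = repeated
    return duplicates


# Made-up samplesheet: sample "A" and one reverse read file are reused.
sheet = """sample,fastq_1,fastq_2
A,a_R1.fastq.gz,a_R2.fastq.gz
B,b_R1.fastq.gz,a_R2.fastq.gz
A,c_R1.fastq.gz,c_R2.fastq.gz
"""
print(find_duplicates(sheet))
```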
Output File Changes:
- The folder `fastqc` was changed to `fastqc_trimd` to clarify it contains results from the trimmed data.
- PROKKA module now outputs `.fsa` file (nucleotide file of genes) rather than `.fna` as the `.fna` file is really just the assembly file again.
- Added version for base container information for `FAIRY`, `ASSET_CHECK`, `FORMAT_ANI`, `FETCH_FAILED_SUMMARIES`, `CREATE_SUMMARY_LINE`, `GATHER_SUMMARY_LINES`, and `GENERATE_PIPELINE_STATS`. This was added to `software_versions.yml`.
- Changing the file/folder structure of some files for clarity and to make it less cluttered:
- Folders `Annotation` and `Assembly` were changed to `annotation` and `assembly` respectively to keep continuity.
- Files `kraken2_asmbld/*.unclassified.fastq.gz` and `kraken2_asmbld/*.classified.fastq.gz` were changed to `kraken2_asmbld/*.unclassified.fasta.gz` and `kraken2_asmbld/*.classified.fasta.gz` as they are actually `fasta` files.
- `*.fastANI.txt` --> moved from `~/ANI/fastANI` to `~/ANI`.
- The file `*_trimmed_read_counts.txt` that was in `fastp_trimd` was moved to the folder `qc_stats`.
- Files `*_fastqc.zip` and `*_fastqc.html` in folder `fastqc_trimd` moved to `qc_stats`.
- `*.bbduk.log` --> moved from `~/removedAdapters` to `~/${sample}/qc_stats` and `removedAdapters` is no longer an output folder.
- `raw_stats` folder was created and contains `${sample}_raw_read_counts.txt` and `${sample}_FAIry_synopsis.txt`; previously these were in the folders `fastp_trimd` and `FAIry`, respectively.
- Sample GC% added to `*_GC_content_20230504.txt` file.
- `*_trimmed_read_counts.txt` has `Paired_Sequenced_[reads]` column added as `Total_Sequenced_[reads]` is the number of the paired sequences and singletons.
- Files produced from FastANI, MASH and FORMAT_ANI had the mash database's date appended to the file name for tracking and validation. Files are now named `*${sample}_REFSEQ_20230504.ani.txt`, `${samplename}_REFSEQ_20230504.fastANI.txt`, `${samplename}_REFSEQ_20230504_best_MASH_hits.txt` and `${samplename}_REFSEQ_20230504.txt`.
- GRiPHin file updates
- New columns for `WARNINGS`, `ALERTS`, `Minimum_QC_Issues`, `Total_Raw_[reads]`, `Paired_Trimmed_[reads]` and `GC%`.
- New column `Primary_MLST_Source` was added to show if the assembly (MLST program) or reads (SRST2) was used for MLST determination.
- `Auto_PassFail` and `PassFail_Reason` were changed to `Minimum_QC_Checks` and `Minimum_QC_Issues`, respectively. This was to clarify these are minimum requirements for QC.
- The column `Total_Sequenced_[bp]` was removed from the report for lack of utility.
- `Q30_R1_[%]`, `Q30_R2_[%]`, and `Total_Sequenced_[reads]` were relabelled as `Raw_Q30_R1_[%]`, `Raw_Q30_R2_[%]` and `Total_Trimmed_[reads]`, respectively for clarity.
Fixed Bugs:
- Added module `GET_RAW_STATS` to get raw stats. Previously this information was pulled from the `FASTP_TRIMD` step; however, the input data there was post `BBDUK`, which removes PhiX reads and adapters. Thus, the previous raw count was slightly off.
- Fixed python version information not showing up for `GET_TAXA_FOR_AMRFINDER` and `GATHERING_TRIMD_READ_QC_STATS`. This was added to `software_versions.yml`.
- Fixed issue where sample names with an underscore in them caused incorrect parsing and the contig number not showing up in GRiPHin reported genes commit a0fdff5.
- Fixed `AttributeError: 'DataFrame' object has no attribute 'map'` error that came up in the GRiPHin step when your set of samples had both a macrolide and macrolide_lincosamide_streptogramin AR gene commit 460bdbc.
- `Phoenix_Output_Report.tsv` was reporting %Coverage for FastANI in the `Taxa_Confidence` column rather than `%ID`. Now both are reported when FastANI is successful commit 3b26fec.
- `GRiPHin_Report.xlsx` was switched from reporting rounded numbers for coverage/similarity % to reporting the floor, as reporting 100% when 99.5% is the actual number is misleading and doesn't alert the user to SNPs in genes. Now by switching to the floor 99.5% would be reported as 99% commit 5477627.
- Corrected GAMMA modules not printing the right version in the `software_version.yml` file commit 5477627.
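
The rounding-versus-floor change for coverage/similarity percentages described above can be seen in a short Python sketch. The helper name is hypothetical and this is not the GRiPHin code, only the arithmetic behind the change.

```python
import math


def reported_percent(value):
    """Report a percentage as its floor so near-perfect hits never show as 100."""
    return math.floor(value)


actual = 99.5
print("rounded:", round(actual))           # rounding hides the mismatch
print("floor:  ", reported_percent(actual))
```

Only an exact 100.0 is reported as 100 under the floor rule, so any SNP-level difference stays visible.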
Database Updates:
- Curated AR gene database was updated on 2023-05-17 (yyyy-mm-dd) which includes:
- AMRFinderPlus database
- Version 2023-04-17.1
- ARG-ANNOT
- Latest version NT v6 July 2019
- ResFinder
- Bumped from `v2.0.0` to `v2.1.0` including until 2023-04-12 commit f46d8fc.
- Updated AMRFinder Database used by AMRFinder+ and GAMMA to v2023-04-17.1.
- `SRST2_MLST` and `MLST` steps now use the mlst_db, which is provided in `~/phoenix/assests/databases`. It is now static and no longer pulls updates from PubMLST.org. This will keep the pipeline running when PubMLST.org is down and keeps the schemes from changing if you run the same sample at different times. This was implemented to deal with PubMLST.org being down fairly often and with pipeline validation in mind.
Container Updates:
v1.1.1
Implemented Enhancements:
- `-entry CDC_PHOENIX` workflow checks all FASTQ files for corruption and creates a list of the checked files using the FAIry (FASTQ file Assessment of Integrity) tool commit 1111df8. This is a required internal QC check.
- Expanded MLST lookup of Citrobacter species complex commit 43ea24d lists the new species.
- Increased SPAdes CPUs to 8 and memory to 16GB in `base.config`.
Fixed Bugs:
- Fix for issue #99 where the first gene in ar, plasmid and hypervirulence genes didn't end up in the `*_summaryline.tsv`. This same error was in `Phoenix_summary_line.py` and caused the first sample to not be included in the final report.
- Fixed tabulation error in `*_combined.tsv` output files that in some cases would show in `GRiPHin_Report.xlsx` output as a long singular line as the MLST type.
- Fix for issue #91 where Klebsiella MLST lookup would not properly match to the correct lookup database.
- Fixed problem where samples that didn't create scaffolds, but created contigs, didn't have species printed out in `Phoenix_Output_Report.tsv`; details in commit c7f7ea5.
- Fixed problem in `-entry CDC_PHOENIX` where samples that didn't create scaffolds, but created contigs, or samples that failed spades completely didn't have correct columns lining up in `Phoenix_Output_Report.tsv`; details in commit d17bdda.
v1.1.0
Implemented Enhancements:
- Default branch set to main thanks @erinyoung #84.
- Added emits to allow linking of workflows to close #42 #e32132d.
- MLST output is now scanned for completeness of profiles by consolidating any allele tags to the ST column for easier scanning, and known paralog alleles are marked for easier identification. In CDC_PHOENIX workflow ST types are consolidated, if applicable, to show concordance between tools.
- Addition of 🔥🐎🐦🔥 GRiPhin: General Report Pipeline from PHoeNIx output to `-entry CDC_PHOENIX` #6291e9c. This was implemented to replace a common report generated internally, which is why it is only in the `-entry CDC_PHOENIX`.
- Changes to allow relative paths for kraken2 and BUSCO database to be passed rather than it requiring it to be a full path #ecb3618 and #d938a64.
- `Phoenix_Output_Report.tsv` now has antibiotic genes and plasmid markers filtered to ensure quality #d0fa32c.
- Plasmid markers require >=60% length and >=98% identity to be reported
- Antibiotic Genes require >=90% length and >=98% identity to be reported
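
The two reporting thresholds above can be expressed as a small Python sketch. The dictionary keys, function name and example values are hypothetical and only illustrate the cut-offs; they are not the filter used to build `Phoenix_Output_Report.tsv`.

```python
# Minimum percent length and percent identity for a hit to be reported.
THRESHOLDS = {
    "plasmid": {"min_length": 60.0, "min_identity": 98.0},
    "ar": {"min_length": 90.0, "min_identity": 98.0},
}


def passes_filter(kind, pct_length, pct_identity):
    """True when a hit meets both cut-offs for its kind ('plasmid' or 'ar')."""
    limits = THRESHOLDS[kind]
    return pct_length >= limits["min_length"] and pct_identity >= limits["min_identity"]


# A hit covering 75% of the reference at 99% identity is long enough for a
# plasmid marker but too short for an antibiotic resistance gene.
print(passes_filter("plasmid", 75.0, 99.0))
print(passes_filter("ar", 75.0, 99.0))
```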
Output File Changes:
- Removed spaces in header of `*_all_genes.tsv` file from AMRFinder+ output and replaced them with underscores to allow for more friendly parsing #fd048d1.
- Fixed error causing PROKKA output to not be in Annotation folder #d014aa0.
- Added headers to 2 files: `*.fastANI.txt` and `*.wtasmbld_summary.txt`.
- Also, added headers to `phoenix_line_summary.tsv`; see wiki for details.
- MLST final output that includes different headers and organization was renamed to `*_combined.tsv`, which includes srst2 types, if applicable, paralog tags, and any extra allele/profile tags.
Fixed Bugs:
- Edit to allow nf-tower to work #b21d61f
- Fixed pipeline failure when prokka throws error for sample names being too long (Error: ID must <= 37 chars long) #e48e01f. Now sample name length doesn't matter.
- Fixed bug where samples wouldn't end up in the `Phoenix_Output_Report.tsv` due to srst2 not finding any AR genes, so the file wasn't created. Now a blank file is created and the remaining sample information is in the `Phoenix_Output_Report.tsv` #2f52edc. This change only occurred in `-entry CDC_PHOENIX`.
- Fixed issue where a `cp` error was thrown when a relative path was given for the output directory #0c0ca55 and #d938a64.
Database Updates:
- AMRFinder+ database is now static and included in the database folder #a5d2d03. We removed the automatic updating for more control of the pipeline and lockdown to prepare for possible CLIA requirements.
- Version 2022-08-09.1 currently used to be the same as the one in the curated db.
- Curated AR gene database was updated on 2022-09-15 (yyyy-mm-dd) which includes:
- AMRFinderPlus database
- Version 2022-08-09.1
- ARG-ANNOT
- Latest version NT v6 July 2019
- ResFinder
- Includes until 2022-08-08 commit 39f4b26
- Fresh pull of plasmidfinder database on 2022-09-16 up to commit 9002e72
- Updated Mash sketch from all complete refseq bacteria on 2022-09-15
- Updated a NCBI Assembly stats file, which is calculated based on this file
Container Updates:
- MLST updated from 2.22.1 to 2.23.0.
- BBTools updated from 38.96 to 39.01.
- AMRFinder+ was updated from 3.10.40 to 3.10.45.
- Scripts that utilize the phoenix_base container were updated to `quay.io/jvhagey/phoenix:base_v1.1.0`, which had the python library `xlsxwriter` added to it for `GRiPHin.py`.