Releases: jtamames/SqueezeMeta
v1.1.1
New features
- LCA steps are now multithreaded in SqueezeMeta and SQMreads.
- The `exportPathway` function in SQMtools can now color KEGG reactions based on their abundance in the different samples, or on their log2 fold-change between two groups of samples.
- The `plotTaxonomy` function in SQMtools can now optionally use an arbitrary maximum for the y-scale, and collapse partially classified reads into a single "Unclassified" category.
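As a rough illustration of the log2 fold-change mentioned above, the following Python sketch compares the mean abundance of one KEGG reaction between two groups of samples. It is not the SQMtools implementation: the function name, the sample names and the pseudocount are assumptions made for this example only.

```python
import math


def log2_fold_change(abundances, group_a, group_b, pseudocount=1.0):
    """Return log2(mean(group_b) / mean(group_a)) for a single feature.

    `abundances` maps sample name -> abundance of one KEGG reaction.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    mean_a = sum(abundances[s] for s in group_a) / len(group_a)
    mean_b = sum(abundances[s] for s in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical abundances of one reaction in four samples.
abund = {"s1": 3.0, "s2": 3.0, "s3": 15.0, "s4": 15.0}
lfc = log2_fold_change(abund, ["s1", "s2"], ["s3", "s4"])
print(lfc)
```

A positive value means the reaction is more abundant in the second group, a negative value that it is more abundant in the first.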
Minor changes / bugfixes
- Fixed a bug in which the SQMtools package would not install in R < 3.5.
- Fixed a bug in which SQMtools taxonomy tables did not work directly in DESeq2.
- Fixed a bug in the PFAM table generation by `sqm2tables.py`.
- Fixed a minor bug in which some non-standard NCBI taxa would cause checkm to fail.
v1.1.0 - Easier done than said
New features
- SqueezeMeta can now be installed using conda. This does not require any kind of root access. Just get conda, install SqueezeMeta with `conda create -n SqueezeMeta -c fpusan -c bioconda squeezemeta`, and then activate the environment with `conda activate SqueezeMeta`. Don't like conda? No problem, we'll keep updating and testing the old installation instructions too.
- Added the `utils/install_utils/test_install.pl` script to quickly test whether all the required dependencies are present in the user's environment and whether the databases are correctly downloaded and configured. The script will inform the user if any problem is found. This covers interpreters and libraries for perl, python, java, ruby and R, but not other system libraries, so passing the tests does not fully guarantee that the pipeline will run without problems. Still, we hope it will help to identify common issues that may arise during installation.
- We now provide the `seqmerge` assembly mode, which first performs an individual assembly of each sample, and then combines the assemblies pairwise in a sequential fashion. This should be less memory-demanding than the `coassembly` and `merged` modes.
- Added the `--euk` option, which will drop identity filters for the taxonomic annotation of eukaryotic contigs. This will greatly increase the number of eukaryotic sequences reported by SqueezeMeta. By default, SqueezeMeta applies the Luo et al. (2014) identity cutoffs in order to assign an ORF to a given taxonomic rank. These cutoffs were devised using bacterial genomes, and may be too stringent for eukaryotic sequences.
- We now provide the `utils/combine-sqm-tables.py` script to combine the outputs of different SqueezeMeta or SQM_reads projects. This can prove useful in high-performance computing clusters, in which each computing node runs "small" SqueezeMeta or SQM_reads projects consisting of only a few samples, which then need to be aggregated into a single set of results.
- We now redistribute and apply aragorn for detecting tRNA coding regions in the contigs.
- The `nobinning` flag can be added to the samples file to tag samples that will be used for mapping but not for binning.
- The scripts from the SqueezeMeta suite no longer need to be run in the same directory where the project was created, and should now consistently be able to read and write in arbitrary locations.
- The functionality of the SQMtools R package was expanded:
  - Annotations from user-supplied databases are now considered.
  - Added the `loadSQMlite` function, which loads only the aggregated functional/taxonomic profiles (but not the orf/contig/bins tables and sequences). This has a much smaller memory footprint, and can be used with results coming from SQM_reads or `combine-sqm-tables.py`.
  - Added the `exportKrona` (kudos to Giussepe d'Auria for the idea and the code!) and the `exportPathway` functions, to respectively generate Krona charts and KEGG pathway plots from `SQM` or `SQMlite` objects.
- SqueezeMeta works on top of LOTS of awesome software which deserves due credit! Running SqueezeMeta will now produce a citation report at `<project>/methods.txt` containing a summary of the software used in that run, and the corresponding bibliographic references.
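The aggregation of several small projects into a single set of results, as described above for `combine-sqm-tables.py`, can be pictured with the following Python sketch. It does not reproduce the script's actual logic or file formats: the in-memory table layout, the function name and the choice to fill missing values with zeros are all assumptions made for illustration.

```python
def combine_tables(tables):
    """Merge per-project count tables into a single table.

    Each table maps feature -> {sample: count}. This sketch assumes that
    sample names are unique across projects, and fills feature/sample
    combinations that were never observed with 0.
    """
    # Collect every sample name, preserving the order of first appearance.
    samples = []
    for table in tables:
        for counts in table.values():
            for sample in counts:
                if sample not in samples:
                    samples.append(sample)

    combined = {}
    for table in tables:
        for feature, counts in table.items():
            row = combined.setdefault(feature, {s: 0 for s in samples})
            row.update(counts)
    return combined


# Two hypothetical projects, each run on a different computing node.
project_a = {"K00001": {"A1": 5}, "K00002": {"A1": 2}}
project_b = {"K00001": {"B1": 7}}
merged = combine_tables([project_a, project_b])
print(merged)
```

Features seen in only one project still get a value for every sample, so the combined table stays rectangular.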
Minor changes / bugfixes
- Decreased memory usage while generating the ORF table.
- Fixed missing nucleotide lengths for RNAs in the ORF table.
- Fixed "{}" appearing instead of a rank code when Unclassified reads were summarized by `sqm2tables.py`.
- Minimum R version for the SQMtools package was downgraded from 3.4.0 to 3.2.0.
- Subset methods in the SQMtools package will now work when trying to subset only one ORF.
- Binning will now work even if one of the two methods (maxbin or metabat) produces no results.
- SQM_reads now supports the use of user-supplied databases for annotation, similarly to `SqueezeMeta.pl`.
- Non-standard taxonomy ranks (e.g. suborder) are now ignored and treated as "no_rank" in order to avoid problems with checkm.
- All python2 scripts were ported to python3.
- All scripts in SqueezeMeta will now use the interpreters present in the user's PATH rather than being hardcoded to use `/usr/bin/`.
- Fixed an issue producing incomplete taxonomic assignments in SQM_reads and the doublepass mode.
- Megahit was updated to v1.2.9, which fixed some uncommon crashes.
- Fixed an empty "Name" field appearing in the `12.*.cog.funcover` table.
- We now redistribute comparem so the `sqm2itol.pl` script should work out of the box.
- Fixed an issue during database creation in which plasmid sequences ended up having an inconsistent taxonomy string that broke the parsers in `sqm2tables.py` and `sqmreads2tables.py`.
- Fixed an issue during database creation in which symbiont species were missing parts of their species name.
- Contig nomenclature was standardized to `assemblyMethod_contigNumber`.
- Added extra checks to ensure that reverse reads (pair2) have corresponding forward reads (pair1) in the samples file, since missing forward reads would cause the pipeline to stall at step 10.
- Fixed base counting from SAM files with unusual CIGAR strings.
- Changed "RAW base count" to "Raw base count" in the headers of the ORF table for consistency.
- `sqm2tables.py` and `sqmreads2tables.py` now generate extra information.
- The `SQM` and `SQMlite` R objects now store extra information on function names.
- The SqueezeMeta to anvi'o pipeline will now work with anvio 6.
- The `utils` directory was subdivided into subdirectories to better organize the different functionalities of the SqueezeMeta suite. The ReadMe and the manual have been updated to reflect the new paths.
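The pair1/pair2 consistency check mentioned in the list above can be sketched in Python as follows. This is not the check that SqueezeMeta itself performs: the function name is hypothetical, and the sketch assumes a tab-separated samples file whose first three columns are sample name, file name and pair label (`pair1` or `pair2`), optionally followed by extra flags such as `nobinning`.

```python
def find_unpaired_samples(samples_text):
    """Return the samples that list reverse reads (pair2) but no forward reads (pair1).

    Assumes tab-separated lines of the form: sample, file name, pair label,
    optionally followed by extra columns, which are ignored here.
    """
    pairs_seen = {}
    for line in samples_text.splitlines():
        line = line.strip()  # also drops stray carriage returns from Windows files
        if not line:
            continue
        fields = line.split("\t")
        sample, pair = fields[0], fields[2]
        pairs_seen.setdefault(sample, set()).add(pair)
    return sorted(
        sample
        for sample, seen in pairs_seen.items()
        if "pair2" in seen and "pair1" not in seen
    )


# Hypothetical samples file: S2 has reverse reads only.
example = (
    "S1\ts1_R1.fastq.gz\tpair1\n"
    "S1\ts1_R2.fastq.gz\tpair2\n"
    "S2\ts2_R2.fastq.gz\tpair2\tnobinning\n"
)
unpaired = find_unpaired_samples(example)
print(unpaired)
```

Running such a check before the pipeline starts lets a malformed samples file fail early instead of stalling at a later step.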
v1.0.0 - Worth a thousand words
New features
- This update focuses on the downstream analysis of SqueezeMeta results. It includes different ways of exploring your data and generating different plots. It also comes with a revamped PDF manual explaining all the SqueezeMeta algorithms in detail.
- Integration with R: We provide the SQMtools R package, which makes it easy to load a whole SqueezeMeta project and expose the results in R. The package includes functions to select particular taxa or functions and generate plots. The package also makes the different tables generated by SqueezeMeta easily available for third-party R packages such as vegan (for multivariate analysis), DESeq2 (for differential abundance testing) or for custom analysis pipelines. A description of the package can be found in the SqueezeMeta manual. The full documentation (including usage examples) can be found in the `SQMtools_v0.3.pdf` file.
- Don't like R? We can't blame you. The `sqm2tables.py` script will generate tabular outputs that can be loaded in your favourite analysis environment.
- Integration with the anvi'o analysis pipeline: We provide a compatibility layer for loading SqueezeMeta results into the anvi'o analysis and visualization platform (http://merenlab.org/software/anvio/). This includes a built-in query language for selecting the contigs to be visualized in the anvi'o interactive interface. Check the SqueezeMeta manual for more details.
- We also include the `sqm2itol.pl` and `sqm2pavian.pl` scripts for generating itol- and pavian-compatible outputs.
- We have added the `SQM_hmm_reads.pl` script for performing sensitive searches for particular functions in unassembled reads. This comes in addition to `SQM_reads.pl`, which performs taxonomic and functional profiling of metagenomes without resorting to assembly.
- We have added the `remove_duplicate_markers.pl` and `find_missing_markers.pl` scripts for refining individual bins.
Minor changes/bugfixes
- Updated installation instructions.
- Changed some column names in the `13.*.orftable`, `19.*.bintable` and `20.*.contigtable` outputs so that the three tables follow a consistent naming style.
- SqueezeMeta should no longer crash when fed a samples file generated in Windows.
- Fixed abundance calculation in the bin table.
- Fixed 16S rRNA taxonomy not appearing in the bin table.
- Fixed reads and bases being swapped in the `11.*.mcount` table.
- Fixed the last ORF/RNA predicted by prodigal/barrnap having an empty nucleotide length in the orf table.
v1.0.0-beta2
- This patch fixes a bug that appeared in v1.0.0-beta in which SqueezeMeta would die at step 13 when not using the doublepass mode.
This is a pre-release!
- This update comes with significant additions and changes to the SqueezeMeta pipeline. We have already tested them extensively, but we still expect to include some extra features and bug fixes in the "real" v1.0.0.
- If something is not working for you, please don't hesitate to open an issue or write us directly!
New features
- We now include a PDF manual with details on SqueezeMeta and its different algorithms. Make sure to check it!
- Different parameters for fine-tuning SqueezeMeta can be found (together with brief documentation) in the `.../SqueezeMeta/scripts/parameters.pl` file. The different parameters are set to sensible defaults, but users can modify the file.
- We have added the option of skipping the assembly step and working with a user-supplied assembly. This should help users who prefer to use a custom assembly pipeline (e.g. assembling MinION reads with canu, and then using Illumina reads and pilon to correct the resulting contigs).
- Users can now provide their own reference databases for functional annotation (e.g. of membrane transporters, mobile elements, antibiotic resistance genes, etc.). The results will be seamlessly included into the different SqueezeMeta output files. Please refer to the PDF manual for details.
- We have added an optional step for extra-sensitive ORF detection which combines the prodigal predictions with a BlastX search on parts of the contigs where no ORFs were predicted, or where predicted ORFs did not match anything in the taxonomic and functional databases. This can be selected by providing the --D (doublepass) flag when calling SqueezeMeta.
- We added the `.../SqueezeMeta/utils` directory, which includes useful scripts related to the SqueezeMeta pipeline. These include:
  - `sqm2itol.pl`: generate the files required for creating a radial plot of bin abundances across samples using itol.
  - `make-tables.py`: generate tabular outputs, suitable for analysis in environments such as R, summarizing the taxonomic and functional profiles obtained in a SqueezeMeta run.
  - `make-SqueezeMdb-files.py`: generate the files required for loading a SqueezeMeta project into the built-in MySQL database (https://github.com/jtamames/SqueezeMdb).
  - `SQM_reads.pl`: run SqueezeMeta's taxonomic and functional classification algorithms on individual metagenomic reads. Short reads are harder to annotate, but provide a view of the metagenome that is free from assembly and ORF prediction biases.
Minor changes / Bug fixes
- We now use TPM instead of RPKM for reporting normalized counts of genes and functions.
- The format of the taxonomy strings produced by SqueezeMeta has changed from `superkingdom:foo;phylum:bar;class:baz;...` to `k_foo;p_bar;c_baz;...`.
- Some outputs have been moved from the results directory to two new directories called intermediate and ext_tables. Please refer to the PDF manual for details.
- Increased the number of KEGG functions with detailed text descriptions in the `12.*.kegg.funcover` and `13.*.orftable` files.
- SqueezeMeta should now provide more detailed error messages when dying.
- Minimus2 now uses multiple threads to run nucmer.
- Updated maxbin to v2.2.6. This fixes an error that appeared when assigning a large number of threads.
- We now redistribute libpcre with SqueezeMeta, which should simplify installation in Centos7 and fix some issues with DAS_tool / pullseq.
- Fixed the cause of a warning message (package 'methods' in options("defaultPackages") was not found) that appeared when running DAS_tool. Also removed some warnings about missing usearch/blast, as those programs are not needed for running DAS_tool within the SqueezeMeta pipeline.
- SqueezeMeta should now print a warning message instead of dying in the unlikely case that no bins are reported by DAS_tool.
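For users with downstream scripts that still expect the old taxonomy strings, the format change listed above can be illustrated with a small Python converter. This is only a sketch: the release notes show the prefixes for superkingdom, phylum and class, while the prefixes used here for the remaining ranks, like the function name itself, are assumptions.

```python
# Rank -> one-letter prefix. The first three come from the release notes;
# the remaining four are assumed for this example.
RANK_PREFIX = {
    "superkingdom": "k",
    "phylum": "p",
    "class": "c",
    "order": "o",    # assumed
    "family": "f",   # assumed
    "genus": "g",    # assumed
    "species": "s",  # assumed
}


def convert_taxonomy(old_string):
    """Convert 'rank:name;rank:name' strings to the 'x_name;x_name' style."""
    new_fields = []
    for field in old_string.split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        rank, _, name = field.partition(":")
        new_fields.append(f"{RANK_PREFIX[rank]}_{name}")
    return ";".join(new_fields)


converted = convert_taxonomy("superkingdom:foo;phylum:bar;class:baz")
print(converted)
```

A rank that is missing from the mapping raises a `KeyError`, which makes unexpected input visible instead of silently producing a malformed string.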
v1.0.0-beta
This is a pre-release!
- This update comes with significant additions and changes to the SqueezeMeta pipeline. We have already tested them extensively, but we still expect to include some extra features and bug fixes in the "real" v1.0.0.
New features
- We now include a PDF manual with details on SqueezeMeta and its different algorithms. Make sure to check it!
- Different parameters for fine-tuning SqueezeMeta can be found (together with brief documentation) in the `.../SqueezeMeta/scripts/parameters.pl` file. The different parameters are set to sensible defaults, but users can modify the file.
- We have added the option of skipping the assembly step and working with a user-supplied assembly. This should help users who prefer to use a custom assembly pipeline (e.g. assembling MinION reads with canu, and then using Illumina reads and pilon to correct the resulting contigs).
- Users can now provide their own reference databases for functional annotation (e.g. of membrane transporters, mobile elements, antibiotic resistance genes, etc.). The results will be seamlessly included into the different SqueezeMeta output files. Please refer to the PDF manual for details.
- We have added an optional step for extra-sensitive ORF detection which combines the prodigal predictions with a BlastX search on parts of the contigs where no ORFs were predicted, or where predicted ORFs did not match anything in the taxonomic and functional databases. This can be selected by providing the --D (doublepass) flag when calling SqueezeMeta.
- We added the `.../SqueezeMeta/utils` directory, which includes useful scripts related to the SqueezeMeta pipeline. These include:
  - `sqm2itol.pl`: generate the files required for creating a radial plot of bin abundances across samples using itol.
  - `make-tables.py`: generate tabular outputs, suitable for analysis in environments such as R, summarizing the taxonomic and functional profiles obtained in a SqueezeMeta run.
  - `make-SqueezeMdb-files.py`: generate the files required for loading a SqueezeMeta project into the built-in MySQL database (https://github.com/jtamames/SqueezeMdb).
  - `SQM_reads.pl`: run SqueezeMeta's taxonomic and functional classification algorithms on individual metagenomic reads. Short reads are harder to annotate, but provide a view of the metagenome that is free from assembly and ORF prediction biases.
Minor changes / Bug fixes
- We now use TPM instead of RPKM for reporting normalized counts of genes and functions.
- The format of the taxonomy strings produced by SqueezeMeta has changed from `superkingdom:foo;phylum:bar;class:baz;...` to `k_foo;p_bar;c_baz;...`.
- Some outputs have been moved from the results directory to two new directories called intermediate and ext_tables. Please refer to the PDF manual for details.
- Increased the number of KEGG functions with detailed text descriptions in the `12.*.kegg.funcover` and `13.*.orftable` files.
- SqueezeMeta should now provide more detailed error messages when dying.
- Minimus2 now uses multiple threads to run nucmer.
- Updated maxbin to v2.2.6. This fixes an error that appeared when assigning a large number of threads.
- We now redistribute libpcre with SqueezeMeta, which should simplify installation in Centos7 and fix some issues with DAS_tool / pullseq.
- Fixed the cause of a warning message (package 'methods' in options("defaultPackages") was not found) that appeared when running DAS_tool. Also removed some warnings about missing usearch/blast, as those programs are not needed for running DAS_tool within the SqueezeMeta pipeline.
- SqueezeMeta should now print a warning message instead of dying in the unlikely case that no bins are reported by DAS_tool.
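The difference between the two normalizations named above (TPM replacing RPKM) can be seen in the following Python sketch, which uses the textbook definitions of both measures. It is not taken from the SqueezeMeta code base, and the gene names, counts and lengths are made up for the example.

```python
def rpkm(counts, lengths, total_reads):
    """Reads per kilobase of gene per million reads in the sample."""
    return {g: counts[g] * 1e9 / (lengths[g] * total_reads) for g in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    total_rate = sum(rates.values())
    return {g: rates[g] / total_rate * 1e6 for g in rates}


# Hypothetical read counts and gene lengths (in bp) for two genes.
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 1000}

rpkm_values = rpkm(counts, lengths, total_reads=1000)
tpm_values = tpm(counts, lengths)
print(rpkm_values)
print(tpm_values)
```

TPM values always add up to one million within a sample, which makes proportions directly comparable across samples, whereas the sum of RPKM values varies from sample to sample.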
v0.4.4
Announcements
- The web interface for accessing SqueezeMeta results has been updated, and the remaining bugs have been ironed out. You can find it here.
- The next version will likely be v1.0.0, and will include updated and more detailed documentation, extra features and simpler outputs. Stay tuned, and meanwhile write to us if you have any questions or would like something to be included!
Minor changes / bug fixes
- Minpath can fail to run for small bins, if it finds zero potential pathways. This now prints a warning instead of propagating and killing SqueezeMeta.
- The LCA step now generates an additional set of files, `06.<PROJECT_NAME>.fun3.tax.noidfilter*`, including the taxonomic assignment that each ORF would have received if the identity filters from Luo et al. (2014) had not been applied. We still apply them by default throughout our pipeline, but we have become aware that they might be too stringent when working with uncultured eukaryotes. We thus provide the unfiltered taxonomy so that users can search for their favourite bugs there.
- The DIAMOND search against the COG database was being run twice. This has been fixed.
- Previously SqueezeMeta would die if DAS returned zero bins. This is an unlikely, but not impossible, scenario. Behaviour has been changed to print a warning and skip further bin-related steps.
- Fixed minor bugs in output construction.
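To make the effect of the identity filters on the LCA step more concrete, here is a simplified Python sketch of a lowest-common-ancestor assignment with and without rank-specific identity cutoffs. It is not SqueezeMeta's algorithm: the way the filter is applied (truncating the consensus at the first rank whose cutoff exceeds the best hit identity) is an assumption of this sketch, and the cutoff values are placeholders, not the ones from Luo et al. (2014).

```python
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]

# Hypothetical cutoffs (percent identity), for illustration only.
# These are NOT the values from Luo et al. (2014).
PLACEHOLDER_CUTOFFS = {
    "superkingdom": 40, "phylum": 60, "class": 65, "order": 70,
    "family": 75, "genus": 80, "species": 85,
}


def lowest_common_ancestor(paths):
    """Return the longest prefix shared by all taxonomy paths."""
    common = []
    for names in zip(*paths):
        if len(set(names)) != 1:
            break
        common.append(names[0])
    return common


def assign_taxonomy(hits, cutoffs=None):
    """Assign a taxonomy to an ORF from its hits.

    `hits` is a non-empty list of (taxonomy_path, percent_identity) tuples.
    With cutoffs=None the plain LCA is returned (the "noidfilter" case);
    otherwise the result is cut at the first rank whose cutoff is not met.
    """
    consensus = lowest_common_ancestor([path for path, _ in hits])
    if cutoffs is None:
        return consensus
    best_identity = max(identity for _, identity in hits)
    kept = []
    for rank, name in zip(RANKS, consensus):
        if best_identity < cutoffs[rank]:
            break
        kept.append(name)
    return kept


hits = [
    (["Bacteria", "Proteobacteria", "Gammaproteobacteria"], 55.0),
    (["Bacteria", "Proteobacteria", "Alphaproteobacteria"], 50.0),
]
filtered = assign_taxonomy(hits, PLACEHOLDER_CUTOFFS)
unfiltered = assign_taxonomy(hits)
print(filtered, unfiltered)
```

In this toy case the unfiltered assignment reaches the phylum level, while the filtered one stops at superkingdom because the best identity falls below the placeholder phylum cutoff.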
v0.4.3
Minor changes
- We now provide a pre-compiled version of the database, which can be downloaded via the script `.../SqueezeMeta/scripts/preparing_databases/download_databases.pl <datapath>`. This is quicker and safer against changes in NCBI.
- Removed dependency on bedtools.
v0.4.2
Announcement
- The latest NCBI nr release didn't play nice with our parser. In particular, one of the new entries led to the appearance of unclosed quotes in an intermediate file, which in turn led to SQLite not parsing the file from that point on. As a result, taxonomic annotation was far scarcer than it should have been. Those of you who have been experiencing trouble and downloaded the database in the last month might want to rebuild it using the scripts provided in this version, which fix that particular problem. We'll keep improving our parser so that issues like that become as infrequent as possible. Meanwhile, please smash that issue button if you find that something is not working as intended!
New features
- SqueezeMeta now works in CentOS7, in addition to Ubuntu14+. All necessary dependencies and installation instructions are listed in the INSTALL-CENTOS7 file.
Minor changes / Bug fixes
- Removed dependency on GCC5.
- Fixed an issue in which a recently-added entry in the NCBI nr database resulted in our LCA database being only partially created.
- Added extra checks in the make_databases.pl script to ensure that the LCA SQLite database has the same number of rows as its plain text source file.
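The row-count check described in the last item can be illustrated with the following Python sketch, which compares the number of rows in an SQLite table with the number of non-empty lines in its plain-text source. The actual check lives in the Perl script `make_databases.pl`; the function name, table name and columns used here are hypothetical.

```python
import sqlite3


def row_counts_match(connection, table, source_lines):
    """Return True if `table` has as many rows as there are non-empty source lines."""
    expected = sum(1 for line in source_lines if line.strip())
    (actual,) = connection.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return actual == expected


# Hypothetical plain-text source with three entries.
source = ["acc1\tBacteria", "acc2\tArchaea", "acc3\tEukaryota"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxid (accession TEXT, taxonomy TEXT)")

# Simulate a partially created database: only the first two lines are loaded.
conn.executemany(
    "INSERT INTO taxid VALUES (?, ?)",
    [line.split("\t") for line in source[:2]],
)
print(row_counts_match(conn, "taxid", source))
```

A mismatch like the one simulated here is the symptom of a truncated import, such as the one caused by the unclosed quotes described in the announcement.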
v0.4.1
Christmas announcement
- We've started working on an installation guide for CentOS 7. Stay tuned!
Minor changes / Bug fixes
- Contigs with no proteins are now included in the 19.*.contigtable file.
- Fixed a bug occurring in newer versions of Perl (such as the one shipped with Ubuntu 18).
- Fixed a bug in which CheckM was called incorrectly with bins assigned to Candidatus genera.
- Fixed a minor bug when calculating the best average functional assignment for orfs.
v0.4.0
New features
- Changed name from SqueezeM to SqueezeMeta
- Full support for long reads (MinION, PacBio) through canu and minimap2
- Inclusion of DAS_tool for integrating the binning results of MaxBin and MetaBAT
- Control diamond memory usage via the -b parameter
Minor changes / Bug fixes
- Default minimum contig length is now 200 instead of 1200
- SqueezeMeta.pl and restart.pl should now be much more consistent at stopping whenever an intermediate step fails, and display more informative error messages
- Contig/bin "chimerism" is now called "disparity" instead
- Fixed a bug in which the latest release of nr wasn't being properly parsed during database creation
- Fixed a bug in which some ORFs had an incomplete taxonomic annotation
- Fixed a bug in which many ORFs were being ignored when running MinPath
- Fixed some infrequent bugs when using the merged mode on uncompressed files or single-end reads
- Unmapped reads now count towards the total in RPKM normalization