Releases · jtamames/SqueezeMeta

22 Apr 16:13

fpusan

v1.1.1

ee2b915

v1.1.1

New features

LCA steps are now multithreaded in SqueezeMeta and SQMreads.
The exportPathway function in SQMtools can now color KEGG reactions based on their abundance in the different samples, or on their log2 fold-change between two groups of samples.
The plotTaxonomy function in SQMtools can now optionally use an arbitrary maximum for the y-scale, and collapse partially classified reads into a single "Unclassified" category.
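As a rough illustration of the log2 fold-change mentioned above, the following Python sketch compares the mean abundance of one function between two groups of samples. It is not the SQMtools implementation: the function name, the sample names and the pseudocount are all assumptions made for this example.

```python
import math


def log2_fold_change(abund, group_a, group_b, pseudocount=1.0):
    """Return log2(mean(group_b) / mean(group_a)) for a single feature.

    abund maps sample name -> abundance of the feature.
    pseudocount is a hypothetical safeguard against zeros; it is not
    a documented SQMtools parameter.
    """
    mean_a = sum(abund[s] for s in group_a) / len(group_a)
    mean_b = sum(abund[s] for s in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical abundances of one KEGG function in four samples.
ko_abund = {"S1": 3.0, "S2": 5.0, "S3": 15.0, "S4": 23.0}
lfc = log2_fold_change(ko_abund, ["S1", "S2"], ["S3", "S4"])
print(lfc)  # positive values mean the function is more abundant in the second group
```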

Minor changes / bugfixes

Fixed a bug in which the SQMtools package would not install in R < 3.5.
Fixed a bug in which SQMtools taxonomy tables did not work directly in DESeq2.
Fixed a bug in the PFAM table generation by sqm2tables.py.
Fixed a minor bug in which some non-standard NCBI taxa would cause checkm to fail.


10 Mar 16:05

fpusan

v1.1.0

51a7cef

v1.1.0 - Easier done than said

New features

SqueezeMeta can now be installed using conda. This does not require any kind of root access. Just get in conda, install it with conda create -n SqueezeMeta -c fpusan -c bioconda squeezemeta and then activate the package with conda activate SqueezeMeta. Don't like conda? No problem, we'll keep updating and testing the old installation instructions too.
Added the utils/install_utils/test_install.pl script to quickly test whether all the required dependencies are present in the user's environment and that the databases are correctly downloaded and configured. The script will inform the user if any problem is found. This covers interpreters and libraries for perl, python, java, ruby and R, but not other system libraries, so passing the tests does not 100% guarantee that there won't be any problem when running the pipeline. Still, we hope it will help to identify common issues that may arise during installation.
We now provide the seqmerge assembly mode, which first performs an individual assembly of all the samples, and then combines them pairwise in a sequential fashion. This should be less memory demanding than the coassembly and merged modes.
Added the --euk flag, which will drop identity filters for the taxonomic annotation of eukaryotic contigs. This will greatly increase the number of eukaryotic sequences reported by SqueezeMeta. By default, SqueezeMeta applies Luo et al. (2014) identity cutoffs in order to assign an ORF to a given taxonomic rank. These cutoffs were devised using bacterial genomes, and may be too stringent for eukaryotic sequences.
We now provide the utils/combine-sqm-tables.py script to combine the outputs of different SqueezeMeta or SQM_reads projects. This can prove useful in high performance computation clusters, in which each computing node will be running "small" SqueezeMeta or SQM_reads projects consisting only of a few samples, which then need to be aggregated into a single set of results.
We now redistribute and apply aragorn for detecting tRNA coding regions in the contigs.
The nobinning flag can be added to the samples file to tag samples that will be used for mapping but not for binning.
The scripts from the SqueezeMeta suite no longer need to be run in the same directory where the project was created, and should now consistently be able to read and write in arbitrary locations.
The functionality of the SQMtools R package was expanded:
- Annotations from user-supplied databases are now considered.
- Added the loadSQMlite function, which loads only the aggregated functional/taxonomic profiles (but not the orf/contig/bins tables and sequences). This has a much smaller memory footprint, and can be used with results coming from SQM_reads or combine-sqm-tables.py.
- Added the exportKrona (kudos to Giussepe d'Auria for the idea and the code!) and the exportPathway functions, to respectively generate Krona Charts and KEGG pathway plots from SQM or SQMlite objects.
SqueezeMeta works on top of LOTS of awesome software which deserves due credit! Running SqueezeMeta will now produce a citation report at <project>/methods.txt containing a summary of the software used in that run, and the corresponding bibliographic references.
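To give an idea of what combining the outputs of several projects involves, here is a minimal Python sketch that merges per-project abundance tables into a single table. It does not reproduce utils/combine-sqm-tables.py: the in-memory table layout, the function name and the assumption that sample names are unique across projects are all hypothetical.

```python
def combine_tables(tables):
    """Merge per-project {feature: {sample: count}} tables into one table.

    Features that are missing from a project get a count of 0 for that
    project's samples. Sample names are assumed to be unique across projects.
    """
    samples = []
    for table in tables:
        for counts in table.values():
            for sample in counts:
                if sample not in samples:
                    samples.append(sample)
    features = sorted({feature for table in tables for feature in table})
    combined = {f: {s: 0 for s in samples} for f in features}
    for table in tables:
        for feature, counts in table.items():
            combined[feature].update(counts)
    return combined


# Two hypothetical projects, each run on a different computing node.
project_a = {"K00001": {"S1": 10, "S2": 4}, "K00002": {"S1": 1, "S2": 0}}
project_b = {"K00002": {"S3": 7}, "K00003": {"S3": 2}}
combined = combine_tables([project_a, project_b])
for feature, counts in combined.items():
    print(feature, counts)
```

Filling absent features with zeros keeps every row the same width, which is what most downstream tools expect from an aggregated table.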

Minor changes / bugfixes

Decreased memory usage while generating the ORF table.
Fixed missing nucleotide lengths for RNAs in the ORF table.
Fixed "{}" appearing instead of a rank code when Unclassified reads were summarized by sqm2tables.py.
Minimum R version for the SQMtools package was downgraded from 3.4.0 to 3.2.0.
Subset methods in the SQMtools package will now work when trying to subset only one ORF.
Binning will now work even if one of the two methods (maxbin or metabat) produces no results.
SQM_reads now supports the use of user-supplied databases for annotation, similarly to SqueezeMeta.pl.
Non-standard taxonomy ranks (e.g. suborder) are now ignored and treated as "no_rank" in order to avoid problems with checkm.
All python2 scripts were ported to python3.
All scripts in SqueezeMeta will now use the interpreters present in the user's PATH rather than being hardcoded to use /usr/bin/.
Fixed an issue producing incomplete taxonomic assignments in SQM_reads and the doublepass mode.
Megahit was updated to v1.2.9, which fixed some uncommon crashes.
Fixed an empty "Name" field appearing in the 12.*.cog.funcover table.
We now redistribute comparem so the sqm2itol.pl script should work out of the box.
Fixed an issue during database creation in which plasmid sequences ended up having a non-consistent taxonomy string that broke the parsers in sqm2tables.py and sqmreads2tables.py.
Fixed an issue during database creation in which symbiont species were missing parts of their species name.
Contig nomenclature was standardized to assemblyMethod_contigNumber.
Added extra checks to ensure that reverse reads (pair2) have corresponding forward reads (pair1) in the samples file, as a missing forward read file would cause the pipeline to stall at step 10.
Fixed base counting from SAM files with unusual CIGAR strings.
Changed "RAW base count" to "Raw base count" in the headers of the ORF table for consistency.
sqm2tables.py and sqmreads2tables.py now generate extra information.
The SQM and SQMlite R objects now store extra information on function names.
The SqueezeMeta to anvi'o pipeline will now work with anvio 6.
The utils directory was subdivided into subdirectories to better organize the different functionalities of the SqueezeMeta suite. The ReadMe and the manual have been updated to reflect the new paths.
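The check on paired reads in the samples file can be pictured with the Python sketch below. It is only an illustration, not the code used by the pipeline, and it assumes a tab-separated samples file with the sample name, the file name and the pair1/pair2 tag in the first three columns, optionally followed by flags such as nobinning. The function name and file names are hypothetical.

```python
def find_unpaired_reverse(samples_text):
    """Return the samples that list a pair2 file but no pair1 file.

    Assumed layout (tab-separated): sample, filename, pair1|pair2, [flags].
    """
    pairs_by_sample = {}
    for line in samples_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        sample, pair = fields[0], fields[2].strip()
        pairs_by_sample.setdefault(sample, set()).add(pair)
    return sorted(
        sample
        for sample, pairs in pairs_by_sample.items()
        if "pair2" in pairs and "pair1" not in pairs
    )


# Hypothetical samples file: S2 has reverse reads but no forward reads.
samples_text = (
    "S1\tS1_R1.fastq.gz\tpair1\n"
    "S1\tS1_R2.fastq.gz\tpair2\n"
    "S2\tS2_R2.fastq.gz\tpair2\n"
    "S3\tS3_R1.fastq.gz\tpair1\tnobinning\n"
)
unpaired = find_unpaired_reverse(samples_text)
print(unpaired)
```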


10 Jul 15:59

fpusan

v1.0.0

b9c03f1

v1.0.0 - Worth a thousand words

New features

This update focuses on the downstream analysis of SqueezeMeta results. It includes different ways of exploring your data and generating different plots. It also comes with a revamped PDF manual explaining all the SqueezeMeta algorithms in detail.
Integration with R: We provide the SQMtools R package, which makes it easy to load a whole SqueezeMeta project and expose the results in R. The package includes functions to select particular taxa or functions and generate plots. The package also makes the different tables generated by SqueezeMeta easily available for third-party R packages such as vegan (for multivariate analysis), DESeq2 (for differential abundance testing) or for custom analysis pipelines. A description of the package can be found in the SqueezeMeta manual. The full documentation (including usage examples) can be found in the SQMtools_v0.3.pdf file.
Don't like R? We can't blame you. The sqm2tables.py script will generate tabular outputs that can be loaded in your favourite analysis environment.
Integration with the anvi'o analysis pipeline: We provide a compatibility layer for loading SqueezeMeta results into the anvi'o analysis and visualization platform (http://merenlab.org/software/anvio/). This includes a built-in query language for selecting the contigs to be visualized in the anvi'o interactive interface. Check the SqueezeMeta manual for more details.
We also include the sqm2itol.pl and sqm2pavian.pl scripts for generating itol- and pavian-compatible outputs.
We have added the SQM_hmm_reads.pl script for performing sensitive searches for particular functions in unassembled reads. This comes in addition to SQM_reads.pl, which performs taxonomic and functional profiling of metagenomes without resorting to assembly.
We have added the remove_duplicate_markers.pl and find_missing_markers.pl scripts for refining individual bins.

Minor changes/bugfixes

Updated installation instructions.
Changed some column names in the 13.*.orftable, 19.*.bintable and 20.*.contigtable outputs so that the three tables follow a consistent naming style.
SqueezeMeta should no longer crash when fed a samples file generated in Windows.
Fixed abundance calculation in the bin table.
Fixed 16S rRNA taxonomy not appearing in the bin table.
Fixed reads and bases being swapped in the 11.*.mcount table.
Fixed the last ORF/RNA predicted by prodigal/barrnap having an empty nucleotide length in the orf table.


24 May 17:06

fpusan

V1.0.0-beta2

6a63a34

v1.0.0-beta2 Pre-release


This patch fixes a bug that appeared in v1.0.0-beta in which SqueezeMeta would die at step 13 when not using the doublepass mode.

This is a pre-release!

This update comes with significant additions and changes to the SqueezeMeta pipeline. We have already tested them extensively, but we still expect to include some extra features and bug fixes in the "real" v1.0.0.
If something is not working for you, please don't hesitate to open an issue or write us directly!

New features

We now include a PDF manual with details on SqueezeMeta and its different algorithms. Make sure to check it!
Different parameters for fine-tuning SqueezeMeta can be found (together with a brief documentation) in the .../SqueezeMeta/scripts/parameters.pl file. The different parameters are set to sensible defaults, but users can modify the file.
We have added the option of skipping the assembly step and working with a user-supplied assembly. This should help users that prefer to use a custom assembly pipeline (e.g. assembling MinION reads with canu, and then using Illumina reads and pilon to correct the resulting contigs).
Users can now provide their own reference databases for functional annotation (e.g. of membrane transporters, mobile elements, antibiotic resistance genes, etc.). The results will be seamlessly included into the different SqueezeMeta output files. Please refer to the PDF manual for details.
We have added an optional step for extra-sensitive ORF detection which combines the prodigal predictions with a BlastX search on parts of the contigs where no ORFs were predicted, or where predicted ORFs did not match anything in the taxonomic and functional databases. This can be selected by providing the --D (doublepass) flag when calling SqueezeMeta.
We added the .../SqueezeMeta/utils directory, which includes useful scripts related to the SqueezeMeta pipeline. These include:
- sqm2itol.pl: generate the files required for creating a radial plot of bin abundances across samples using itol.
- make-tables.py: generate tabular outputs, suitable for analysis in environments such as R, summarizing the taxonomic and functional profiles obtained in a SqueezeMeta run.
- make-SqueezeMdb-files.py: generate the files required for loading a SqueezeMeta project into the built-in MySQL database (https://github.com/jtamames/SqueezeMdb).
- SQM_reads.pl: run SqueezeMeta's taxonomic and functional classification algorithms on individual metagenomic reads. Short reads are harder to annotate, but provide a view of the metagenome that is free from assembly and ORF prediction biases.

Minor changes / Bug fixes

We now use TPM instead of RPKM for reporting normalized counts of genes and functions.
The format of the taxonomy strings produced by SqueezeMeta has changed from superkingdom:foo;phylum:bar;class:baz;... to k_foo;p_bar;c_baz;....
Some outputs have been moved from the results directory to two new directories called intermediate and ext_tables. Please refer to the PDF manual for details.
Increased the number of KEGG functions with detailed text descriptions in the 12.*.kegg.funcover and 13.*.orftable files.
SqueezeMeta should now provide more detailed error messages when dying.
Minimus2 now uses multiple threads to run nucmer.
Updated maxbin to v2.2.6. This fixes an error that appeared when assigning a large number of threads.
We now redistribute libpcre with SqueezeMeta, which should simplify installation in CentOS7 and fix some issues with DAS_tool / pullseq.
Fixed the cause for a warning message (package ‘methods’ in options("defaultPackages") was not found) that appeared when running DAS_tool. Also removed some warnings about missing usearch/blast, as those programs are not needed for running DAS_tool within the SqueezeMeta pipeline.
SqueezeMeta should now print a warning message instead of dying in the unlikely case that no bins are reported by DAS_tool.
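For users with scripts that still parse the old taxonomy strings, the format change listed above can be sketched in Python as follows. Only the k_, p_ and c_ prefixes are taken from the example in these notes; the prefixes for the remaining ranks, the function name and the choice to skip unknown ranks are assumptions.

```python
# Prefixes for order, family, genus and species are assumed by analogy
# with the k_/p_/c_ example; check them against real output before relying on them.
RANK_PREFIX = {
    "superkingdom": "k",
    "phylum": "p",
    "class": "c",
    "order": "o",
    "family": "f",
    "genus": "g",
    "species": "s",
}


def convert_taxonomy(old_string):
    """Convert 'superkingdom:foo;phylum:bar' into 'k_foo;p_bar'."""
    parts = []
    for field in old_string.split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        rank, _, name = field.partition(":")
        prefix = RANK_PREFIX.get(rank)
        if prefix is None:
            continue  # skipping unknown ranks is a choice made for this sketch
        parts.append(f"{prefix}_{name}")
    return ";".join(parts)


new_string = convert_taxonomy("superkingdom:foo;phylum:bar;class:baz")
print(new_string)
```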


15 Apr 17:43

fpusan

v1.0.0-beta

4ef49cb

v1.0.0-beta Pre-release


This is a pre-release!

This update comes with significant additions and changes to the SqueezeMeta pipeline. We have already tested them extensively, but we still expect to include some extra features and bug fixes in the "real" v1.0.0.

New features

We now include a PDF manual with details on SqueezeMeta and its different algorithms. Make sure to check it!
Different parameters for fine-tuning SqueezeMeta can be found (together with a brief documentation) in the .../SqueezeMeta/scripts/parameters.pl file. The different parameters are set to sensible defaults, but users can modify the file.
We have added the option of skipping the assembly step and working with a user-supplied assembly. This should help users that prefer to use a custom assembly pipeline (e.g. assembling MinION reads with canu, and then using Illumina reads and pilon to correct the resulting contigs).
Users can now provide their own reference databases for functional annotation (e.g. of membrane transporters, mobile elements, antibiotic resistance genes, etc.). The results will be seamlessly included into the different SqueezeMeta output files. Please refer to the PDF manual for details.
We have added an optional step for extra-sensitive ORF detection which combines the prodigal predictions with a BlastX search on parts of the contigs where no ORFs were predicted, or where predicted ORFs did not match anything in the taxonomic and functional databases. This can be selected by providing the --D (doublepass) flag when calling SqueezeMeta.
We added the .../SqueezeMeta/utils directory, which includes useful scripts related to the SqueezeMeta pipeline. These include:
- sqm2itol.pl: generate the files required for creating a radial plot of bin abundances across samples using itol.
- make-tables.py: generate tabular outputs, suitable for analysis in environments such as R, summarizing the taxonomic and functional profiles obtained in a SqueezeMeta run.
- make-SqueezeMdb-files.py: generate the files required for loading a SqueezeMeta project into the built-in MySQL database (https://github.com/jtamames/SqueezeMdb).
- SQM_reads.pl: run SqueezeMeta's taxonomic and functional classification algorithms on individual metagenomic reads. Short reads are harder to annotate, but provide a view of the metagenome that is free from assembly and ORF prediction biases.

Minor changes / Bug fixes

We now use TPM instead of RPKM for reporting normalized counts of genes and functions.
The format of the taxonomy strings produced by SqueezeMeta has changed from superkingdom:foo;phylum:bar;class:baz;... to k_foo;p_bar;c_baz;....
Some outputs have been moved from the results directory to two new directories called intermediate and ext_tables. Please refer to the PDF manual for details.
Increased the number of KEGG functions with detailed text descriptions in the 12.*.kegg.funcover and 13.*.orftable files.
SqueezeMeta should now provide more detailed error messages when dying.
Minimus2 now uses multiple threads to run nucmer.
Updated maxbin to v2.2.6. This fixes an error that appeared when assigning a large number of threads.
We now redistribute libpcre with SqueezeMeta, which should simplify installation in CentOS7 and fix some issues with DAS_tool / pullseq.
Fixed the cause for a warning message (package ‘methods’ in options("defaultPackages") was not found) that appeared when running DAS_tool. Also removed some warnings about missing usearch/blast, as those programs are not needed for running DAS_tool within the SqueezeMeta pipeline.
SqueezeMeta should now print a warning message instead of dying in the unlikely case that no bins are reported by DAS_tool.
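The difference between the two normalizations named above (RPKM and TPM) can be seen in a short Python sketch using the standard textbook definitions. This is not the code used by SqueezeMeta, and the ORF names, counts and lengths are made up for the example.

```python
def rpkm(counts, lengths, total_reads):
    """Reads per kilobase of feature per million reads in the sample."""
    return {g: counts[g] * 1e9 / (lengths[g] * total_reads) for g in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    total_rate = sum(rates.values())
    return {g: rate / total_rate * 1e6 for g, rate in rates.items()}


# Hypothetical read counts and ORF lengths (in bp) for one sample.
counts = {"orfA": 100, "orfB": 300}
lengths = {"orfA": 1000, "orfB": 3000}

rpkm_values = rpkm(counts, lengths, total_reads=1_000_000)
tpm_values = tpm(counts, lengths)
print(rpkm_values)
print(tpm_values)
```

Because TPM values always add up to one million within a sample, they are easier to compare across samples than RPKM values, whose total depends on the length distribution of the features in each sample.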


05 Mar 09:48

fpusan

v0.4.4

ceb9e7c

v0.4.4

Announcements

The web interface for accessing SqueezeMeta results has been updated, and the remaining bugs have been ironed out. You can find it here.
The next version will likely be v1.0.0, and will include updated and more detailed documentation, extra features and simpler outputs. Stay tuned, and meanwhile write us if you have any questions or would like something to be included!

Minor changes / bug fixes

Minpath can fail to run for small bins, if it finds zero potential pathways. This now prints a warning instead of propagating and killing SqueezeMeta.
The LCA step now generates an additional set of files 06.<PROJECT_NAME>.fun3.tax.noidfilter* including the taxonomic assignment that the ORF would have gotten if the identity filters from Luo et al. (2014) had not been applied. We still apply them by default throughout our pipeline, but we have become aware that they might be too stringent when working with uncultured eukaryotes. We thus provide the unfiltered taxonomy so that the user can search for her/his favourite bugs there.
The DIAMOND search against the COG database was being run twice. This has been fixed.
Previously SqueezeMeta would die if DAS returned zero bins. This is an unlikely, but not impossible, scenario. Behaviour has been changed to print a warning and skip further bin-related steps.
Fixed minor bugs in output construction.
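To show how identity filters can shorten a taxonomic assignment, here is a simplified Python sketch of a last common ancestor (LCA) calculation with and without rank-specific identity cutoffs. It is not the SqueezeMeta algorithm: the cutoff values are placeholders chosen for illustration (they are not the Luo et al. values), and the function name, the hit layout and the taxon names are hypothetical.

```python
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]

# Placeholder thresholds for illustration only; NOT the published cutoffs.
EXAMPLE_CUTOFFS = {
    "superkingdom": 40, "phylum": 45, "class": 50, "order": 55,
    "family": 60, "genus": 70, "species": 85,
}


def lca(hits, cutoffs=None):
    """Return the lineage shared by all hits.

    hits is a list of (percent_identity, lineage), where lineage lists taxon
    names in the order given by RANKS. With cutoffs, each hit only supports
    the ranks whose identity threshold it reaches; with cutoffs=None no
    identity filter is applied.
    """
    lineages = []
    for identity, lineage in hits:
        kept = []
        for rank, name in zip(RANKS, lineage):
            if cutoffs is not None and identity < cutoffs[rank]:
                break  # this hit is too divergent to support deeper ranks
            kept.append(name)
        lineages.append(kept)
    consensus = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break  # hits disagree from this rank onwards
        consensus.append(names[0])
    return consensus


# Two hypothetical hits that agree down to genus level.
hits = [
    (62.0, ["K", "P", "C", "O", "F", "G", "S1"]),
    (58.0, ["K", "P", "C", "O", "F", "G", "S2"]),
]
filtered = lca(hits, EXAMPLE_CUTOFFS)
unfiltered = lca(hits)
print(filtered)
print(unfiltered)
```

In this toy case the unfiltered assignment reaches genus level, while the filtered one stops at order level because neither hit is similar enough to support deeper ranks under the placeholder thresholds.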


06 Feb 08:53

fpusan

v0.4.3

09f5524

v0.4.3

Minor changes

We now provide a pre-compiled version of the database, which can be downloaded via the script .../SqueezeMeta/scripts/preparing_databases/download_databases.pl <datapath>. This is quicker and safer against changes in NCBI.
Removed dependency on bedtools.


18 Jan 11:32

fpusan

v0.4.2

98c9d53

v0.4.2

Announcement

The latest NCBI nr release didn't play nice with our parser. In particular, one of the new entries led to the appearance of unclosed quotes in an intermediate file, which in turn led to SQLite not parsing the file from that point on. As a result, taxonomic annotation was far more scarce than it should be. Those of you that have been experiencing troubles and downloaded the database in the last month might want to re-build it using the scripts provided in this version, which fix that particular problem. We'll keep improving our parser so that issues like that become as infrequent as possible. Meanwhile, please smash that issue button if you find that something is not working as intended!

New features

SqueezeMeta now works in CentOS7, in addition to Ubuntu14+. All necessary dependencies and installation instructions are listed in the INSTALL-CENTOS7 file.

Minor changes / Bug fixes

Removed dependency on GCC5.
Fixed an issue in which a recently-added entry in the NCBI nr database resulted in our LCA database being only partially created.
Added extra checks in the make_databases.pl script to ensure that the LCA SQLite database has the same number of rows as its plain text source file.
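A row-count check of this kind can be sketched in Python as shown below. This is an illustration only, not the code in make_databases.pl: the function name, the table name and the file paths are hypothetical, and the sketch assumes one record per non-empty line in the plain text file.

```python
import sqlite3


def row_counts_match(db_path, table, source_path):
    """Return True if the SQLite table has one row per non-empty source line."""
    with open(source_path) as handle:
        n_lines = sum(1 for line in handle if line.strip())
    connection = sqlite3.connect(db_path)
    try:
        # The table name is interpolated because SQL placeholders cannot be
        # used for identifiers; only pass trusted table names here.
        (n_rows,) = connection.execute(
            f'SELECT COUNT(*) FROM "{table}"'
        ).fetchone()
    finally:
        connection.close()
    return n_rows == n_lines


if __name__ == "__main__":
    # Hypothetical paths and table name.
    if not row_counts_match("lca.db", "taxid", "lca_source.txt"):
        print("WARNING: the database has fewer or more rows than its source file")
```

A mismatch points to a partially loaded database, which is the situation described in the announcement for this release.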


24 Dec 10:10

fpusan

v0.4.1

39e69f4

v0.4.1

Christmas announcement

We've started working on an installation guide for CentOS 7. Stay tuned!

Minor changes / Bug fixes

Contigs with no proteins are now included in the 19.*.contigtable file.
Fixed a bug occurring in newer versions of Perl (such as the one shipped with Ubuntu 18).
Fixed a bug in which CheckM was called incorrectly with bins assigned to Candidatus genera.
Fixed a minor bug when calculating the best average functional assignment for orfs.


08 Dec 08:55

fpusan

v0.4.0

eca020d

v0.4.0

New features

Changed name from SqueezeM to SqueezeMeta
Full support for long reads (MinION, PacBio) through canu and minimap2
Inclusion of DAS_tool for integrating the binning results of MaxBin and MetaBAT
Control diamond memory usage via the -b parameter

Minor changes / Bug fixes

Default minimum contig length is now 200 instead of 1200
SqueezeMeta.pl and restart.pl should now be much more consistent at stopping whenever an intermediate step fails, and display more informative error messages
Contig/bin "chimerism" is now called "disparity" instead
Fixed a bug in which the latest release of nr wasn't being properly parsed during database creation
Fixed a bug in which some ORFs had an incomplete taxonomic annotation
Fixed a bug in which many ORFs were being ignored when running MinPath
Fixed some infrequent bugs when using the merged mode on uncompressed files or single-end reads
Unmapped reads now count towards the total in rpkm normalization
