Skip to content

Latest commit

 

History

History
154 lines (109 loc) · 9.5 KB

README.md

File metadata and controls

154 lines (109 loc) · 9.5 KB

logo

What is Protologger?

Protologger is an all-in-one genome description tool, aimed at simplifying the process of gathering the data required for writing protologues. This includes providing; taxonomic, functional and ecological insights, described in detail below;

Taxonomic placement

  • 16S rRNA identity values
  • 16S rRNA gene phylogenetic tree
  • Taxonomic assignment based on GTDB (GTDB-Tk r89)
  • Genomic tree (phylophlan3)
  • ANI values (FastANI)
  • POCP values

Functional analysis

  • KEGG based pathway reconstruction
  • CAZyme profiling
  • Antibiotic resistance profiling (CARD)

Ecological analysis

  • Prevalence and average relative abundance across 1,000 samples for 19 unique environments (IMNGS)
  • Occurence based on MASH comparison to a database of >50,000 MAGs from thousands of samples

Reference

Hitch, T.C.A., Riedel, T., Oren, A. et al. Automated analysis of genomic sequences facilitates high-throughput and comprehensive description of bacteria. ISME Communications. 1, 16 (2021). https://doi.org/10.1038/s43705-021-00017-z

What are protologues and why are they needed?

According to the latest version of the International Code on the Nomenclature of Prokaryotes (ICNP), publication of a novel taxa must include a description of that taxas' features. The format of this information is termed a 'protologue', of which some examples are provided on the example page. Whilst the ICNP are vague on the specifics of what should be included, generally protologues include; the functional features, isolation source and taxonomic placement of the species in relation to existing validly named taxa. Therefore, Protologger provides all the necessary information for writing protologues, reducing the burden on cultivation experts for the validation of names for novel taxa.

Installation

Web-server

If you have single isolates that you wish to study, why not use our Galaxy web-server which, freely available at; http://protologger.de/

Included on our website is an implementation of GAN (The Great Automatic Nomenclator) (https://github.com/telatin/gan) which accepts Protologger output to generate ecological and functionally informed names.

Conda environment

Protologger has been re-written in python3 for easy conda installation on linux systems.

There are four steps that are required to get Protologger working;

  1. Create a python3 environment using the command; conda create -n protologger python=3.7 prokka
  2. Install Protologger into this environment; conda install -c thitch protologger
  3. Once installed, the databases must be downloaded using the following command; setup-protologger.sh
  4. Make sure you have Usearch installed (version 5.2.32 is the tested version) and is in your $PATH

When finished (~5 hours depending on your internet speed), Protologger will be ready to run. Additionally, the command protologger-update.sh can be run to download the latest validation list which is updated monthly.

If PROKKA has issues please try the following commands;

export PERL5LIB=$CONDA_PREFIX/lib/perl5/site_perl/5.22.0/
conda remove blast 
conda install -c bioconda blast=2.9.0
conda install -c conda-forge -c bioconda -c defaults prokka

Usage

Protologger.py requires three inputs to be run on the commandline, detailed below.

Input flag File type Description
-r Nucleotide FASTA file Provide a file containing the 16S rRNA gene sequence for your species of interest
-g Nucleotide FASTA file Provide the genome file of your species of interest
-p String Provide the name of your project which will be used to name your input and output folders
-q NA This option activates 'quick' mode which ignores both GTDB-Tk and PhyloPhlAn analysis, meaning Protologger can be run on a desktop PC

Datasets

Within the publication we provide the Protologger output for four distinct datasets; the HBC, the BIO-ML collection, the Hungate1000 and the iMGMC.

The Protologger output from all four datasets are downloadable here.

ChiBAC - Chicken Bacterial Collection

Protologger has been applied to characterise isolates from the chicken gut (CHiBAC), for which the entire Protologger outputs are available here

MAG datasets

We always aim to expand the MAG database used within Protologger. If there is an additional dataset you wish included, please contact us at; admin@protologger.de

The list of currently included datasets (numbers represent Bacterial and Archaeal numbers) is as follows;

Publication Number of MAGs Description
Parks et al (2020) 3,397 Generic
Woodcraft et al (2018) 568 Permafrost
Anantharaman et al (2016) 303 Groundwater
Crits-Cristoph et al (2018) 225 Soil
Dombrowski et al (2018) 36 Hydro-thermal sediment
Tully et al (2018) 339 Ocean
Lesker et al (2020) 831 Mouse gut
Wylensek et al (2020) 589 Pig gut
Stewart et al (2018) 488 Bovine rumen
Almeida et al (2019) 39,891 Human gut
Pasolli et al (2019) 225 Human stool/vagina/skin/oral cavity
Manara et al (2019) 1,008 soil

The list of datasets currently undergoing integration are;

Publication Number of MAGs Description
Wilkinson et al (2020) 3,397 African Boran rumen
Chen et al (2021) 1,358 Pig gut
Albanese et al (2021) 263 Cryptoendolithic community in Antarctica
Levin et al (2021) 1,209 Gut of 180 wild species
Jegousse et al (2021) 219 Icelandic marine water
Robbins et al (2021) ~1200 Coral sponge
Wibowo et al (2021) 498 Ancient human gut
Collins et al (2021) 111 Deep sea fish gut
Nayfach et al (2021) 52,515 Earth associated microbiota
Becraft et al (2021) 126 Deep subsurface
Krüger et al (2019) 3101 Algae blooms
Lavrinienko et al (2020) 254 Bank vole gut

Citation

Hitch, T.C.A., Riedel, T., Oren, A. et al. Automated analysis of genomic sequences facilitates high-throughput and comprehensive description of bacteria. ISME COMMUN. 1, 16 (2021). https://doi.org/10.1038/s43705-021-00017-z

We ask that anyone who uses Protologger cites not only our publication but the list of publications below which provide tools and databases which are integral for Protologgers working;

Tools

Databases