Installing MetaRefSGB

Fabio Cumbo edited this page Dec 3, 2021 · 5 revisions

We provide our framework as a conda package.

It can be easily installed from the bioconda channel by typing the following command in your shell:

conda install -c bioconda MetaRefSGB

However, you may need to add the bioconda channel first:

conda config --add channels bioconda

You can also use MetaRefSGB by cloning this repository and running the MetaRefSGB script. However, we strongly recommend using conda as the first option because it automatically resolves all of the framework's software dependencies.

The following section is for a non-conda installation

If you do not use conda, then remember to check that the following external software dependencies are installed and available on your system:

  • python (version >= 3.7)
  • pip
  • checkm (version >= 1.1.3)
  • mash (version >= 2.0)
  • bzip2
  • bzgrep
  • wget
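The check described above can be scripted. The following is a minimal sketch, not part of MetaRefSGB: the function names `check_tools` and `version_ge` are hypothetical helpers introduced here, and `version_ge` assumes a `sort` that supports the `-V` option (as GNU coreutils does).

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found:   $ct_tool"
        else
            echo "missing: $ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Succeed when version $1 is greater than or equal to version $2.
# Assumption: sort supports -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `check_tools python pip checkm mash bzip2 bzgrep wget` lists each tool as found or missing, and `version_ge 2.3 2.0` succeeds because 2.3 satisfies the minimum of 2.0. How each tool reports its own version differs, so extracting the installed version number is left to the reader.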

Please also make sure that the following Python dependencies are correctly installed:

  • pyyaml (version >= 5.4)
  • fastcluster (version >= 1.1.25)
  • numpy (version >= 1.16.3)
  • pandas (version >= 1.0.1)
  • scipy (version >= 1.3.0)
  • jsonschema (version >= 3.0.2)
  • requests (version >= 2.22.0)
  • tqdm (version >= 4.38.0)
  • biopython (version >= 1.79)
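One way to install these modules in a single step is through a pip requirements file. The sketch below simply writes the minimum versions listed above to a file; the file name `metarefsgb-requirements.txt` is an arbitrary choice made for this example and is not shipped with MetaRefSGB.

```shell
# Write the minimum versions listed above to a pip requirements file.
# The file name is an assumption of this sketch; override it via REQ_FILE.
REQ_FILE="${REQ_FILE:-metarefsgb-requirements.txt}"

cat > "$REQ_FILE" <<'EOF'
pyyaml>=5.4
fastcluster>=1.1.25
numpy>=1.16.3
pandas>=1.0.1
scipy>=1.3.0
jsonschema>=3.0.2
requests>=2.22.0
tqdm>=4.38.0
biopython>=1.79
EOF

echo "Requirements written to $REQ_FILE"
```

The modules can then be installed with `pip install -r metarefsgb-requirements.txt`.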

While Python should already be available on your system if you are using a recent Linux distribution or macOS, we strongly suggest downloading the latest precompiled binary of MASH from the releases section of its official GitHub repository. As for CheckM, we suggest installing it through pip or conda; in either case, a couple of extra steps are required to link the software to its database. These steps must be performed manually, as described on the official CheckM Wiki.

First, download the latest available database from https://data.ace.uq.edu.au/public/CheckM_databases/, decompress it into a dedicated directory, and then tell CheckM where its database is located by typing:

checkm data setRoot <checkm_data_dir>
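The decompress-and-register steps can be wrapped in a small helper. This is a sketch under stated assumptions: `setup_checkm_db` is a hypothetical function name, the database is assumed to have been downloaded already as a gzip-compressed tar archive, and the archive file name is deliberately left to the reader.

```shell
# Extract a CheckM database archive into a dedicated directory and,
# if checkm is on PATH, register that directory as the data root.
# $1: path to the downloaded archive (assumed to be a .tar.gz file)
# $2: destination directory for the decompressed database
setup_checkm_db() {
    db_archive=$1
    db_dir=$2
    mkdir -p "$db_dir" || return 1
    tar -xzf "$db_archive" -C "$db_dir" || return 1
    if command -v checkm >/dev/null 2>&1; then
        checkm data setRoot "$db_dir"
    else
        echo "checkm not found on PATH; run 'checkm data setRoot $db_dir' after installing it"
    fi
}
```

A call would look like `setup_checkm_db <downloaded_archive> <checkm_data_dir>`, where both arguments are placeholders for paths on your system.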

Next, download the latest version of MetaRefSGB from the releases section, or clone the whole repository by typing:

git clone https://github.com/SegataLab/MetaRefSGB

You should now move to the MetaRefSGB directory and make the main script executable with:

chmod +x MetaRefSGB

Additionally, we strongly suggest adding the MetaRefSGB root folder to the PATH environment variable. This will allow you to call MetaRefSGB even when you are not in its folder:

PATH=$PATH:~/MetaRefSGB

Please note that you should replace ~/MetaRefSGB with the actual path where MetaRefSGB is located on your system. You may also want to make this change persistent, so that you can call MetaRefSGB from other shell instances, by adding the line that modifies the PATH environment variable to your ~/.profile or ~/.bash_profile (if Bash is your default shell).
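Making the PATH change persistent can be done by hand, or with a helper that avoids adding the same line twice. The following is a minimal sketch; `persist_path_entry` is a hypothetical function name, and both the directory and the profile file are passed in by the caller.

```shell
# Append a PATH entry to a shell profile only if it is not already there.
# $1: directory to add (for example the MetaRefSGB root folder)
# $2: profile file to edit (for example ~/.profile or ~/.bash_profile)
persist_path_entry() {
    pp_dir=$1
    pp_profile=$2
    # The literal line to store; $PATH stays unexpanded inside the profile.
    pp_line="export PATH=\"\$PATH:$pp_dir\""
    touch "$pp_profile" || return 1
    if grep -qxF "$pp_line" "$pp_profile"; then
        echo "already present in $pp_profile"
    else
        printf '%s\n' "$pp_line" >> "$pp_profile"
        echo "added to $pp_profile"
    fi
}
```

For instance, `persist_path_entry "$HOME/MetaRefSGB" "$HOME/.profile"` adds the entry once; running it again leaves the file unchanged. The new setting takes effect in shells opened afterwards.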

Once installed, MetaRefSGB will be available in your environment. You can check whether it has been correctly installed by typing the following command in your terminal:

MetaRefSGB --version

This will print the software version of the pipeline on screen if MetaRefSGB has been correctly installed.

You can check whether all the software dependencies and Python modules are available on your system by running the following command in your terminal:

MetaRefSGB --resolve-dependencies

This command also asks whether you want to automatically install the required Python modules, but it does not take care of the external software dependencies (i.e. python, pip, checkm, mash, bzip2, bzgrep, and wget).

Warning (for macOS users)

Unfortunately, the checkm-genome conda package is still not fully compatible with macOS because of its software dependency pplacer. For this reason, macOS users are strongly encouraged to build a Docker container by running the following command from the MetaRefSGB root directory in which the Dockerfile is located:

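docker build . -t metarefsgb

Once the image is built, you can open an interactive shell in the Ubuntu-based container, which is already configured to run MetaRefSGB. Note that the image name is written in lowercase because Docker does not accept uppercase letters in repository names:

docker run -it metarefsgb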

If you are able to install all the MetaRefSGB dependencies without the help of Docker, please remember that the MetaRefSGB script uses the date command to keep track of the time the whole pipeline takes to process the input genomes. Unfortunately, on macOS the date command uses a different syntax than its GNU equivalent for producing timestamps with nanosecond precision. For this reason, to make the script fully compatible with macOS, users should follow these additional steps.

Use Homebrew to install coreutils from your terminal:

brew install coreutils

The GNU equivalent of date will be installed as gdate. We strongly encourage users not to replace the built-in version of date with the GNU equivalent, in order to avoid issues with other software components that rely on the original version of date.

Instead, we suggest adding an alias with the following command:

alias date='gdate'

To make the alias effective every time you open a shell, add this last command to your .bashrc file (if Bash is your default shell). Remember to close and reopen your terminal to make this change effective, or reload your .bashrc file by typing the following command:

source ~/.bashrc
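To confirm which date command on your system can produce nanosecond timestamps, a quick probe can help. This is a sketch, not part of MetaRefSGB: `supports_nanoseconds` and `pick_date_command` are hypothetical helpers, and the probe assumes that a date command without nanosecond support either fails or leaves the %N format unexpanded.

```shell
# Succeed when the given date command expands %N to digits only.
# $1: name of the date command to probe, e.g. date or gdate
supports_nanoseconds() {
    sn_out=$("$1" +%N 2>/dev/null) || return 1
    case $sn_out in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the first candidate (date, then gdate) that passes the probe.
pick_date_command() {
    for pd_cmd in date gdate; do
        if command -v "$pd_cmd" >/dev/null 2>&1 && supports_nanoseconds "$pd_cmd"; then
            echo "$pd_cmd"
            return 0
        fi
    done
    return 1
}
```

Running `pick_date_command` prints the name of a suitable command, or returns a non-zero status if neither date nor gdate passes the probe, in which case coreutils still needs to be installed.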