Installing MetaRefSGB
We provide our framework as a conda package. It can be easily installed from the bioconda channel by typing the following command in your shell:
conda install -c bioconda MetaRefSGB
However, you may need to add the bioconda channel first:
conda config --add channels bioconda
You can also use MetaRefSGB by cloning this repository and running the MetaRefSGB script. However, we strongly recommend conda as the first option because it automatically resolves all of the framework's software dependencies.
If you do not use conda, remember to check that the following external software dependencies are installed and available on your system:
- python
- pip
- checkm
- mash
- bzip2
- bzgrep
- wget
Please also make sure that the following Python dependencies are correctly installed:
- pyyaml (version >= 5.4)
- fastcluster (version >= 1.1.25)
- numpy (version >= 1.16.3)
- pandas (version >= 1.0.1)
- scipy (version >= 1.3.0)
- jsonschema (version >= 3.0.2)
- requests (version >= 2.22.0)
- tqdm (version >= 4.38.0)
- biopython (version >= 1.79)
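The minimum versions above can be checked by hand with a few lines of shell. The following is only a sketch: version_ge and check_module are hypothetical helpers that are not part of MetaRefSGB, and the sketch assumes that python3 with pip is on your PATH and that your sort command supports the -V (version sort) option.

```shell
# Sketch only: version_ge and check_module are hypothetical helpers,
# not part of MetaRefSGB.

# Succeeds (exit status 0) when version $1 is greater than or equal to $2.
version_ge() {
    # sort -V orders version strings field by field; the requirement is met
    # when the minimum version sorts first (or both versions are equal).
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether an installed Python package satisfies a minimum version.
# The installed version is read from the "Version:" line of `pip show`.
check_module() {
    local name="$1" minimum="$2" installed
    installed="$(python3 -m pip show "$name" 2>/dev/null | awk '/^Version:/ {print $2}')"
    if [ -z "$installed" ]; then
        echo "$name: not installed (need >= $minimum)"
    elif version_ge "$installed" "$minimum"; then
        echo "$name: $installed (ok)"
    else
        echo "$name: $installed (need >= $minimum)"
    fi
}

# Package names and minimum versions copied from the list above.
while read -r name minimum; do
    check_module "$name" "$minimum"
done <<'EOF'
pyyaml 5.4
fastcluster 1.1.25
numpy 1.16.3
pandas 1.0.1
scipy 1.3.0
jsonschema 3.0.2
requests 2.22.0
tqdm 4.38.0
biopython 1.79
EOF
```

The script only reports what it finds and installs nothing.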
While Python should already be available on your system if you are using a recent Linux or macOS distribution, we strongly suggest that you download the most recent precompiled binary of MASH from the releases section of its official GitHub repository. As for CheckM, we suggest installing it through pip or conda, but in either case a couple of extra steps are required to link the software to its database. These steps must be performed manually, as reported on the official CheckM Wiki.
First, download the latest available database from https://data.ace.uq.edu.au/public/CheckM_databases/, decompress it in a dedicated location, and then tell CheckM where its database is located by typing:
checkm data setRoot <checkm_data_dir>
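The decompress-and-register steps can be wrapped in a small shell function. This is a minimal sketch under stated assumptions: setup_checkm_db is a hypothetical helper, the database is assumed to have been downloaded already as a gzip-compressed tar archive, and both paths are placeholders that you choose yourself.

```shell
# Sketch only: setup_checkm_db is a hypothetical helper, not part of
# MetaRefSGB or CheckM. It assumes the database archive is a .tar.gz file
# that has already been downloaded.
setup_checkm_db() {
    local archive="$1" data_dir="$2"
    # Create the dedicated location for the database.
    mkdir -p "$data_dir" || return 1
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$archive" -C "$data_dir" || return 1
    # Same command as in the text, with the directory filled in.
    checkm data setRoot "$data_dir"
}

# Example with placeholder paths (adjust both to your system):
# setup_checkm_db "$HOME/Downloads/checkm_db.tar.gz" "$HOME/checkm_data"
```

Keeping the database in its own directory makes it easy to point CheckM at a newer release later by running the same function with a different target.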
Then, download the latest version of MetaRefSGB from the releases section or clone the whole repository by typing:
git clone https://github.com/SegataLab/MetaRefSGB
You should now move to the MetaRefSGB directory and make the main script executable with:
chmod +x MetaRefSGB
Additionally, we strongly suggest adding the MetaRefSGB root folder to the PATH environment variable. This allows you to call MetaRefSGB even when you are not in its folder:
PATH=$PATH:~/MetaRefSGB
Please note that you should replace ~/MetaRefSGB with the path where MetaRefSGB is located on your system.
You may also want to make this persistent, so that you can call MetaRefSGB from other shell instances, by adding the line that modifies the PATH environment variable to your ~/.profile or ~/.bash_profile (if Bash is your default shell).
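One way to make the PATH change persistent without adding the same line twice is sketched below. add_to_profile is a hypothetical helper, not part of MetaRefSGB, and the paths in the example are placeholders.

```shell
# Sketch only: add_to_profile is a hypothetical helper, not part of MetaRefSGB.
# Appends a PATH entry to a profile file unless that exact line already exists.
add_to_profile() {
    local dir="$1" profile="$2"
    # \$PATH is written literally so that it expands when the profile is read.
    local line="export PATH=\"\$PATH:$dir\""
    # grep -F matches the text literally; -x requires a whole-line match.
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example with placeholder paths (adjust both to your system):
# add_to_profile "$HOME/MetaRefSGB" "$HOME/.bash_profile"
```

Running the function again with the same arguments leaves the profile unchanged, so it is safe to call from a setup script.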
Once installed, MetaRefSGB will be available in your environment. You can check whether it has been correctly installed by typing the following command in your terminal:
MetaRefSGB --version
This will print the software version of the pipeline on screen if MetaRefSGB has been correctly installed.
You can check whether all the software dependencies and Python modules are available on your system by running the following command in your terminal:
MetaRefSGB --resolve-dependencies
This will also ask whether you want to automatically install the required Python modules, but it does not take care of the external software dependencies (i.e. python, pip, checkm, mash, bzip2, bzgrep, and wget).
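Since the external tools are not installed automatically, a quick manual check can be useful. The sketch below uses a hypothetical helper, missing_tools (not part of MetaRefSGB), and only tests whether each command can be found on your PATH; it does not check versions.

```shell
# Sketch only: missing_tools is a hypothetical helper, not part of MetaRefSGB.
# Prints the name of every given command that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names taken from the list of external dependencies above.
missing="$(missing_tools python pip checkm mash bzip2 bzgrep wget)"
if [ -n "$missing" ]; then
    echo "Missing external dependencies:"
    echo "$missing"
else
    echo "All external dependencies were found on PATH."
fi
```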
Unfortunately, the checkm-genome conda package is still not fully compatible with macOS because of its software dependency pplacer.
For this reason, macOS users are strongly encouraged to build a Docker container by running the following command from the MetaRefSGB root directory, where the Dockerfile is located:
docker build . -t metarefsgb
Note that Docker image names must be lowercase. Once the image is built, you can open an interactive shell in the Ubuntu-based container, which is already configured to run MetaRefSGB:
docker run -it metarefsgb
If you are able to install all the MetaRefSGB dependencies without the help of Docker, please remember that the MetaRefSGB script uses the date command to keep track of the time the whole pipeline takes to process the input genomes. Unfortunately, on macOS the date command uses a different syntax from its GNU equivalent to produce a timestamp with nanosecond precision. For this reason, to make the script fully compatible with macOS, users should follow the additional steps below.
Use Homebrew to install coreutils from your terminal:
brew install coreutils
The GNU equivalent of date will be named gdate. At this point, we strongly encourage users not to replace the built-in version of date with the GNU equivalent, in order to avoid issues with other software components that rely on the original version of date.
Instead, we suggest adding an alias with the following command:
alias date='gdate'
To make the alias effective every time you open a shell, add this last command to your .bashrc file (if Bash is your default shell). Remember to close the current terminal and open a new one for the change to take effect, or reload your .bashrc by typing the following command:
source ~/.bashrc
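To confirm that a nanosecond-capable date command is available after these steps, something like the following sketch can be used. pick_date_cmd is a hypothetical helper and does not reflect how MetaRefSGB itself measures time; it simply tests whether the +%s%N format produces a purely numeric timestamp.

```shell
# Sketch only: pick_date_cmd is a hypothetical helper, not part of MetaRefSGB.
# Prints the first of `date` and `gdate` whose +%s%N output is purely numeric,
# which is what GNU date produces for a nanosecond timestamp.
pick_date_cmd() {
    local cmd
    for cmd in date gdate; do
        if command -v "$cmd" >/dev/null 2>&1; then
            case "$("$cmd" +%s%N 2>/dev/null)" in
                ''|*[!0-9]*) ;;                        # empty or non-numeric: skip
                *) printf '%s\n' "$cmd"; return 0 ;;   # numeric: usable
            esac
        fi
    done
    return 1
}

if DATE_CMD="$(pick_date_cmd)"; then
    start="$("$DATE_CMD" +%s%N)"
    sleep 1                                            # stand-in for real work
    end="$("$DATE_CMD" +%s%N)"
    echo "Using $DATE_CMD, elapsed: $(( (end - start) / 1000000 )) ms"
else
    echo "No date command with nanosecond support found; install coreutils." >&2
fi
```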