Installation
The most straightforward way to install MitoZ is to use the Apptainer/Singularity or Docker (or Udocker) MitoZ images, which avoid most installation problems. The Conda-Pack version can also be useful: unlike a typical Conda installation, the Conda-Pack tar file already contains all the necessary dependencies, so nothing has to be downloaded separately, which is handy if you have limited network connectivity. Note that although Conda is a valuable tool, it may NOT always resolve all dependency problems, which is why I have provided several alternative installation methods here.
Here are some tips to ensure a successful installation:
-
After completing the installation, it is recommended to run the test dataset first. This step helps verify that the installation was successful and that no issues are present. You can find instructions for running the test dataset here.
-
If one installation method fails, don't hesitate to try alternative approaches. Experimenting with different methods increases the chances of finding a suitable installation option.
-
If you or your servers prefer not to use Singularity/Docker or encounter difficulties with the Conda installation method, consider attempting the Conda-Pack installation first. This alternative could be a viable solution in such cases. You can find detailed instructions for the Conda-Pack installation here.
Any platform (e.g. Linux, Mac or Windows) on which Docker is able to run should be able to run MitoZ via the MitoZ Docker image. This also applies to Singularity.
Please refer to https://docs.docker.com/.
$ docker pull guanliangmeng/mitoz:3.6
# or
$ docker pull guanliangmeng/mitoz:3.4
-
With the Docker image, you don't need to install the etetoolkit (NCBI Taxonomy) database yourself; everything has been packaged into the image.
-
In Docker, an image is different from a container. When you run an image, Docker creates a container based on that image, so you are actually running the newly created container. Multiple containers can be created and run from the same image at the same time, each with a unique container ID (use
docker ps
to check running containers). Therefore, we usually add the --rm
option to delete the container after the analysis is done (the original image is still there).
$PWD
is an environment variable of your current terminal session; its value is the absolute path of your current directory. When you change to another directory, its value changes automatically.
In the working directory (i.e. $PWD
) (the fastq files should be in there) of your terminal, execute:
# Go to your directory where your raw data (fastq) files are located
$ cd /your/working/directory/
$ ls
sample1.R1.fq.gz sample1.R2.fq.gz
# You can check the value of your current $PWD
$ echo $PWD
$ docker run -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz -h
$ docker run -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz-tools -h
# For example:
$ docker run -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz all --fq1 $PWD/sample1.R1.fq.gz --fq2 $PWD/sample1.R2.fq.gz ...
The -v $PWD:$PWD
option mounts your current host directory at the same path ($PWD
) inside the Docker container. Only in this way can you access the files under the $PWD
of your host machine from within the container. However, within the container you cannot access any other files (or soft links, or possibly hard links) that live outside the $PWD
directory of your host machine.
Multiple -v
options can be used at the same time, for example, if your fastq files are in /pool/data/
and you are NOT in this directory now but you want to access these files within the Docker container, you can do:
$ docker run -v $PWD:$PWD -v /pool/data/:/pool/data/ -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz -h
# for example:
$ docker run -v $PWD:$PWD -v /pool/data/:/pool/data/ -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz all --fq1 /pool/data/sample1.R1.fq.gz --fq2 /pool/data/sample1.R2.fq.gz ...
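When several host directories have to be mounted, typing every `-v` pair by hand gets error-prone. The sketch below is one possible way to generate them; `mount_args` is a hypothetical helper name, not part of Docker or MitoZ, and the command is only printed (a dry run), not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of MitoZ or Docker): print one "-v DIR:DIR"
# pair for every directory given, so that each host directory is mounted
# at the same path inside the container.
mount_args() {
    args=""
    for dir in "$@"; do
        args="$args -v $dir:$dir"
    done
    # Strip the leading space before printing
    printf '%s\n' "${args# }"
}

# Dry run: print the docker command instead of executing it.
# This simple sketch does not handle paths that contain spaces.
echo docker run $(mount_args "$PWD" /pool/data) -w "$PWD" --rm guanliangmeng/mitoz:3.6 mitoz -h
```

Once the printed command looks right, it can be copied and run as is.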
Known bugs:
For some reason, the default shell used in the MitoZ 3.5 Docker image is NOT bash
, which causes the annotation of tRNA genes to be missing (e.g. https://github.com/linzhi2013/MitoZ/issues/187). Please use MitoZ 3.6, 3.4 or 2.3 instead.
Workaround: use the 1.4 methods below and do something before running MitoZ.
I will rebuild the image asap.
In your host working directory (i.e. $PWD
) (the fastq files should be in there and they are NOT soft-links pointing to other directories!!!),
shell into the container:
$ cd /your/working/directory/
$ echo $PWD
$ docker run -v $PWD:$PWD -w $PWD --rm -it guanliangmeng/mitoz:3.4
# for mitoz 3.6, use this:
$ docker run -v $PWD:$PWD -w $PWD --rm -it guanliangmeng/mitoz:3.6 /bin/bash
To learn more about Docker usage, please go to https://docs.docker.com/.
With the Docker image, MitoZ (version 3.4) is installed at /app/anaconda/bin/mitoz
and /app/anaconda/lib/python3.9/site-packages/mitoz
:
$ docker run -it -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.4
root@cb99de738f74:/Users/gmeng# ls -lhrt /app/anaconda/lib/python3.9/site-packages/mitoz
total 48K
-rw-rw-r-- 2 root root 12 Jun 10 08:54 __init__.py
-rw-rw-r-- 2 root root 3.2K Jun 10 08:54 MitoZ.py
drwxr-xr-x 4 root root 4.0K Jul 1 13:47 annotate
drwxr-xr-x 3 root root 4.0K Jul 1 13:47 utility
drwxr-xr-x 4 root root 4.0K Jul 1 13:47 assemble
drwxr-xr-x 3 root root 4.0K Jul 1 13:47 all
drwxr-xr-x 3 root root 4.0K Jul 1 13:47 visualize
drwxr-xr-x 7 root root 4.0K Jul 1 13:47 tools
drwxr-xr-x 6 root root 4.0K Jul 1 13:47 profiles
drwxr-xr-x 4 root root 4.0K Jul 1 13:47 findmitoscaf
drwxr-xr-x 3 root root 4.0K Jul 1 13:47 filter
drwxr-xr-x 2 root root 4.0K Jul 1 13:47 __pycache__
Or you can find it out by yourself:
$ docker run -it -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6
# Now we enter the docker container:
root@67f2dbb26f08:/# alias ll='ls -lhtr'
root@67f2dbb26f08:/# which mitoz
/usr/local/bin/mitoz
root@67f2dbb26f08:/# which mitoz-tools
/usr/local/bin/mitoz-tools
root@67f2dbb26f08:/# ll /usr/local/lib/python3.9/site-packages/mitoz
total 48K
-rw-rw-r-- 1 root root 12 Jan 6 10:09 __init__.py
-rw-rw-r-- 1 root root 3.6K Jan 6 10:09 MitoZ.py
drwxr-xr-x 3 root root 4.0K Jan 6 10:16 visualize
drwxr-xr-x 3 root root 4.0K Jan 6 10:16 utility
drwxr-xr-x 12 root root 4.0K Jan 6 10:16 tools
drwxr-xr-x 6 root root 4.0K Jan 6 10:16 profiles
drwxr-xr-x 4 root root 4.0K Jan 6 10:16 findmitoscaf
drwxr-xr-x 3 root root 4.0K Jan 6 10:16 filter
drwxr-xr-x 4 root root 4.0K Jan 6 10:16 assemble
drwxr-xr-x 4 root root 4.0K Jan 6 10:16 annotate
drwxr-xr-x 3 root root 4.0K Jan 6 10:16 all
drwxr-xr-x 2 root root 4.0K Jan 6 10:16 __pycache__
And MitoZ's database is at:
root@cb99de738f74:/Users/gmeng# ls -lhrt /app/anaconda/lib/python3.9/site-packages/mitoz/profiles/
total 16K
-rw-rw-r-- 2 root root 0 Jun 10 08:54 __init__.py
drwxr-xr-x 2 root root 4.0K Jul 1 13:47 rRNA_CM
drwxr-xr-x 2 root root 4.0K Jul 1 13:47 MT_database
drwxr-xr-x 2 root root 4.0K Jul 1 13:47 CDS_HMM
drwxr-xr-x 2 root root 4.0K Jul 1 13:47 __pycache__
If you want to copy this database out of the Docker image, do:
$ cd ~
$ mkdir mitoz_custom_db
$ docker run -v $PWD:$PWD -w $PWD --rm -it guanliangmeng/mitoz:3.4
root@cb99de738f74:/Users/gmeng# cp -a /app/anaconda/lib/python3.9/site-packages/mitoz/profiles mitoz_custom_db
$ exit
# This way, the 'profiles' directory is copied to the 'mitoz_custom_db' of your host machine.
# Later, if you want to use the '--profiles_dir' option, you need to use Docker's '-v' option
# to map this host's 'mitoz_custom_db' directory into the Docker container via
$ docker run -v $PWD:$PWD -v ~/mitoz_custom_db:/mitoz_custom_db/ -w $PWD --rm -it guanliangmeng/mitoz:3.4
# Then within the Docker container:
root@cb99de738f74:/Users/gmeng# mitoz --profiles_dir /mitoz_custom_db/profiles <other options>
See also https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ%27s-database.
Unlike Singularity and Docker, Udocker does not require root/sudo privileges to install or run!
https://github.com/indigo-dc/udocker
For example,
$ mkdir /home/gmeng/soft/
$ cd /home/gmeng/soft/
$ wget https://github.com/indigo-dc/udocker/releases/download/v1.3.1/udocker-1.3.1.tar.gz
$ tar zxvf udocker-1.3.1.tar.gz
$ export PATH=`pwd`/udocker:$PATH
$ which udocker
/home/gmeng/soft/udocker/udocker
$ udocker install
You can add the udocker
command to your PATH
environment variable:
$ echo 'export PATH="/home/gmeng/soft/udocker/:$PATH"' >>~/.bashrc
$ source ~/.bashrc
Keep in mind that Udocker installs dependencies (and images) into your ~/.udocker/
directory. If your HOME
directory has limited space, you can move this directory to another place, then use ln -s
command to link it back to your HOME
directory.
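The move-and-link step described above could look roughly like the following. This is a minimal sketch: `relocate_dir` is a hypothetical helper name and `/other/place/udocker` is a placeholder path, neither of which comes from Udocker itself.

```shell
#!/bin/sh
# Hypothetical helper: move a directory to a roomier location and leave a
# soft link behind at the original path.
relocate_dir() {
    src=$1    # e.g. "$HOME/.udocker"
    dest=$2   # e.g. /other/place/udocker (placeholder; must not exist yet)
    # Refuse to act if the source is missing or is already a soft link
    [ -d "$src" ] && [ ! -L "$src" ] || { echo "nothing to move: $src" >&2; return 1; }
    # Refuse to overwrite an existing destination
    [ ! -e "$dest" ] || { echo "destination exists: $dest" >&2; return 1; }
    mv "$src" "$dest" && ln -s "$dest" "$src"
}

# Usage (not run here):
# relocate_dir "$HOME/.udocker" /other/place/udocker
```

The two guards are there so that running the helper twice does not nest the directory inside itself or replace an existing link.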
Go to https://github.com/indigo-dc/udocker for more details.
$ udocker pull guanliangmeng/mitoz:3.4
# or
$ udocker pull guanliangmeng/mitoz:3.6
$ udocker images
REPOSITORY
guanliangmeng/mitoz:3.4
The usage of Udocker is similar to Docker, simply replace the docker
command with udocker
. Please refer to the above Docker part.
Go to https://indigo-dc.github.io/udocker/user_manual.html for more detail about Udocker usage.
See https://www.sylabs.io/docs/ or https://apptainer.org/ for instructions to install Apptainer/Singularity.
Apptainer was formerly known as Singularity and is now a part of the Linux Foundation. See https://github.com/apptainer/apptainer.
Any platform (e.g. Linux, Mac or Windows) on which Singularity is able to run should be able to run MitoZ via the MitoZ Singularity image. This also applies to Docker.
For the installation of Singularity on Mac or Windows, please refer to https://docs.sylabs.io/guides/3.2/user-guide/installation.html#install-on-windows-or-mac.
Note: according to the official documentation (Oct. 2019), Singularity must be installed with root privileges. For a non-root installation, please refer to https://docs.sylabs.io/guides/3.6/admin-guide/user_namespace.html#unprivileged-installations; it has some requirements, though, so you should ask your IT administrator to help you.
Also, Singularity installed via conda (e.g. conda install -c bioconda singularity
) may not work (at least when installed as a normal user)!
How about Singularity on Mac and Windows?
MitoZ is fully functional only on Linux systems, although some of its functions can now run on Mac or Windows.
Why would we want to run MitoZ on Mac or Windows? There are three main reasons:
(1) With the two new de novo assemblers and small datasets, it is now theoretically possible to perform mitogenome assembly on a Mac or Windows machine with 16GB or 32GB RAM;
(2) Only the mitoz all
and mitoz assemble
commands need much memory. All the other commands (mitoz filter/findmitoscaf/annotate/visualize or
mitoz-tools) need very little memory and thus can run on an ordinary Mac or Windows machine (e.g. with 8GB RAM), and sometimes you do not want to upload the data to a Linux server for these analyses;
(3) MitoZ can now be installed on Mac via Conda (some assemblers might not work, though).
You can download a pre-built Singularity (https://sylabs.io/) image from https://www.dropbox.com/sh/mqjqn656x41q2wb/AAD02t_kUCjNHbBgCeYpEM88a?dl=0 (only for version 3.4) or from https://pan.baidu.com/s/1YIULJ9H3BeWKcIZdMZpcuw?pwd=7r9d (extraction code: 7r9d) (MitoZ version 3.2 and newer versions).
More easily, you can pull the image from the Docker Hub directly (so you can get the latest version).
$ singularity pull MitoZ_v3.6.sif docker://guanliangmeng/mitoz:3.6
# or
$ singularity pull MitoZ_v3.4.sif docker://guanliangmeng/mitoz:3.4
-
FYI: when I tried to run
MitoZ_v3.5.sif
on an Ubuntu system within Parallels Desktop on macOS (M1 chip), I got the error: the image's architecture (amd64) could not run on the host's (arm64)
.
-
After downloading MitoZ, you still need to install the etetoolkit (NCBI Taxonomy) database, especially when the automatic installation does not work for you. See 6. The Etetoolkit database section below.
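Regarding the architecture error mentioned in the list above, a quick check of the host architecture before pulling an image can save a download. The sketch below assumes the image is built for amd64, as that error message reports for version 3.5; whether other tags are built for other architectures is not stated here. `container_arch` is a hypothetical helper name.

```shell
#!/bin/sh
# Map the output of `uname -m` to the architecture names used in the error
# message (amd64 / arm64). Only the common values are covered.
container_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)             echo "$1" ;;
    esac
}

# Assumption: the image is amd64, as reported by the error message for 3.5.
host_arch=$(container_arch "$(uname -m)")
if [ "$host_arch" != amd64 ]; then
    echo "Host is $host_arch; an amd64 image may not run here." >&2
fi
```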
Within the Singularity image,
MitoZ is installed at /app/anaconda/bin/mitoz
and /app/anaconda/lib/python3.9/site-packages/mitoz
. MitoZ's annotation database is at /app/anaconda/lib/python3.9/site-packages/mitoz/profiles
.
$ /path/to/MitoZ_v3.4.sif -h
# For example, to use the `all` subcommand:
$ /path/to/MitoZ_v3.4.sif all -h
# but for MitoZ 3.6, use this:
$ /path/to/MitoZ_3.6.sif mitoz -h
$ /path/to/MitoZ_3.6.sif mitoz all -h
$ /path/to/MitoZ_3.6.sif mitoz-tools -h
# or
$ singularity run /path/to/MitoZ_v3.4.sif -h
$ singularity run /path/to/MitoZ_v3.4.sif all -h
# but for MitoZ 3.6, use this:
$ singularity run /path/to/MitoZ_3.6.sif mitoz -h
$ singularity run /path/to/MitoZ_3.6.sif mitoz all -h
$ singularity run /path/to/MitoZ_3.6.sif mitoz-tools -h
However, if you want to use the mitoz-tools
command, or if you pull the Singularity image from the docker hub, you need to do it this way:
$ singularity exec /path/to/MitoZ_v3.4.sif mitoz
$ singularity exec /path/to/MitoZ_v3.4.sif mitoz-tools
# To use the `all` command of MitoZ with the `exec` command, do this:
$ singularity exec /path/to/MitoZ_v3.4.sif mitoz all -h
# but for MitoZ 3.6, keep using the 'run' command:
$ singularity run /path/to/MitoZ_3.6.sif mitoz all -h
$ singularity run /path/to/MitoZ_3.6.sif mitoz-tools -h
Like Docker, Singularity can only access host directories that are mounted into the container. To mount additional directories, use the --bind
option, which plays the role of -v
in Docker.
By default, Singularity automatically mounts the $PWD
and $HOME
directories into the Singularity container.
Multiple --bind
options can be used at the same time, for example, if your fastq files are in /pool/data/
and you are NOT in this directory now but you want to access these files within the container, you can do:
$ singularity exec --bind /pool/data/ /path/to/MitoZ_v3.4.sif mitoz all -h
You can also 'shell' into the container, as shown by Usage 2 below.
Warning: You will run into errors if your fastq files (--fq1 --fq2
) are soft links pointing to other directories that you have not explicitly bound to the container, because neither Docker nor Singularity can access these files. The best way to solve the problem is like this:
$ singularity exec --bind /pool/data/ /path/to/MitoZ_v3.4.sif mitoz all --fq1 /pool/data/sample.R1.fq.gz --fq2 /pool/data/sample.R2.fq.gz
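To find out which directories need binding when the fastq files are soft links, the links can be resolved first. The following is a sketch only: `real_dirs` is a hypothetical helper name, and it assumes a `readlink` that supports `-f` (GNU coreutils does; some other systems may not).

```shell
#!/bin/sh
# Hypothetical helper: print the directory that really holds each input
# file, following soft links. Duplicate directories are printed once.
real_dirs() {
    for f in "$@"; do
        dirname "$(readlink -f "$f")"
    done | sort -u
}

# Example (file names are placeholders): turn each directory into a
# --bind option for singularity.
# real_dirs sample.R1.fq.gz sample.R2.fq.gz | sed 's/^/--bind /'
```

Any directory printed that is not already mounted into the container is a candidate for `--bind` (or `-v` with Docker).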
$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID
# The below command assumes your fastq files are located under the `/my/workdir/projectID` directory,
# so within Singularity's shell, you can access these fastq files directly.
$ singularity shell /path/to/MitoZ_v3.4.sif
# After entering the container, it is just like being on another Linux machine,
# so you can use the `mitoz` command directly:
Singularity> which mitoz
/app/anaconda/bin/mitoz
Singularity> mitoz -h
Singularity> mitoz-tools -h
#
# After you finish the analysis, use the `exit` command to exit the container:
Singularity> exit
# However, if your fastq files are located elsewhere, say `/pool/data/`,
# you need to bind that path into the container with the `--bind` option so that the MitoZ Singularity container can access them:
$ cd /my/workdir/projectID
$ singularity shell --bind /pool/data/ /path/to/MitoZ_v3.4.sif
Do this only if you want to customize your PCG annotation database, see https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ%27s-database for more details.
$ mkdir ~/mitoz_custom_db/
$ singularity shell /path/to/MitoZ_v3.4.sif
Singularity> cp -r /app/anaconda/lib/python3.9/site-packages/mitoz/profiles ~/mitoz_custom_db
Singularity> exit
# Now the `profiles` are under the `~/mitoz_custom_db` of your host machine and you can modify them to create your custom database.
Installing MitoZ via Conda often leads to missing Perl modules. If you cannot use either the Singularity or the Docker images, you can try this Conda-Pack version. This method is also useful if your server cannot access the Internet (but without Internet access you still need to install the Etetoolkit taxonomy database yourself).
Here I packaged the whole conda environment (including all files) into a file named mitoz3.6.tar.gz
using the Conda-Pack tool (https://conda.github.io/conda-pack/).
I created this environment on a Linux machine, thus it should also work on another Linux machine.
First, download the mitoz3.6.tar.gz
file from Dropbox (https://www.dropbox.com/sh/x0xn8of73fub1p7/AAA9RCZe9k-rN2WstUn5cUKia?dl=0), or from Baidu Cloud (open https://pan.baidu.com/s/1uNLIF1SNrkBJp9EoCMoTDQ?pwd=cqz6 and find version 3.6).
Next,
# Choose a directory on your machine for the installation of MitoZ, e.g. '~/soft/mitoz3.6'
$ mkdir -p ~/soft/mitoz3.6
# then unpack mitoz3.6.tar.gz into this target directory
$ tar -xzf /path/to/downloaded/mitoz3.6.tar.gz -C ~/soft/mitoz3.6
# Activate the environment
$ source ~/soft/mitoz3.6/bin/activate
# Cleanup prefixes from the active environment.
# Note that this command can also be run without activating the environment
# as long as some version of Python is already installed on the machine.
(mitoz3.6) $ conda-unpack
# At this point the environment is exactly as if you installed it here
# using conda directly. All scripts should work fine.
(mitoz3.6) $ mitoz -h
(mitoz3.6) $ mitoz-tools -h
# Deactivate the environment to remove it from your path when you finish the MitoZ analysis
(mitoz3.6) $ source ~/soft/mitoz3.6/bin/deactivate
Please refer to https://conda.github.io/conda-pack/ for more details.
Now you can go to install the Etetoolkit database
Firstly, install Miniconda (https://docs.conda.io/en/latest/miniconda.html) (recommended) or Anaconda (https://www.anaconda.com/products/distribution#Downloads) :
$ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh Miniconda3-latest-Linux-x86_64.sh
# setup channels
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda install mamba -n base -c conda-forge # "mamba" is much much faster than the "conda" command!
The Conda version of MitoZ is currently fully functional only on Linux.
$ mamba clean -y -a # in case something unknown interferes with our installation
$ conda clean -y -a # in case something unknown interferes with our installation
# It is a good idea to install MitoZ into an independent environment, i.e. 'mitozEnv' here!
$ mamba create -n mitozEnv -c bioconda -c conda-forge mitoz=3.6 # It's recommended to specify the version you want to install!
# Tips: If the above command failed, try this instead:
$ mamba create -n mitozEnv -c bioconda -c conda-forge python=3.8 mitoz=3.6 # It's recommended to specify the version you want to install!
# Note:
# 1. You can use any other name instead of 'mitozEnv' as the environment name, e.g. 'mitoz3.6',
# so you can do 'mamba create -n mitoz3.6 -c bioconda -c conda-forge mitoz=3.6'.
# Personally, I prefer this way, so you can directly see which version of MitoZ you are using by the environment name.
# But for the convenience of this tutorial, I will keep using the name 'mitozEnv'.
#
# 2. You can also install MitoZ to a specific path,
# like 'mamba create -p /share/pool/guanliang/soft/mitoz3.6 -c bioconda mitoz=3.6',
# and then use 'source activate /share/pool/guanliang/soft/mitoz3.6' to activate the environment.
$ source activate mitozEnv # you can also use "mamba" or "conda" in place of "source" in this command.
$ circos --modules # check if all Perl modules required by circos are installed. Some modules could still be missing (don't know why conda did not fix them automatically). Similar problems can be seen at https://github.com/bioconda/bioconda-recipes/issues/9830
# Now we are ready to go:
$ mitoz # all subcommands are within this command now!
$ mitoz-tools # some useful tools for mitochondrial genome analysis
Now you can go to install the Etetoolkit database
If you want to find the path where MitoZ is installed, execute:
$ conda env list
# conda environments:
#
base * /home/guanliang/soft/miniconda3
mitozEnv /home/guanliang/soft/miniconda3/envs/mitozEnv
The exact path for me is: /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz
. For example, this is the path for MitoZ's database:
$ ll /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz/profiles
total 16K
-rw-rw-r-- 2 guanliang 0 May 12 06:47 __init__.py
drwxrwxr-x 2 guanliang 4.0K May 24 16:06 CDS_HMM
drwxrwxr-x 2 guanliang 4.0K May 24 16:06 rRNA_CM
drwxrwxr-x 2 guanliang 4.0K May 24 16:06 __pycache__
drwxrwxr-x 2 guanliang 4.0K May 24 17:36 MT_database
See also Extending MitoZ's database.
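Instead of assembling the path by hand from `conda env list`, the active environment's Python can be asked where a package lives. The sketch below assumes `python3` on your PATH belongs to the activated MitoZ environment; `pkg_dir` is a hypothetical helper name, not a MitoZ command.

```shell
#!/bin/sh
# Hypothetical helper: print the directory of an importable Python package,
# using whichever python3 is first on PATH (i.e. the active environment's).
pkg_dir() {
    python3 -c 'import importlib.util, os, sys
spec = importlib.util.find_spec(sys.argv[1])
if spec is None or spec.origin is None:
    sys.exit("package not found: " + sys.argv[1])
print(os.path.dirname(spec.origin))' "$1"
}

# With the MitoZ environment active, the database directory would then be:
# echo "$(pkg_dir mitoz)/profiles"
```

The package is only located, never imported, so the lookup itself is cheap.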
Make sure that the conda
/mamba
commands you use belong to your own installation. When I used another user's conda
command, I ran into a lot of trouble. In that case, follow the instructions at the very beginning of this page and install your own Miniconda/Anaconda.
$ conda install mamba -n base -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: /
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:
- defaults/linux-64::python-language-server==0.34.1=py38_0
- defaults/noarch::python-jsonrpc-server==0.3.4=py_1
\ failed with initial frozen solve. Retrying with flexible solve.
Possible solutions:
$ mamba clean -y -a # in case something unknown interferes with our installation
$ conda clean -y -a # in case something unknown interferes with our installation
and try again.
Or install mamba
into a separate environment:
$ conda create -n mambaEnv -c conda-forge mamba
# and then use the `mamba` command within this env:
$ conda activate mambaEnv
Finally, you can try to install a new Miniconda (https://docs.conda.io/en/latest/miniconda.html) (recommended) or Anaconda (https://www.anaconda.com/products/distribution#Downloads) at a totally different place.
You can use Google to find the solution that works for you.
Or you can simply keep using the conda
command (just replace mamba
with conda
) to install MitoZ, although this may take more time.
After running the mamba create -n mitozEnv -c bioconda mitoz
command, check whether any Perl modules required by Circos are missing. Sometimes they are, and I do not know the exact reason.
$ source activate mitozEnv
$ circos --modules # check if all Perl modules required by circos are installed. Some modules could still be missing (don't know why conda did not fix them automatically). Similar problems can be seen at https://github.com/bioconda/bioconda-recipes/issues/9830
# For me, the Perl modules "GD" and "GD::Polyline" were missing (although conda said they have been installed already when I ran 'conda install perl-gd'), I fixed them by running the following three commands:
$ mamba install -c conda-forge pkg-config
$ mamba install -c anaconda gcc_linux-64
$ cpanm GD
# I will try to fix the circos problem in bioconda's MitoZ recipe file, but for the moment, please use the above solution, or try the "mitozEnv.yaml" solution below.
# You can use the "cpanm" command to install other missing Perl modules if necessary.
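To see at a glance which Perl modules are still missing, the module report can be filtered. This sketch assumes that `circos --modules` prints one module per line and marks absent ones with a leading `missing` followed by the module name; check the real output on your system before relying on it. `missing_modules` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a circos module report on stdin and print the
# names of modules flagged as missing.
# Assumed line format: "ok <version> <Module>" or "missing <Module>".
missing_modules() {
    awk '$1 == "missing" { print $2 }'
}

# Usage (not run here, requires circos in the active environment):
# circos --modules | missing_modules
```

Each name printed is a candidate for installation with `cpanm`.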
-
A user proposed this solution: https://github.com/linzhi2013/MitoZ/issues/152 for the missing GD module problem. You can test this solution and see if it works, and then leave some comments at https://github.com/linzhi2013/MitoZ/issues/152, so the other users and I can know if this is a universal solution. Thanks a lot for helping to improve the software!
-
Another user reported that after she installed a new Conda, the problem got solved without changing the Perl version (i.e. using 5.26).
Finally, if the above methods don't work for you, don't waste more time on them; try the Singularity or Docker images instead.
Firstly, install Miniconda (https://docs.conda.io/en/latest/miniconda.html) (recommended) or Anaconda (https://www.anaconda.com/products/distribution#Downloads) :
$ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh Miniconda3-latest-Linux-x86_64.sh
# setup channels
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda install mamba -n base -c conda-forge # "mamba" is much much faster than the "conda" command!
The first way:
$ mamba clean -y -a # in case something unknown interferes with our installation
$ conda clean -y -a # in case something unknown interferes with our installation
$ mamba env create -n mitozEnv -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitozEnv.yml
$ conda activate mitozEnv
# Note:
# 1. You can use any other name instead of 'mitozEnv' as the environment name, e.g. 'mitoz3.6',
# so you can do 'mamba env create -n mitoz3.6 -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitozEnv.yml'
#
# 2. You can also install MitoZ to a specific path,
# like 'mamba env create -p /share/pool/guanliang/soft/mitoz3.6 -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitozEnv.yml',
# and then use 'source activate /share/pool/guanliang/soft/mitoz3.6' to activate the environment.
Then go to section 6.3 below.
The second way:
$ mamba env create -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitoz3.6.environment.yml
# which will create an environment named 'mitoz3.6' in your system, and MitoZ has also been installed in it!
# Tips:
# If the above command failed, try
$ conda config --set channel_priority flexible
# and then
$ mamba env create -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitoz3.6.environment.yml
# To activate the environment,
$ conda activate mitoz3.6
# If 'conda activate mitoz3.6' does not work for you, you can
$ conda env list
# to list the path of the environment, for example, mine is '/home/gmeng/.conda/envs/mybase/envs/mitoz3.6'
# and then I will do
$ source activate /home/gmeng/.conda/envs/mybase/envs/mitoz3.6
# to activate the environment.
If you are using the second way, you can skip section 6.3 below.
# Next, please download the newest version of MitoZ source code from https://github.com/linzhi2013/MitoZ/releases/
$ pip install ./mitoz-3.6.tar.gz
# or
$ tar -zxvf mitoz-3.6.tar.gz
$ cd mitoz-3.6
$ python3 setup.py install
# Finally, check
$ circos --modules # check if all Perl modules required by circos are installed. Some modules could still be missing (don't know why conda did not fix them automatically). Similar problems can be seen at https://github.com/bioconda/bioconda-recipes/issues/9830
Now you can go to install the Etetoolkit database
If you want to find the path where MitoZ is installed, execute:
$ conda env list
# conda environments:
#
base * /home/guanliang/soft/miniconda3
mitozEnv /home/guanliang/soft/miniconda3/envs/mitozEnv
The exact path for me is: /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz
.
The newest version may not always be available on Bioconda, because it takes time for the Bioconda team to incorporate a new version of a program into the channel. In addition, the Bioconda website often does not show the latest available versions or builds. In this case, you may want to check https://anaconda.org/bioconda/mitoz/files or use the second installation method.
-
After the installation of MitoZ, you still need to install the etetoolkit (NCBI Taxonomy) database, especially when the automatic installation does not work for you.
-
Warning: it has been reported that a broken etetoolkit (NCBI Taxonomy) database can result in some PCGs not being annotated (https://github.com/linzhi2013/MitoZ/issues/89), or in MitoZ reporting an "Error" during the run (e.g. during the
findmitoscaf
step). Thus, please make sure this database works well before running MitoZ.
-
It is recommended to run the test dataset before applying MitoZ to your own samples, just to make sure your installation is okay. See 8. Running the test dataset.
-
Make sure your HOME directory has more than 700 MB of space available. Otherwise, you may get an error like
sqlite3.OperationalError: disk I/O error
. To solve the problem, do this first:
- Create a directory somewhere else that has enough space left:
$ mkdir /other/place/myetetoolkit
- Remove the directory
~/.etetoolkit
created by ete3 before (if any):
$ rm -rf ~/.etetoolkit
- Link your new directory to the HOME directory:
$ ln -s /other/place/myetetoolkit ~/.etetoolkit
- Follow the instructions below.
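Whether the soft-link steps above are needed can be checked beforehand by measuring the free space under your HOME directory. This is a sketch: `free_mb` is a hypothetical helper name, and the 700 MB threshold is the figure quoted above.

```shell
#!/bin/sh
# Print the free space, in MB, of the filesystem that holds a directory.
# `df -Pk` gives POSIX-format output in 1024-byte blocks; the fourth
# column of the second line is the available space.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

need_mb=700   # threshold taken from the text above
have_mb=$(free_mb "${HOME:-/}")
if [ "$have_mb" -lt "$need_mb" ]; then
    echo "Only ${have_mb} MB free under HOME; consider the soft-link steps above." >&2
fi
```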
Unless you installed MitoZ via the Docker method, you still need to install the etetoolkit database.
Firstly try:
$ conda activate mitozEnv
$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa()
>>> exit()
If you are using the Singularity image, you need to shell into the container first:
$ singularity shell /path/to/MitoZ_v3.4.sif
Singularity> python3
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa()
>>> exit()
Singularity> exit
Now verify the database:
$ conda activate mitozEnv
# or shell into the singularity container:
# $ singularity shell /path/to/MitoZ_v3.4.sif
$ python3
>>> from ete3 import NCBITaxa
>>> a = NCBITaxa()
>>> a.get_name_translator(["Arthropoda"])
{'Arthropoda': [6656]}
If the above works for you, then you are finished and can go to 6. Running the test dataset. Otherwise, please read the below instructions.
If you have trouble downloading and installing the Etetoolkit database, you can download the taxdump.tar.gz
file or my pre-built database from https://www.dropbox.com/sh/mqjqn656x41q2wb/AAD02t_kUCjNHbBgCeYpEM88a?dl=0
or from https://pan.baidu.com/s/1YIULJ9H3BeWKcIZdMZpcuw?pwd=7r9d (extraction code: 7r9d).
Then execute:
$ conda activate mitozEnv
$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa(taxdump_file='/path/to/downloaded/taxdump.tar.gz')
Loading node names...
2424313 names loaded.
277227 synonyms loaded.
Loading nodes...
2424313 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /Users/gmeng/.etetoolkit/taxa.sqlite ...
2424000 generating entries...
Uploading to /Users/gmeng/.etetoolkit/taxa.sqlite
Inserting synonyms: 275000
Inserting taxid merges: 65000
Inserting taxids: 2420000
>>> exit()
$ ls -lhrt ~/.etetoolkit/
total 1171272
-rw-r--r-- 1 gmeng staff 12M Jun 2 11:35 taxa.sqlite.traverse.pkl
-rw-r--r-- 1 gmeng staff 558M Jun 2 11:36 taxa.sqlite
However, if you got something like this:
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa(taxdump_file='taxdump.tar.gz')
Loading node names...
2424313 names loaded.
277265 synonyms loaded.
Loading nodes...
2424313 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /home/guanliang/.etetoolkit/taxa.sqlite ...
2424000 generating entries...
Uploading to /home/guanliang/.etetoolkit/taxa.sqlite
Inserting synonyms: 75000 Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 106, in __init__
self.update_taxonomy_database(taxdump_file)
File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 131, in update_taxonomy_database
update_db(self.dbfile, taxdump_file)
File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
upload_data(dbfile)
File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid
This error is caused by a bug in the ETE 3.1.1 package. You hit it because you installed MitoZ (<= 3.3) via the conda or mamba commands: in the early builds on Bioconda I wrongly specified ete3=3.1.1, when I should have used ete3>=3.1.2 instead.
In this case, you can download my pre-built version of the etetoolkit database (filename: etetoolkit.tgz) from https://www.dropbox.com/sh/mqjqn656x41q2wb/AAD02t_kUCjNHbBgCeYpEM88a?dl=0
or from https://pan.baidu.com/s/1YIULJ9H3BeWKcIZdMZpcuw?pwd=7r9d (extraction code: 7r9d), and then:
$ mv /path/to/etetoolkit.tgz ~
$ cd ~
$ rm -rf ~/.etetoolkit
$ tar -zxvf etetoolkit.tgz
Alternatively, you can do one of the following:
- Upgrade MitoZ via the mamba env create -n mitozEnv -f mitozEnv.yaml method (see the beginning).
- Upgrade MitoZ via the mamba create -n mitozEnv -c bioconda mitoz=3.4 command (see the beginning).
- Leave the mitozEnv environment (conda deactivate), then install ete3 in the 'base' environment via mamba install -c conda-forge "ete3>=3.1.2" (the quotes stop the shell from treating > as a redirection). Then use the Python and ete3 in this 'base' environment to create the etetoolkit database by following the beginning part of 4-the-etetoolkit-database.
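To see whether the active environment is affected, you can check the installed ete3 version before building the database. This is a minimal sketch using the standard-library importlib.metadata; parse_version and ete3_is_new_enough are hypothetical helper names, and the comparison only handles plain numeric versions such as 3.1.2.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '3.1.2' into (3, 1, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def ete3_is_new_enough(installed, minimum="3.1.2"):
    """True if `installed` is at least the version recommended above."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        installed = version("ete3")
    except PackageNotFoundError:
        print("ete3 is not installed in this environment")
    else:
        verdict = "OK" if ete3_is_new_enough(installed) else "too old, upgrade to >= 3.1.2"
        print(f"ete3 {installed}: {verdict}")
```

Run it with the Python of the environment you intend to use (for example, after conda activate mitozEnv), since each environment can carry its own ete3.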
To upgrade the etetoolkit database later, follow the ETE documentation (http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html#upgrading-the-local-database):
$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa()
>>> ncbi.update_taxonomy_database()
>>> exit()
Alternatively, download the latest file from https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz and load it:
$ wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa(taxdump_file='/path/to/downloaded/taxdump.tar.gz')
>>> exit()
-
There is a bug in the --data_size_for_mt_assembly option: --data_size_for_mt_assembly 5 actually extracts 50 Gb of data. So if you want only 5 Gb of data, use --data_size_for_mt_assembly 0.5 instead!
-
For MitoAssemble (--assembler mitoassemble), 8 to 16 threads plus 2 to 8 Gbp of fastq data is good enough, for example --thread_number 8 or --thread_number 12. A larger thread count could take a lot of RAM (e.g. 150 GB) during the assembly step.
- More data does not necessarily mean a better mitogenome.
- More threads do not necessarily mean a faster run.
-
For Megahit (--assembler megahit):
- In a test with 5 Gbp of data, 4 threads and --memory 20, Megahit actually used up to 32 GB of RAM.
- In a test with 15 Gbp of fastq data, 16 threads and --memory 50, it used around 50 GB of RAM, so you can use more data and threads with Megahit.
- While memory usage with more data (e.g. 15 Gbp) does not seem to be a big problem for Megahit, more data does take more time, so using less data is recommended to save time.
- You can also increase --memory to save time if your servers have enough RAM.
-
For SPAdes (--assembler spades), I did not record the RAM usage; it may be similar to Megahit's.
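The --data_size_for_mt_assembly bug noted above amounts to a factor of ten, which can be written down as a tiny helper. This is only a sketch: option_value_for_gb is a hypothetical name, the factor is inferred from the single example given above (5 behaves as 50 Gb), and the text does not say which MitoZ versions are affected, so verify against your own version before relying on it.

```python
def option_value_for_gb(desired_gb):
    """Value to pass to --data_size_for_mt_assembly so that an affected
    MitoZ version extracts `desired_gb` Gb (assumes the 10x bug above)."""
    if desired_gb < 0:
        raise ValueError("desired_gb must not be negative")
    return desired_gb / 10


if __name__ == "__main__":
    for gb in (2, 5, 8):
        print(f"want {gb} Gb -> pass {option_value_for_gb(gb)}")
```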
How to check how much RAM MitoZ uses?
You can use top or htop (recommended; https://anaconda.org/conda-forge/htop) to check how many resources MitoZ uses when running it on your own server, or the qstat command if you are using an SGE cluster.
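If you would rather record a number than watch top, a small wrapper can report peak memory after the run finishes. The sketch below uses Python's standard resource module, so it works on Linux and macOS only; run_and_report_peak_rss is a hypothetical helper name. Note that for child processes the operating system reports the peak of the single largest waited-for descendant, not the sum over all processes running at once, so treat the figure as a lower bound for a pipeline that runs several tools in parallel.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run `cmd`, wait for it, and return (exit_code, peak_rss_in_MB).

    RUSAGE_CHILDREN covers every child this Python process has waited for,
    so call this once per fresh interpreter for a clean reading.
    """
    completed = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Demonstration with a trivial command; replace the list with your own
    # command line, one argument per list element.
    code, peak_mb = run_and_report_peak_rss([sys.executable, "-c", "pass"])
    print(f"exit code {code}, peak RSS about {peak_mb:.1f} MB")
```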
Before applying MitoZ to your own samples, it is important to run MitoZ on the test dataset.
$ mkdir ~/test
$ cd ~/test
$ wget -c https://raw.githubusercontent.com/linzhi2013/MitoZ/master/test/test.R1.fq.gz
$ wget -c https://raw.githubusercontent.com/linzhi2013/MitoZ/master/test/test.R2.fq.gz
$ conda activate mitozEnv
$ mitoz all \
--outprefix test \
--thread_number 4 \
--clade Chordata \
--genetic_code 2 \
--species_name "Homo sapiens" \
--fq1 test.R1.fq.gz \
--fq2 test.R2.fq.gz \
--fastq_read_length 151 \
--data_size_for_mt_assembly 3,0 \
--assembler megahit \
--kmers_megahit 71 99 \
--memory 50 \
--requiring_taxa Chordata
The above command takes around 12 minutes and 1.1 GB RAM to finish.
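Because the run takes a while, it can be worth confirming that both downloaded files are intact and contain the same number of reads. This is a minimal sketch using only the standard library; count_fastq_reads is a hypothetical helper name and it assumes the common four-lines-per-record FASTQ layout. A truncated download typically shows up as an EOFError from gzip or as a line count that is not a multiple of four.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    lines = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            lines += 1
    if lines % 4:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4


if __name__ == "__main__":
    # File names from the wget commands above; run this inside ~/test.
    pair = [Path("test.R1.fq.gz"), Path("test.R2.fq.gz")]
    if all(p.is_file() for p in pair):
        counts = [count_fastq_reads(p) for p in pair]
        print(dict(zip(map(str, pair), counts)))
        if counts[0] != counts[1]:
            print("Warning: R1 and R2 contain different numbers of reads")
    else:
        print("Test files not found in the current directory")
```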
You can then analyze your samples by following the Tutorial.