Installation
Tejaas is distributed via PyPI. The simplest way to install a Python module from PyPI is with pip:
pip install tejaas
This installs Tejaas along with all of its dependencies. However, we recommend installing the numpy and mpi4py Python packages before installing Tejaas, so that they are properly linked to the system's linear algebra routines (e.g. MKL, OpenBLAS) and MPI implementation (e.g. OpenMPI, MPICH).
To install Tejaas, you will need to have the following software on your computer:
- Python (version 3.6 or greater).
- Intel MKL library
- Any MPI implementation (e.g. OpenMPI) linked to the Intel MKL library
- Some Python packages, namely:
  - NumPy: array operations
  - SciPy: optimization and other special functions
  - statsmodels: used for ECDF calculation in the JPA-score
  - pygtrie: used for reading the MAF file in the RR-score (maf null)
  - mpi4py: linked to MPI and MKL for Python parallelization
  - scikit-learn: used for PCA decomposition in the KNN correction
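To see which of the Python packages listed above are already available in your environment, a small check like the following can help. It is a minimal sketch, not part of Tejaas: the function name check_dependencies is hypothetical, and it only tests whether each module can be located, not whether it is correctly linked to MKL or MPI. Note that the import name can differ from the package name (scikit-learn is imported as sklearn).

```python
import importlib.util

# Hypothetical mapping: package name -> import name of the dependencies above.
REQUIRED_MODULES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "statsmodels": "statsmodels",
    "pygtrie": "pygtrie",
    "mpi4py": "mpi4py",
    "scikit-learn": "sklearn",  # scikit-learn is imported as "sklearn"
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package name: True/False} depending on whether it can be found."""
    # find_spec only locates a top-level module without importing it,
    # so a broken MPI setup cannot crash this check.
    return {name: importlib.util.find_spec(mod) is not None
            for name, mod in modules.items()}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:<13} {'found' if found else 'MISSING'}")
```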
Installing Tejaas requires setting up your computing environment to ensure that these components can communicate with each other. We will explain how to do this in the steps below.
To use Tejaas, you must have Python version 3.6 or greater. There are several ways you can install Python >= 3.6.
Our recommendation: Install Python via a conda-based package manager such as Miniconda or Anaconda. If you are starting from scratch, we recommend installing Miniconda by following its installation instructions.
Once Miniconda is installed, you can create a Python 3.9 environment using:
conda create -n py39 python=3.9
conda activate py39
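Before installing anything, it can be useful to confirm from Python that the intended environment (py39 above) is active. This is a minimal sketch under the assumption that the environment was activated with conda activate, which sets the CONDA_DEFAULT_ENV variable; the helper name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None.

    Assumes `conda activate <name>` has set CONDA_DEFAULT_ENV.
    """
    if environ is None:
        environ = os.environ
    # An empty value is treated the same as "not set".
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    env = active_conda_env()
    print("conda environment:", env if env else "(none active)")
    print("interpreter:", sys.executable)
```

If the reported interpreter path does not point into the py39 environment, re-run conda activate py39 before continuing.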
Other options: Tejaas will also work with standalone distributions of Python (e.g., downloaded from Python.org), and the instructions below should work regardless of how Python is installed on your computer.
Tejaas is distributed via PyPI. The simplest way to install a Python module from PyPI is with pip (see documentation). Before running pip, check that you are running the version bundled with Python >= 3.6:
python --version
pip --version
If the reported Python version is 3.6.0 or greater and pip is reported to come from that version (e.g. pip 21.2.4 from /path/to/python (python 3.9)), then you are ready for the next step.
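The same two checks can be done from within Python. This is a sketch, not a Tejaas tool: the helper names python_ok and pip_version_line are hypothetical. Running pip as python -m pip (here via sys.executable) guarantees that the reported pip belongs to the interpreter being checked.

```python
import subprocess
import sys

MIN_PYTHON = (3, 6)


def python_ok(min_version=MIN_PYTHON):
    """True if the running interpreter is at least `min_version`."""
    return sys.version_info >= tuple(min_version)


def pip_version_line():
    """Output of `python -m pip --version` for this interpreter, or None."""
    try:
        proc = subprocess.run(
            [sys.executable, "-m", "pip", "--version"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True, check=True,  # 3.6-compatible text mode
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return proc.stdout.strip()


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "too old")
    print(pip_version_line() or "pip is not available for this interpreter")
```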
Important: In the instructions below, we assume that your Python 3.6+ executable is python and your pip (Python 3.6+) executable is pip. However, you might need to replace python with python3 and pip with pip3.
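One way to sidestep the python / python3 and pip / pip3 ambiguity is to invoke pip through the interpreter itself. The sketch below assumes that NumPy has already been installed (for example with conda, as described below) and only builds the pip commands for mpi4py and then Tejaas, in the recommended order. The names pip_install_cmd and install_all are hypothetical; with dry_run=True nothing is installed.

```python
import subprocess
import sys

# Assumed order: NumPy is installed beforehand (e.g. via conda), then mpi4py
# so it links against the system MPI, and only then Tejaas itself.
INSTALL_ORDER = ["mpi4py", "tejaas"]


def pip_install_cmd(package):
    """Build a pip command bound to the *current* interpreter.

    `sys.executable -m pip` behaves like `python -m pip` (or `python3 -m pip`)
    and cannot pick up a pip that belongs to a different Python installation.
    """
    return [sys.executable, "-m", "pip", "install", package]


def install_all(packages=INSTALL_ORDER, dry_run=True):
    """Install packages one by one; with dry_run=True only print the commands."""
    for pkg in packages:
        cmd = pip_install_cmd(pkg)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```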
Tejaas uses the MPI interface for parallel computation. If you are trying out Tejaas on your local computer, you can install OpenMPI by following its build instructions. OpenMPI is available on most remote servers, but the command to load the module differs between systems. For example, on the GWDG server at MPIBPC, it can be loaded using:
module load openmpi
Please contact the system administrator if you are having trouble using OpenMPI.
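After installing or loading OpenMPI, a quick sanity check is whether an mpirun launcher is on your PATH and which version it reports. This is a minimal sketch with hypothetical helper names; it assumes the launcher is called mpirun and accepts --version (true for OpenMPI; other MPI flavors may print a different first line).

```python
import shutil
import subprocess


def find_mpirun():
    """Return the path of the first `mpirun` on PATH, or None."""
    return shutil.which("mpirun")


def mpirun_version():
    """First line of `mpirun --version`, or None if mpirun is unavailable."""
    exe = find_mpirun()
    if exe is None:
        return None
    try:
        proc = subprocess.run([exe, "--version"], stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = proc.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print("mpirun:", find_mpirun() or "not found (is the module loaded?)")
    print("version:", mpirun_version())
```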
NumPy is the fundamental package for scientific computing in Python and relies on optimized linear algebra libraries (e.g. MKL, OpenBLAS). The core of Tejaas is written in C and is compiled against the same linear algebra library that NumPy uses. Hence, we recommend installing NumPy with the fastest linear algebra library available for your system. In the conda defaults channel, NumPy is built against Intel MKL, which is installed as a separate package in the user's environment together with NumPy:
conda activate py39
conda install numpy
To force NumPy to use OpenBLAS instead, you can use conda install "libblas=*=*openblas" numpy.
If you want to use another optimized linear algebra library available on your system, you have to build NumPy from source; refer to the installation instructions.
Note: The NumPy wheels on PyPI, which are what pip installs, are built with OpenBLAS, and the OpenBLAS libraries are included in the wheel. This makes the wheel larger, and if a user also installs SciPy from PyPI, they will have two copies of OpenBLAS on disk. In the conda-forge channel, NumPy is built against a dummy "BLAS" package. When a user installs NumPy from conda-forge, that BLAS package is installed together with the actual library; this defaults to OpenBLAS, but it can also be MKL (from the defaults channel), BLIS, or reference BLAS.
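To find out which linear algebra library your installed NumPy actually uses, you can print its build configuration with numpy.show_config(). The wrapper below is a hypothetical helper that simply skips the check when NumPy is not installed; the exact layout of the printed configuration differs between NumPy versions, so look for entries mentioning mkl or openblas.

```python
import importlib.util


def numpy_blas_report():
    """Print NumPy's build configuration (including BLAS/LAPACK) if available.

    Returns the NumPy version string, or None when NumPy is not installed.
    """
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy as np
    # show_config() prints the libraries NumPy was built against;
    # look for entries mentioning "mkl" or "openblas".
    np.show_config()
    return np.__version__


if __name__ == "__main__":
    print("NumPy version:", numpy_blas_report())
```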
mpi4py provides Python bindings for the Message Passing Interface (MPI) standard,
allowing Python applications to exploit multiple processors on workstations, clusters and supercomputers.
The mpi4py installation requires proper links to the MPI library of the system.
As of August 2021, the mpi4py package on PyPI
is distributed as source code (instead of prebuilt wheels).
Therefore, installing using pip
will build the package from the source code using the MPI library of your system,
and ensure that the correct libraries of the system are linked.
pip install mpi4py
Often the MPI libraries are not in default locations,
and hence installing a prebuilt mpi4py package with conda
can lead to errors later when using mpi4py.
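To confirm that mpi4py was built against a working MPI library, you can import it and query the library version, rank, and communicator size. This is a sketch; the helper name mpi4py_report and the file name check_mpi4py.py are hypothetical. Launched as mpirun -n 2 python check_mpi4py.py, each process should report a different rank and the same size. The import is deliberately not wrapped in a try block, so a linking problem surfaces as an error.

```python
import importlib.util


def mpi4py_report():
    """Return (MPI library version, rank, size), or None if mpi4py is missing."""
    if importlib.util.find_spec("mpi4py") is None:
        return None
    # Importing MPI initializes the MPI library; a broken link fails here.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    return MPI.Get_library_version().strip(), comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    report = mpi4py_report()
    if report is None:
        print("mpi4py is not installed")
    else:
        lib, rank, size = report
        print(f"rank {rank} of {size}; MPI library: {lib.splitlines()[0]}")
```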
Although all dependencies are installed automatically, you may want to install them separately for better control over your Python environment.
conda install scipy scikit-learn statsmodels
To install these libraries linked against OpenBLAS, use conda install "libblas=*=*openblas" scipy scikit-learn statsmodels.
Finally, install Tejaas using:
pip install tejaas
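After installation, you can list the installed versions of Tejaas and its dependencies. This sketch uses importlib.metadata, which is only available in Python 3.8 and later (on Python 3.6 or 3.7, pip show tejaas gives similar information). The names DISTRIBUTIONS and installed_versions are hypothetical; note that these are PyPI distribution names, not import names.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# PyPI distribution names (not import names) of Tejaas and its dependencies.
DISTRIBUTIONS = ["tejaas", "numpy", "scipy", "statsmodels",
                 "pygtrie", "mpi4py", "scikit-learn"]


def installed_versions(names=DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<13} {ver or 'not installed'}")
```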
To check whether the installation was successful, you can run the minimum working example (MWE) provided in the example subdirectory of the Tejaas source code.
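The three steps that follow (clone, run the example, compare with the gold results) can also be scripted. The sketch below is a hypothetical convenience wrapper, not part of Tejaas: mwe_commands and run_mwe are illustrative names, it assumes SSH access to GitHub (as in the git clone command used in the steps), and it assumes it is started in the directory where the repository should be cloned. With dry_run=True it only prints the commands.

```python
import subprocess
import sys

REPO_URL = "git@github.com:soedinglab/tejaas.git"  # requires GitHub SSH access


def mwe_commands(outdir=".", ncpu=2):
    """Return (working directory, command) pairs for the MWE steps.

    Working directories are relative to where this script is started;
    `outdir` is interpreted relative to tejaas/example, as in run_example.sh.
    """
    ncpu = int(ncpu)
    if ncpu < 1:
        raise ValueError("ncpu must be a positive integer")
    return [
        (".", ["git", "clone", REPO_URL]),
        ("tejaas/example", ["./run_example.sh", outdir, str(ncpu)]),
        ("tejaas/example",
         [sys.executable, "compare_with_gold.py", "--outdir", outdir]),
    ]


def run_mwe(outdir=".", ncpu=2, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, cmd in mwe_commands(outdir, ncpu):
        print(f"[{cwd}] $ {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
```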
- Clone the repository and change to the tejaas/example subdirectory.
git clone git@github.com:soedinglab/tejaas.git
cd tejaas/example/
- Run the MWE bash script, which takes two arguments: (1) the output directory and (2) the number of CPUs used by mpirun.
./run_example.sh <outdir> <ncpu>
The script downloads some example input files into <outdir>/data and runs Tejaas on <ncpu> cores. The output is also created in <outdir>/data. Note that the data subdirectory is automatically created within the <outdir> specified by the user.
For example, a valid command is ./run_example.sh . 2, where the output is created in tejaas/example/data (. refers to the current directory) and the code is parallelized on 2 CPUs.
You may also want to look at the contents of run_example.sh for an illustration of how to use Tejaas from the command line.
- If the example runs successfully, the output can be compared against the provided results:
python compare_with_gold.py --outdir <outdir>
This checks whether the output matches the results provided in the example/gold subdirectory.
This Python script needs to be executed from the tejaas/example directory.
By default, it looks for output in the current directory, that is, tejaas/example/data.
If you specify an output directory with the flag --outdir <outdir>, then it searches for the output files within <outdir>/data.
If you specify