Installation

Saikat Banerjee edited this page Sep 3, 2021 · 25 revisions

Tejaas Installation Guide

Quick start

Tejaas is distributed via PyPI. The simplest way to install a Python module from PyPI is with pip:

pip install tejaas

This installs Tejaas along with all its dependencies. However, we recommend installing the numpy and mpi4py Python packages before installing Tejaas, so that they are properly linked to the linear algebra routines (e.g. MKL or OpenBLAS) and the MPI implementation (e.g. OpenMPI or MPICH) of your system.

Overview of Tejaas dependencies

To install Tejaas, you will need to have the following software on your computer:

  1. Python (version 3.6 or greater).
  2. Intel MKL library.
  3. Any flavor of MPI linked to the Intel MKL library (e.g. OpenMPI).
  4. Some Python packages, namely
  • NumPy / array operations
  • SciPy / optimization and other special functions
  • statsmodels / used for ECDF calculation in JPA-score
  • pygtrie / used for reading MAF file in RR-score / maf null
  • mpi4py / linked to MPI and MKL for python parallelization
  • scikit-learn / used for PCA decomposition in KNN correction

Installing Tejaas requires setting up your computing environment to ensure that these components can communicate with each other. We will explain how to do this in the steps below.

1. Install Python >= 3.6

To use Tejaas, you must have Python version 3.6 or greater. There are several ways you can install Python >= 3.6.

Our recommendation: Install Python via a conda-based distribution such as Miniconda or Anaconda. If you are starting from scratch, we recommend installing Miniconda by following its installation instructions.

Once Miniconda is installed, you can create a Python 3.9 environment using:

conda create -n py39 python=3.9
conda activate py39

Other options: Tejaas will also work with standalone distributions of Python (e.g., downloaded from Python.org), and the instructions below should work regardless of how Python is installed on your computer.

2. Check your Python installation

Tejaas is installed from PyPI with pip (see documentation). Before running pip, check that it is the version bundled with your Python >= 3.6 installation:

python --version
pip --version

If the reported Python version is 3.6.0 or greater and pip is reported to come from that version (e.g. pip 21.2.4 from /path/to/python (python 3.9)), then you are ready for the next step.

Important: In the instructions below, we assume that your Python 3.6+ executable is python and that the matching pip executable is pip. On some systems, you might need to replace python with python3 and pip with pip3.
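The same check can be scripted from inside the interpreter you plan to use. The following is a minimal sketch, not part of Tejaas: the helper names python_is_supported and pip_version_line are hypothetical, and the script assumes that pip can be invoked as a module of the running interpreter.

```python
import subprocess
import sys

MIN_VERSION = (3, 6)  # minimum Python version stated in this guide


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def pip_version_line(executable=sys.executable):
    """Return the output of `<executable> -m pip --version`, or None on failure."""
    try:
        result = subprocess.run(
            [executable, "-m", "pip", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # The interpreter was not found, or pip is not installed for it.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
    print("pip:", pip_version_line() or "not available for this interpreter")
```

Running pip as python -m pip ties it to the interpreter that runs the script, which avoids the python/pip mismatch described above.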

3. MPI interface

Tejaas uses MPI for parallel computation. If you are trying out Tejaas on your local computer, you can install OpenMPI following the build instructions. OpenMPI is generally available on most remote servers, but the command to load the module differs between systems. For example, on the GWDG server at MPIBPC, it can be loaded using

module load openmpi

Please contact the system administrator if you are having trouble using OpenMPI.
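Before continuing, it can help to confirm that an MPI launcher is visible on your PATH. The sketch below uses only the Python standard library; the helper name find_mpi_launcher is hypothetical, and the launcher names mpirun and mpiexec are assumed to be the ones provided by your MPI installation.

```python
import shutil


def find_mpi_launcher(candidates=("mpirun", "mpiexec")):
    """Return the path of the first MPI launcher found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    launcher = find_mpi_launcher()
    if launcher is None:
        print("No MPI launcher found on PATH; load or install an MPI library first.")
    else:
        print("MPI launcher found at:", launcher)
```

If no launcher is found on a remote server, the MPI module has probably not been loaded in the current shell.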

4. Install dependencies

4a. NumPy

NumPy is the fundamental package for scientific computing in Python and relies on linear algebra routines (e.g. MKL or OpenBLAS). The core of Tejaas is written in C and is compiled against the same linear algebra routines as those used by NumPy. Hence, we recommend installing NumPy with the fastest linear algebra library available on your system. In the conda defaults channel, NumPy is built against Intel MKL. MKL is a separate package that is installed in the user's environment together with NumPy.

conda activate py39
conda install numpy

To force conda to install a NumPy build linked against OpenBLAS, you can use conda install "libblas=*=*openblas" numpy.

If you want to use another optimized linear algebra library available on your system, you have to install NumPy from source; refer to the NumPy installation instructions.

Note: The NumPy wheels on PyPI, which is what pip installs, are built with OpenBLAS. The OpenBLAS libraries are included in the wheel. This makes the wheel larger, and if a user installs SciPy also from PyPI, they will now have two copies of OpenBLAS on disk. In the conda-forge channel, NumPy is built against a dummy "BLAS" package. When a user installs NumPy from conda-forge, that BLAS package then gets installed together with the actual library - this defaults to OpenBLAS, but it can also be MKL (from the defaults channel), or even BLIS or reference BLAS.
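To see which linear algebra library your NumPy build reports, you can inspect the output of numpy.show_config(). The sketch below is only a heuristic: the helper names and the list of backend keywords are assumptions, and the script simply searches the printed configuration for those keywords. It returns None when NumPy is not installed.

```python
import contextlib
import importlib.util
import io

# Keywords searched for in the NumPy build configuration (an assumed, non-exhaustive list).
KNOWN_BACKENDS = ("mkl", "openblas", "blis", "accelerate")


def detect_blas_backends(config_text):
    """Return the known backend keywords that occur in `config_text`."""
    text = config_text.lower()
    return [name for name in KNOWN_BACKENDS if name in text]


def numpy_config_text():
    """Return the text printed by numpy.show_config(), or None without NumPy."""
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    text = numpy_config_text()
    if text is None:
        print("NumPy is not installed in this environment.")
    else:
        print("Backends mentioned in the NumPy configuration:", detect_blas_backends(text))
```

Because the match is a plain substring search, a keyword that appears only in a file path will also be reported, so treat the result as a hint and read the full configuration when in doubt.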

4b. MPI4Py

MPI4Py provides Python bindings for the Message Passing Interface (MPI) standard, allowing Python applications to exploit multiple processors on workstations, clusters and supercomputers. The mpi4py installation must be linked to the MPI library of the system. As of August 2021, the mpi4py package on PyPI is distributed as source code (instead of prebuilt wheels). Therefore, installing it with pip builds the package from source against the MPI library of your system, which ensures that the correct system libraries are linked.

pip install mpi4py

The MPI libraries are often not in default locations, so installing a prebuilt mpi4py package with conda can lead to errors later when using mpi4py.
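After installation, you can check which MPI library mpi4py is linked to. The following is a minimal sketch with a hypothetical helper name, mpi4py_report; it returns None when mpi4py (or the MPI library it was built against) cannot be loaded.

```python
def mpi4py_report():
    """Return MPI details seen by mpi4py, or None if mpi4py cannot be imported."""
    try:
        from mpi4py import MPI  # importing this module initializes MPI
    except (ImportError, RuntimeError):
        return None
    comm = MPI.COMM_WORLD
    return {
        "library": MPI.Get_library_version().strip(),
        "rank": comm.Get_rank(),
        "size": comm.Get_size(),
    }


if __name__ == "__main__":
    report = mpi4py_report()
    if report is None:
        print("mpi4py could not be imported; check the MPI installation.")
    else:
        print("MPI library:", report["library"])
        print("Process", report["rank"], "of", report["size"])
```

When the script is started through the MPI launcher with several processes, each process should print a different rank; if every process reports a size of 1, mpi4py and the launcher are probably using different MPI installations.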

4c. Other dependencies

All remaining dependencies are installed automatically when you install Tejaas. However, if you are using a package / environment manager like conda, you may want better control of your Python environment by installing the dependencies separately.

conda install scipy scikit-learn statsmodels

To build these libraries with OpenBLAS, use conda install "libblas=*=*openblas" scipy scikit-learn statsmodels.
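Before moving on, you can check that the packages listed in the overview can be found by your interpreter. This is a sketch under stated assumptions: the helper name missing_packages is hypothetical, and the import names in the mapping (for example sklearn for scikit-learn) are the usual ones for these packages rather than something taken from the Tejaas source.

```python
import importlib.util

# Package name -> import name (assumed to be the usual import names).
DEPENDENCIES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "statsmodels": "statsmodels",
    "pygtrie": "pygtrie",
    "mpi4py": "mpi4py",
    "scikit-learn": "sklearn",
}


def missing_packages(dependencies=DEPENDENCIES):
    """Return the sorted package names whose import name cannot be located."""
    return sorted(
        package
        for package, module in dependencies.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

The check only locates the modules without importing them, so it does not detect a package that is present but broken, such as an mpi4py build linked to a missing MPI library.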

5. Install Tejaas

Finally, you can install the latest stable version of Tejaas using

pip install tejaas

Development version for users

If you want the latest development version from GitHub, you can install it using

pip install git+https://github.com/soedinglab/tejaas.git

For developers

If you want to make changes to the code for development, you can clone the repository and install Tejaas in editable mode from the local path

git clone git@github.com:soedinglab/tejaas.git
cd tejaas
pip install -e .

See the pip documentation for the -e or --editable flag. Any changes to the Tejaas Python code take effect immediately. For changes to the C code to take effect, Tejaas has to be reinstalled.

6. Check your Tejaas installation

To check whether the installation was successful, you can run the minimum working example (MWE) provided in the example subdirectory of the Tejaas source code.

  • Clone the repository and change to the tejaas/example subdirectory.
git clone git@github.com:soedinglab/tejaas.git
cd tejaas/example/
  • Run the MWE bash script, which takes two arguments: (1) output directory and (2) number of CPUs used by mpirun.
./run_example.sh <outdir> <ncpu>

The script downloads some example input files into <outdir>/data and runs Tejaas on <ncpu> cores. The output is also created in <outdir>/data. Note that the data subdirectory is created automatically within the <outdir> specified by the user. For example, a valid command is ./run_example.sh . 2, where the output is created in tejaas/example/data (. refers to the current directory) and the code is parallelized on 2 CPUs. You may also want to read run_example.sh for an illustration of how to use Tejaas from the command line.

  • If the example runs successfully, the output can be compared with the provided results:
python compare_with_gold.py --outdir <outdir>

This checks whether the output matches the results provided in the example/gold subdirectory. The script must be executed from the tejaas/example directory. If you specify an output directory with the flag --outdir <outdir>, the script searches for the output files within <outdir>/data. If you do not specify an output directory, the script uses the current directory by default and looks for the output in tejaas/example/data.