
Installation

Saikat Banerjee edited this page Aug 30, 2021 · 25 revisions

Tejaas Installation Guide

Quick start

Tejaas is distributed via PyPI. The simplest way to install a Python module from PyPI is with pip:

pip install tejaas

This installs Tejaas along with its dependencies. However, we recommend installing the dependencies separately, as described below, for easier maintenance.

Overview of Tejaas dependencies

To install Tejaas, you will need to have the following software on your computer:

  1. Python (version 3.6 or greater)
  2. Intel MKL library
  3. Any flavor of MPI linked to the Intel MKL library (e.g. OpenMPI)
  4. Some Python packages, namely:
  • NumPy / array operations
  • SciPy / optimization and other special functions
  • statsmodels / used for ECDF calculation in JPA-score
  • pygtrie / used for reading the MAF file in RR-score / MAF null
  • mpi4py / linked to MPI and MKL for Python parallelization
  • scikit-learn / used for PCA decomposition in KNN correction

Installing Tejaas requires setting up your computing environment to ensure that these components can communicate with each other. We will explain how to do this in the steps below.

1. Install python >= 3.6

To use Tejaas, you must have Python version 3.6 or greater. There are several ways you can install Python >= 3.6.

Our recommendation: Install Python via a conda-based package manager such as Miniconda or Anaconda. If you are starting from scratch, we recommend installing Miniconda by following its installation instructions.

Once Miniconda is installed, you can create a Python 3.9 environment using:

conda create -n py39 python=3.9
conda activate py39

Other options: Tejaas will also work with standalone distributions of Python (e.g., downloaded from Python.org), and the instructions below should work regardless of how Python is installed on your computer.

2. Check your Python installation

Tejaas is distributed via PyPI. The simplest way to install a Python module from PyPI is with pip (see documentation). Before running pip, check that you are running the version bundled with Python >= 3.6:

python --version
pip --version

If the reported Python version is 3.6.0 or greater and pip is reported to come from that version (e.g. pip 21.2.4 from /path/to/python (python 3.9)), you are ready for the next step.

Important: In the instructions below, we assume your Python 3.6+ executable is python, and your pip (python 3.6+) executable is pip. However, you might need to replace python with python3 and pip with pip3.
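The same check can be scripted. The sketch below is a hypothetical helper, not part of Tejaas: it compares sys.version_info with 3.6 and runs pip as `python -m pip`, which guarantees that the reported pip belongs to the interpreter running the script (so the python/python3 and pip/pip3 ambiguity does not arise).

```python
import subprocess
import sys

MIN_VERSION = (3, 6)  # minimum Python version required by Tejaas


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if (major, minor) of the interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def pip_version_line():
    """Run pip through *this* interpreter and return its version line (or the error)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text output; also works on Python 3.6
        check=False,
    )
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    print(sys.version)
    print("Python version OK" if python_is_supported() else "Python >= 3.6 is required")
    print(pip_version_line())
```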

MPI interface

Tejaas uses the MPI interface for parallel computation. If you are trying out Tejaas on your local computer, you can install OpenMPI by following its build instructions. OpenMPI is usually already provided on remote servers, but the command to load the module differs between systems. For example, on the GWDG server at MPIBPC, it can be loaded using

module load openmpi

Please contact the system administrator if you are having trouble using OpenMPI.
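After loading the module (or installing OpenMPI locally), the MPI launcher and the mpicc compiler wrapper should be on your PATH; building mpi4py from source with pip (step 3b) relies on mpicc. The helper below is a hypothetical sketch, not part of Tejaas: it looks these tools up with shutil.which and prints the first line of `mpirun --version`, a flag accepted by OpenMPI and MPICH (other MPI flavors may behave differently).

```python
import shutil
import subprocess


def find_mpi_tools(names=("mpirun", "mpiexec", "mpicc")):
    """Map each MPI tool name to its absolute path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def mpirun_version():
    """Return the first line of `mpirun --version`, or None if mpirun is not on PATH."""
    path = shutil.which("mpirun")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some MPI flavors print the version to stderr
        universal_newlines=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    for tool, location in find_mpi_tools().items():
        print(f"{tool:8s} {location or 'NOT FOUND'}")
    print(mpirun_version())
```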

3. Install dependencies

3a. numpy, scipy, scikit-learn, statsmodels

Activate the newly created environment and install the required dependencies:

conda activate py39
conda install numpy scipy scikit-learn statsmodels

The functions in NumPy and SciPy require linear algebra routines. By default, conda installs MKL for these. If you want to use OpenBLAS for the linear algebra routines instead, you can use conda install "libblas=*=*openblas" numpy scipy scikit-learn statsmodels, but this is not recommended. Tejaas automatically uses whichever linear algebra library NumPy is linked against.
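To see which linear algebra library NumPy actually loaded, one option (an assumption on our part, not a Tejaas feature) is threadpoolctl, which recent scikit-learn releases install as a dependency. The sketch imports NumPy first so that its BLAS is loaded, then keeps only the entries of threadpoolctl's report whose user_api is "blas". With the default conda setup you would expect "mkl"; after the OpenBLAS command above, "openblas".

```python
def blas_backends(threadpool_infos):
    """Return the sorted, de-duplicated BLAS implementations in a threadpoolctl report."""
    return sorted(
        {info["internal_api"] for info in threadpool_infos if info.get("user_api") == "blas"}
    )


def numpy_blas_backends():
    """Import NumPy (which loads the BLAS it is linked to) and report that BLAS."""
    import numpy  # noqa: F401  (imported only for its side effect of loading BLAS)
    from threadpoolctl import threadpool_info

    return blas_backends(threadpool_info())


if __name__ == "__main__":
    try:
        print(numpy_blas_backends())
    except ImportError as exc:
        print(f"could not inspect the BLAS library: {exc}")
```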

3b. mpi4py

The mpi4py installation requires proper links to the MPI library of the system. We recommend installing the latest mpi4py package from the PyPI repository using pip, because this builds the package from source against the MPI library of your system and ensures that the correct system libraries are linked.

pip install mpi4py

The MPI libraries are often in a non-default location, so installing the prebuilt mpi4py package with conda can lead to errors later when using mpi4py. Some users have also reported problems with packages from the Anaconda repository.
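A quick way to confirm that mpi4py is linked to the same MPI as your mpirun is to launch a small script on two processes. The sketch below is a hypothetical check_mpi4py.py, not shipped with Tejaas; run it with mpirun -np 2 python check_mpi4py.py. If every line reports "of 2", mpi4py and mpirun agree; two lines that both say "rank 0 of 1" usually mean mpi4py was built against a different MPI library.

```python
def mpi_summary():
    """Return rank, size, host and MPI library info, or None if mpi4py cannot load."""
    try:
        from mpi4py import MPI  # importing MPI initializes the MPI runtime
    except (ImportError, RuntimeError):
        return None
    comm = MPI.COMM_WORLD
    library_lines = MPI.Get_library_version().splitlines()
    return {
        "rank": comm.Get_rank(),
        "size": comm.Get_size(),
        "host": MPI.Get_processor_name(),
        "library": library_lines[0] if library_lines else "",
    }


if __name__ == "__main__":
    summary = mpi_summary()
    if summary is None:
        print("mpi4py is not importable; check the MPI library and reinstall mpi4py")
    else:
        print(
            f"rank {summary['rank']} of {summary['size']} "
            f"on {summary['host']}: {summary['library']}"
        )
```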

4. Install Tejaas

Finally, we can install Tejaas using:

pip install tejaas
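To confirm that Tejaas and the Python packages listed in the overview are visible in the active environment, a sketch like the following can help. It is not part of Tejaas; it assumes the usual import names (for example sklearn for scikit-learn, and tejaas for Tejaas itself) and uses importlib.metadata, which requires Python 3.8 or newer (the py39 environment created above qualifies). Modules are located with find_spec without being imported, so the check does not start MPI.

```python
import importlib.util
from importlib import metadata

# (distribution name on PyPI, top-level import name); import names are assumptions
DEPENDENCIES = [
    ("tejaas", "tejaas"),
    ("numpy", "numpy"),
    ("scipy", "scipy"),
    ("statsmodels", "statsmodels"),
    ("pygtrie", "pygtrie"),
    ("mpi4py", "mpi4py"),
    ("scikit-learn", "sklearn"),
]


def check_dependencies(deps=DEPENDENCIES):
    """Return {distribution: installed version, or None if not importable}."""
    report = {}
    for dist_name, module_name in deps:
        if importlib.util.find_spec(module_name) is None:
            report[dist_name] = None  # not importable in this environment
            continue
        try:
            report[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[dist_name] = "unknown version"  # importable, but no package metadata
    return report


if __name__ == "__main__":
    for name, version in check_dependencies().items():
        print(f"{name:14s} {version or 'MISSING'}")
```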

5. Check your installation

To check whether the installation was successful, you can run the minimum working example (MWE) provided in the example subdirectory of the Tejaas source code.

  • Clone the repository and change to the tejaas/example subdirectory.
git clone git@github.com:soedinglab/tejaas.git
cd tejaas/example/
  • Run the MWE bash script, which takes two arguments: (1) output directory and (2) number of CPUs used by mpirun.
./run_example.sh <outdir> <ncpu>

For example, ./run_example.sh . 2 uses the current directory as <outdir> and runs on 2 CPUs. Open the file run_example.sh for an illustration of using Tejaas from the command line. The script downloads some example input files into <outdir>/data and runs Tejaas on <ncpu> cores. The output is also written to <outdir>/data.

  • If the example runs successfully, compare the output with the provided reference results:
python compare_with_gold.py

This checks whether the output matches the results provided in the example/gold subdirectory. Note: to rerun the example, first delete the output directory example/data.
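Because example/data must be removed before every rerun, the steps can be wrapped in a small script. This is a hypothetical sketch, not part of Tejaas: call it from inside tejaas/example/. It deletes <outdir>/data, runs run_example.sh, and then runs compare_with_gold.py with the same interpreter. It assumes the default <outdir> of ., since the comparison step refers to example/data.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def example_commands(outdir=".", ncpu=2):
    """Commands from this page: run the MWE, then compare with the gold results."""
    return [
        ["./run_example.sh", str(outdir), str(ncpu)],
        [sys.executable, "compare_with_gold.py"],
    ]


def rerun_example(outdir=".", ncpu=2, execute=True):
    """Delete <outdir>/data, then run the example and the comparison.

    Must be called from inside tejaas/example/. With execute=False, nothing is
    deleted or run; the commands are only returned.
    """
    commands = example_commands(outdir, ncpu)
    if execute:
        data_dir = Path(outdir) / "data"
        if data_dir.exists():
            shutil.rmtree(data_dir)  # the example cannot be rerun while this exists
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

For example, calling rerun_example(ncpu=4) from tejaas/example/ repeats the whole check on 4 CPUs.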