Parallelization

Victoria edited this page Apr 14, 2023 · 1 revision

Parallelization with Joblib

To run makestructuraldb and mapper in parallel, pass -p (or --parallel) and set the number of cores to use simultaneously with -j (or --jobs).
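As a minimal sketch of choosing a value for -j, the helper below caps a requested job count at the number of cores the machine reports. The function name job_count is hypothetical and not part of 3Dmapper; the file names in the commented call are the placeholders used elsewhere on this page.

```shell
#!/bin/sh
# Hypothetical helper: cap a requested job count at the number of online cores.
# Usage: job_count REQUESTED
job_count() {
    requested=$1
    # Ask the system for the online core count; fall back to 1 if unavailable.
    available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Example call (placeholder file names):
# makestructuraldb -pdb file1.pdb --blast_db target_proteome_db -p -j "$(job_count 4)"
```

Asking for more jobs than there are cores rarely helps, so capping the value is a reasonable default; on a shared machine you may want to leave some cores free.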

Parallelization in a Computer Cluster

In cluster computing, parallelization with Joblib may not be the optimal approach. Instead, you can submit each task individually using job arrays or greasy; the results of all tasks are appended to the same output file. This approach is available for makestructuraldb and mapper in 3Dmapper. Below are two examples of task lists, one for each tool:

makestructuraldb -pdb file1.pdb --blast_db target_proteome_db
makestructuraldb -pdb file2.pdb --blast_db target_proteome_db
makestructuraldb -pdb file3.pdb --blast_db target_proteome_db
...
makestructuraldb -pdb fileN.pdb --blast_db target_proteome_db

mapper -pid protID1 -psdb structuralDB -vdb varDB -ids dict.txt -csv -l
mapper -pid protID2 -psdb structuralDB -vdb varDB -ids dict.txt -csv -l
mapper -pid protID3 -psdb structuralDB -vdb varDB -ids dict.txt -csv -l
...
mapper -pid protIDN -psdb structuralDB -vdb varDB -ids dict.txt -csv -l
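Task lists like the ones above do not have to be written by hand. The sketch below generates one makestructuraldb command per .pdb file in a directory and retrieves a single command by its position. Both function names are hypothetical helpers, not part of 3Dmapper, and the sketch assumes file names without spaces.

```shell
#!/bin/sh
# Hypothetical helper: write one makestructuraldb command per .pdb file in PDB_DIR.
# Usage: write_structuraldb_tasks PDB_DIR BLAST_DB TASK_FILE
write_structuraldb_tasks() {
    pdb_dir=$1
    blast_db=$2
    task_file=$3
    : > "$task_file"                  # start from an empty task list
    for pdb in "$pdb_dir"/*.pdb; do
        [ -e "$pdb" ] || continue     # no .pdb files: the pattern stays unexpanded
        printf 'makestructuraldb -pdb %s --blast_db %s\n' "$pdb" "$blast_db" >> "$task_file"
    done
}

# Hypothetical helper: print the N-th command (1-based) from a task list.
# Usage: print_task TASK_FILE N
print_task() {
    sed -n "${2}p" "$1"
}

# Example calls (placeholder paths):
# write_structuraldb_tasks pdbs target_proteome_db tasks.txt
# print_task tasks.txt 3
```

The generated file has one command per line, which is the layout shown in the examples above. If your cluster uses Slurm job arrays (an assumption about your scheduler), each array task could select its own line with print_task tasks.txt "$SLURM_ARRAY_TASK_ID" and then execute it.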

Parallelization with GNU parallel

The splitting step of makevariantsdb can be sped up considerably with GNU parallel, a command-line tool for executing shell commands in parallel. To enable it, add -p or --parallel to your command and set the number of cores to use simultaneously with -j or --jobs.

GNU parallel is not installed automatically, so you must install it yourself before using it with makevariantsdb. On Ubuntu or Debian:

sudo apt-get update
sudo apt-get install parallel

Once GNU parallel is installed, makevariantsdb can use it when splitting your input files into smaller chunks, so that the chunks are processed in parallel. This can significantly speed up the overall process, especially for large input files.

Here's an example of how to use GNU parallel with makevariantsdb:

makevariantsdb -vf variants.vep -p -j 4

In this example, the -p option activates parallel processing, and -j 4 specifies that we want to use 4 cores at the same time. You can adjust this number based on the available resources on your system.

Note that you need to have a multi-core CPU or access to a high-performance computing cluster to take advantage of parallel processing.
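Because GNU parallel has to be installed separately, a wrapper script may want to check for it before requesting parallel splitting. The sketch below prints the -p and -j options only when a GNU parallel binary is found on the PATH. The function name parallel_opts is hypothetical, and the sketch assumes makevariantsdb runs sequentially when -p is omitted.

```shell
#!/bin/sh
# Hypothetical helper: print "-p -j JOBS" if GNU parallel is available,
# otherwise print nothing.
# Usage: parallel_opts JOBS
parallel_opts() {
    # GNU parallel identifies itself in its --version output; other tools
    # that are also named "parallel" do not.
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        printf '%s\n' "-p -j $1"
    fi
}

# Example call; the substitution is left unquoted on purpose so that the
# options are split into separate arguments:
# makevariantsdb -vf variants.vep $(parallel_opts 4)
```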
