Improved Meta-aligner and Minimap2 On Spark.
IMOS is an aligner for mapping noisy long reads to a reference genome. It can run on a single node or across distributed nodes. In single-node mode, IMOS is an Improved version of Meta-aligner (IM) that enhances both accuracy and speed; IM is up to 6x faster than the original Meta-aligner. IMOS also runs IM and Minimap2 on Apache Spark for deployment on a cluster of nodes. Multi-node IMOS is faster than SparkBWA when executing both IM (1.5x) and Minimap2 (25x).
Hadadian Nejad Youesfi, Mostafa, et al. "IMOS: improved Meta-aligner and Minimap2 On Spark". BMC Bioinformatics. (2019): link.
Contact : m.hadadian@rug.nl
IMOS can be downloaded from here.
Pre-generated human genome index files can be downloaded from here. (On the command line, pass hg.fa as the index after -REF.)
To build index files from a FASTA file, place SureMap-IndexBuilder and the reference file in FASTA format in the same directory as IMOS.jar. It has currently been tested on 64-bit Linux.
Usage: java -cp IMOS.jar IndexBuilder [FA File]
FA File : FastA Reference File
Before putting the file on HDFS, run the load balancer for better performance. The program builds a .fastm file that is balanced based on the HDFS operations. If you use it, add -FM to the command when submitting the job to Spark.
Usage: java -cp IMOS.jar LoadBalancer [aligner] [filename] [node] [isIllumina]
aligner: [mini,meta]
filename [string]: path to the input FastQ file.
node [int]: number of nodes in the cluster
isIllumina: yes for Illumina reads; No, or left blank, for PacBio reads
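The argument order of the load balancer can be sketched as a small shell helper. This is only an illustration: `load_balancer_cmd` is a hypothetical function name, it prints the command instead of running it, and the input validation reflects the parameter descriptions above rather than anything IMOS itself enforces.

```shell
# Sketch: print the LoadBalancer command for a given aligner, FastQ file and
# cluster size. Nothing is executed; the helper name is hypothetical.
load_balancer_cmd() {
    local aligner="$1" fastq="$2" nodes="$3" illumina="${4:-}"
    case "$aligner" in
        mini|meta) ;;
        *) echo "aligner must be 'mini' or 'meta'" >&2; return 1 ;;
    esac
    case "$nodes" in
        ''|*[!0-9]*) echo "node must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$nodes" -lt 1 ]; then
        echo "node must be a positive integer" >&2
        return 1
    fi
    # isIllumina is optional: leaving it blank means PacBio reads.
    if [ -n "$illumina" ]; then
        echo "java -cp IMOS.jar LoadBalancer $aligner $fastq $nodes $illumina"
    else
        echo "java -cp IMOS.jar LoadBalancer $aligner $fastq $nodes"
    fi
}

load_balancer_cmd meta Read.fq 4
load_balancer_cmd mini Read.fq 8 yes
```

Piping the printed line to `sh` (or copying it to the terminal) runs the actual balancing step.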
This mode is designed for single-node use. Use it when you do not want to run Apache Spark.
Usage: java -cp IMOS.jar IM [OPTIONS] -I [inputFQ] -REF [index]
inputFQ: Input reads in FastQ format
index: Name of the index files built with the index builder
OPTIONS:
-C [int]: Number of cores
-ER [float]: Tolerable error rate, 0<=rate<=1
-O [String]: Output file path
-RF [int]: Refine Factor 1<=factor<=10 [default=4]
-X [String]: Sequencer Machine : {"Pacbio","Illumina"}
EXAMPLE: java -cp IMOS.jar IMOSClient -c 4 -x Pacbio -O out.sam -I Read.fq -REF chr19.fa
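The documented ranges for -ER and -RF can be checked before launching a long alignment. The following is a minimal sketch under stated assumptions: `valid_error_rate` and `valid_refine_factor` are hypothetical helpers, the values 0.15 and 4 are illustrative only, and the printed command follows the usage line above (class `IM`).

```shell
# Sketch: validate -ER (0 <= rate <= 1) and -RF (1 <= factor <= 10).
# Both helper names are hypothetical and not part of IMOS.
valid_error_rate() {
    # Reject empty, non-numeric or malformed input before comparing.
    case "$1" in
        ''|.|*[!0-9.]*|*.*.*) return 1 ;;
    esac
    # awk performs the floating-point range check.
    awk -v r="$1" 'BEGIN { exit !(r >= 0 && r <= 1) }'
}

valid_refine_factor() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 10 ]
}

er=0.15   # illustrative error rate
rf=4      # the documented default refine factor
if valid_error_rate "$er" && valid_refine_factor "$rf"; then
    echo "java -cp IMOS.jar IM -C 4 -ER $er -RF $rf -X Pacbio -O out.sam -I Read.fq -REF chr19.fa"
else
    echo "invalid -ER or -RF value" >&2
fi
```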
First, set up an Apache Spark cluster. IMOS can operate on any Spark cluster; it only requires running an IMOSWorker on every Spark worker node. If you want to run Spark locally, we recommend using IMOSClient instead for better performance. Once the cluster setup is complete, submit IMOS to the Spark cluster. It has currently been tested on Linux.
Usage: java -cp IMOS.jar IMOSWorker [ALIGNER] [OPTIONS] -REF [INDEX]
Warning: ports 7777 and 7778 must be open
Warning: use -Xmx18G for human genome
INDEX:
Name of the index files built with the index builder
ALIGNER:
IM : Improved Meta-aligner
Mini : Minimap2
Third : 3rd party aligner
OPTIONS:
Minimap2:
The arguments are passed directly to Minimap2. See its help for more details.
Third:
The arguments are passed directly to the third-party aligner.
IM:
-C [int]: Number of cores
-ER [float]: Tolerable error rate, 0<=rate<=1
-RF [int]: Refine Factor, 1<=factor<=10 [default=4]
-X [String]: Sequencer Machine : {"Pacbio","Illumina"}
EXAMPLE: java -cp IMOS.jar IMOSWorker im -c 4 -x Pacbio -REF chr19.fa
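The heap-size warning above translates into a JVM flag placed before `-cp`. A minimal sketch, assuming the pre-generated human index `hg.fa` is in the working directory; `worker_cmd` is a hypothetical helper that only prints the command.

```shell
# Sketch: print the IMOSWorker launch command with an explicit JVM heap size.
# -Xmx is a standard JVM option and must come before -cp.
worker_cmd() {
    local heap="$1" index="$2" cores="$3"
    echo "java -Xmx$heap -cp IMOS.jar IMOSWorker im -c $cores -x Pacbio -REF $index"
}

# Human genome: 18G heap, as recommended in the warning above.
worker_cmd 18G hg.fa 4
```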
To compile Minimap2 so that it works with IMOSWorker, download main.c from here and the minimap2 package from GitHub. Copy our modified main.c into the main folder of the downloaded minimap2 source, then compile minimap2 as usual. Finally, put the minimap2 binary and IMOSWorker in the same directory.
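The build steps above might look like the following sketch. It assumes the modified main.c has already been downloaded (its location, `./main.c` by default, is a placeholder), that the current directory is the one holding IMOS.jar, and that minimap2 is built with its standard `make`. The hypothetical helper `minimap2_build_steps` only prints the commands so they can be reviewed first.

```shell
# Sketch: print the steps for building a Minimap2 binary for IMOSWorker.
# Assumption: the first argument is the path to the modified main.c.
minimap2_build_steps() {
    local patched_main="${1:-./main.c}"
    cat <<EOF
git clone https://github.com/lh3/minimap2
cp $patched_main minimap2/main.c
make -C minimap2
cp minimap2/minimap2 .
EOF
}

minimap2_build_steps ./main.c
```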
Usage: spark-submit --class IMOS --master [MASTER] --executor-memory 10G --driver-memory 2G IMOS.jar [ALIGNER] [OPTIONS] -I [inputFQ]
MASTER: Spark master: local, yarn, or the IP of the Spark standalone master
inputFQ: Input reads in FastQ format
ALIGNER: IM for Improved Meta-aligner and ThirdParty, Mini for Minimap2
OPTIONS:
-FM : set if the load balancer was used and the file in HDFS is in fastm format
Mini:
No option is required. The options must be set at the worker nodes.
IM:
-ER [float]: Tolerable error rate, 0<=rate<=1
-O [String]: Output file path
-X [String]: Sequencer Machine : {"Pacbio","Illumina"}
EXAMPLE: spark-submit --class IMOS --master local --executor-memory 10G --driver-memory 2G IMOS.jar IM -X Pacbio -I Read.fq -O out.sam
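How the master URL and the -FM flag fit into the submission can be sketched as below. Everything here is illustrative: `submit_cmd` is a hypothetical helper that only prints the command, the standalone master address `spark://192.168.1.10:7077` is a made-up example (7077 is Spark's default standalone port), and `Read.fastm` stands in for whatever file the load balancer produced.

```shell
# Sketch: print a spark-submit command for IM, optionally with -FM.
# Helper name, master address and input names are hypothetical.
submit_cmd() {
    local master="$1" input="$2" balanced="${3:-no}"
    local opts="-X Pacbio -O out.sam"
    # -FM marks the input as a load-balanced fastm file.
    if [ "$balanced" = "yes" ]; then
        opts="-FM $opts"
    fi
    echo "spark-submit --class IMOS --master $master --executor-memory 10G --driver-memory 2G IMOS.jar IM $opts -I $input"
}

# Local run on a plain FastQ file.
submit_cmd local Read.fq
# Standalone cluster with a load-balanced input.
submit_cmd spark://192.168.1.10:7077 Read.fastm yes
```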
This work is licensed under a Creative Commons Attribution 4.0 International License.