Another Bisulfite Mapping Algorithm (abismal) is a read mapping program for bisulfite sequencing in DNA methylation studies.
Download the latest stable release here.
See how to get started and the full program documentation.
Currently abismal requires a C++ compiler that supports the C++11 standard and OpenMP. The default compiler assumed is g++ (comes with GCC, available on your Linux or macOS machine). The g++ compiler has supported the C++11 standard since roughly 2012 (GCC 4.7) so this should not cause any problems. It also requires an OMP library and headers to be available, which rarely causes problems. Instructions to get HTSlib, for macOS or Linux systems, can be found below.
If you have trouble with the make part of the installation procedure described below, please contact us via e-mail or through a GitHub issue.
The full documentation for abismal can be found here. This explains the use of each parameter in full detail. Below, after installation instructions, we describe the most common use cases: indexing a genome and mapping single-end and paired-end reads.
These instructions are for building abismal from source, rather than obtaining it through a package manager like conda.
These instructions assume you have access to apt, which is installed on Ubuntu-based and Debian-based distributions. The only difference for other Linux distributions is how you get the dependencies. Likely all you need is:
$ sudo apt-get install -y libhts-dev
If you don't have administrator privileges, there are other options.
If you have libhts-dev installed, the following should build abismal:
$ wget https://github.com/smithlabcode/abismal/releases/download/v3.2.4/abismal-3.2.4.tar.gz
$ tar -zxvf abismal-3.2.4.tar.gz
$ cd abismal-3.2.4
$ mkdir build && cd build
$ ../configure --prefix=/where/you/want/abismal
$ make
$ make install
Be sure that you have permission to write files to /where/you/want/abismal. This will install abismal, abismalidx and simreads inside the bin directory of the specified location.
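A quick way to confirm the installation is to check that the three programs named above exist under the chosen prefix. The following is a minimal sketch; the function name `check_abismal_install` is made up for illustration and is not part of abismal.

```shell
# Hypothetical helper: report whether abismal, abismalidx and simreads
# are present and executable under <prefix>/bin.
check_abismal_install() {
  prefix="$1"
  missing=0
  for prog in abismal abismalidx simreads; do
    if [ -x "${prefix}/bin/${prog}" ]; then
      echo "found ${prog}"
    else
      echo "missing ${prog}"
      missing=1
    fi
  done
  # Exit status is 0 only when all three programs were found.
  return "$missing"
}
```

For example, `check_abismal_install /where/you/want/abismal` prints one line per program. If everything is found, adding `/where/you/want/abismal/bin` to your `PATH` lets you run the programs by name.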
The GitHub repo for abismal includes tests that run on macOS 13 (Ventura), and we use the following steps. Although our tests begin with a "fresh" macOS installation, they have certain tools already available. In particular, Homebrew is already available and possibly some other tools. Homebrew is necessary as the first step to get the tools and dependencies:
$ brew update
$ brew install gcc
$ brew install htslib gsl
$ brew list --versions gcc
At this point, keep the version of gcc in mind, because it will probably be needed below. If you don't already have abismal downloaded, the next step is to download it. Here we will assume you are using a release rather than a clone. Building from a clone involves at least one more step.
$ wget https://github.com/smithlabcode/abismal/releases/download/v3.2.4/abismal-3.2.4.tar.gz
$ tar -zxvf abismal-3.2.4.tar.gz
$ cd abismal-3.2.4
Finally, these steps build the software:
$ mkdir build && cd build
$ ../configure \
--prefix=/path/to/install \
CXX="g++-13" \
CPPFLAGS="-I$(brew --prefix)/include" \
LDFLAGS="-L$(brew --prefix)/lib"
$ make
$ make install
Notice the g++-13 in the ../configure command. The 13 is the major version number of gcc reported by brew list --versions gcc above. If you have a different version (e.g., when gcc-14 is the default), update that number to match your major version number. Be sure you have permission to write to the directory /path/to/install.
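The compiler name can also be derived from the Homebrew output instead of being typed by hand. This is a sketch that assumes `brew list --versions gcc` prints a line of the form `gcc 13.2.0`; the function name `gcc_major_from_brew` is hypothetical, and only the first version listed on the line is read.

```shell
# Hypothetical helper: print the major version from a line such as
# "gcc 13.2.0" (the assumed format of `brew list --versions gcc`).
gcc_major_from_brew() {
  printf '%s\n' "$1" | awk '{ split($2, v, "."); print v[1] }'
}

# Stand-in for: line="$(brew list --versions gcc)"
line="gcc 13.2.0"

# Compiler name to pass to ../configure as CXX.
CXX_NAME="g++-$(gcc_major_from_brew "$line")"
echo "$CXX_NAME"
```

The resulting value can then be used as `CXX="$CXX_NAME"` in the configure command.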
If you are on Linux and do not have administrator privileges to get the dependencies (e.g., HTSlib), you can get them either by building them directly from source, or through conda. In particular, to obtain HTSlib through conda, do the following:
$ conda install -c bioconda htslib
as explained here at htslib.
I used conda obtained through miniconda3, so by default HTSlib is installed under ~/miniconda3, in its lib and include subdirectories. Once this is done, you can build abismal by replacing the configure step in the earlier instructions with:
$ ../configure --prefix=/path/to/install \
CPPFLAGS="-I${HOME}/miniconda3/include" \
LDFLAGS="-L${HOME}/miniconda3/lib"
Note that you can use this approach on both Linux and macOS. On macOS, replace the LDFLAGS and CPPFLAGS values with the conda paths, but keep the CXX variable. Remember not to use a tilde (~) in place of the ${HOME} variable above. It might work, but it is not guaranteed to.
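The reason for the warning about the tilde is that the shell does not expand `~` inside double quotes, nor in the middle of a word such as directly after `-I`. The sketch below shows the difference; the variable names are only for illustration.

```shell
# The tilde stays a literal character here, so the compiler would be
# handed a path that starts with "~".
with_tilde="-I~/miniconda3/include"

# ${HOME} is expanded inside double quotes, giving an absolute path.
with_home="-I${HOME}/miniconda3/include"

echo "$with_tilde"
echo "$with_home"
```

The first line printed still contains the `~`, while the second contains your actual home directory.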
This method is likely only useful if you need the most recent update, and is not recommended for most users. The only difference from the above explanations for Linux and macOS is that you will need to clone the repo, which means you need git installed, and you will also need to build the sources in place and without much reporting in case of any problems.
$ cd /where/you_want/the_code
$ git clone --recursive git@github.com:smithlabcode/abismal.git
$ cd abismal
$ make
$ make install
If you are building from the source in a cloned repo, you will likely see other ways to accomplish it by examining the files in the root of the repo.
The index can be constructed as follows, based on a genome existing entirely in a single FASTA format file:
$ abismalidx <genome.fa> <index-file>
single-end reads
$ abismal [options] -i <index-file> -o <output-file> <reads.fq>
paired-end reads
$ abismal [options] -i <index-file> -o <output-file> <read_1.fq> <read_2.fq>
option | long version | arg type | default | description |
---|---|---|---|---|
-i | -index | string | | genome index file |
-g | -genome | string | | genome file (FASTA) |
-o | -outfile | string | | output file (default SAM format) |
-s | -stats | string | | mapping statistics output file (YAML) |
-c | -max-candidates | integer | 100 | max candidates per seed* |
-l | -min-frag | integer | 32 | minimum fragment length (PE mode) |
-L | -max-frag | integer | 3000 | maximum fragment length (PE mode) |
-m | -max-distance | double | 0.1 | max relative number of errors |
-a | -ambig | boolean | | report a position for ambiguous reads |
-P | -pbat | boolean | | input follows the PBAT protocol |
-R | -random-pbat | boolean | | input follows the random PBAT protocol |
-A | -a-rich | boolean | | reads are A-rich (SE mode) |
-t | -threads | integer | 1 | number of mapping threads |
-v | -verbose | boolean | | print more run info |
-B | -bam | boolean | output SAM format | write output in BAM format |
* The max candidates parameter controls the amount of "effort" in mapping. In the "sensitive" step, which aligns reads with smaller exact match seeds, abismal skips seeds that retrieve more than c candidates. The higher the value of c, the more alignments abismal performs. Note that abismal still aligns reads to every exact match hit that spans more than half of the read ("specific step"). The specific step does not change with the value set by c.
(1) Indexing the genome
To make an index for hg38:
$ abismalidx hg38.fa hg38.abismalidx
In the process of building the index, the names of chromosomes will be truncated at the first whitespace character.
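Because names are cut at the first whitespace, two FASTA headers that differ only in their description end up with the same name. The sketch below previews the truncated names and reports collisions before indexing. It only mimics the rule described above with awk and is not abismal's own code; the function names are hypothetical.

```shell
# Hypothetical helper: print each sequence name as it would look after
# truncation at the first whitespace. Reads files given as arguments,
# or standard input when none are given.
truncated_names() {
  awk '/^>/ { print substr($1, 2) }' "$@"
}

# Hypothetical helper: print names that occur more than once after
# truncation.
duplicate_names() {
  truncated_names "$@" | sort | uniq -d
}

# Made-up example input: two headers collapse to the name "chr1".
printf '>chr1 primary assembly\nACGT\n>chr2\nTTGA\n>chr1 alternate\nGGCC\n' \
  | duplicate_names
```

Running `duplicate_names hg38.fa` on your own genome file should print nothing if all truncated names are unique.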
(2) Bisulfite Mapping
To map single-end reads in file reads.fq to human genome hg38:
$ abismal -i hg38.abismalidx -o reads.sam reads.fq
To map paired-end reads in files reads-1.fq and reads-2.fq to human genome hg38:
$ abismal -i hg38.abismalidx -o reads.sam reads-1.fq reads-2.fq
To map reads and write the output in BAM format:
$ abismal -B -i hg38.abismalidx -o reads.bam reads.fq
To map reads to human genome without requiring a separate index file (i.e. run both indexing and mapping simultaneously):
$ abismal -g hg38.fa -o reads.sam reads.fq
Mapping results are reported in SAM format. Some choices in the output are explicitly highlighted below:
- Reads are output identically to how they were read, regardless of mapped strand.
- The NM tag reports the edit distance between the read and the reference at the best mapping position, specifically the sum of mismatches, insertions and deletions.
- The CV tag reports the assumed bisulfite base used to map the read. Reads mapped as A-rich will be reported with CV:A:A, and reads mapped as T-rich will be reported with CV:A:T. This tag is independent of the strand the read was mapped to. If reads are not mapped in PBAT or random PBAT mode, the first end will always be T-rich and the second end will always be A-rich.
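As a rough illustration of how the CV tag can be used, the sketch below counts records by CV value in SAM text. It relies only on the standard SAM layout, where optional tags begin at column 12. The function name `count_cv` is hypothetical and the sample records are made up, not real abismal output.

```shell
# Hypothetical helper: count SAM records by the value of their CV tag.
# Reads files given as arguments, or standard input when none are given.
count_cv() {
  awk -F '\t' '
    /^@/ { next }                       # skip SAM header lines
    {
      for (i = 12; i <= NF; i++)        # optional tags start at column 12
        if ($i ~ /^CV:A:/) n[substr($i, 6)]++
    }
    END { for (k in n) print k, n[k] }
  ' "$@" | sort
}

# Made-up records: one T-rich read and one A-rich read.
printf 'r1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tTTGA\tIIII\tNM:i:0\tCV:A:T\n' > sample.sam
printf 'r2\t16\tchr1\t200\t30\t4M\t*\t0\t0\tAACA\tIIII\tNM:i:1\tCV:A:A\n' >> sample.sam
count_cv sample.sam
rm -f sample.sam
```

On real output, `count_cv reads.sam` gives a quick check that single-end reads mapped without the PBAT options are reported as T-rich.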
Andrew D Smith andrewds@usc.edu Guilherme Sena desenabr@usc.edu
The abismal manuscript is available here.
If you used abismal to analyze your data, please cite us as follows.
de Sena Brandine, G., & Smith, A. D. (2021).
Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet.
NAR Genomics and Bioinformatics, 3(4), lqab115.
Copyright (C) 2018-2023 Andrew D. Smith and Guilherme de Sena Brandine
Authors: Andrew D. Smith and Guilherme de Sena Brandine
abismal is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
abismal is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.