WFA-GPU is a CUDA library for pairwise gap-affine global DNA sequence alignment on Nvidia GPUs. It implements the WFA algorithm
Make sure you have installed an up-to-date CUDA toolkit, and a CUDA-capable device (i.e. an NVidia GPU). To compile the library and tools, run the following commands:
git clone git@github.com:quim0/WFA-GPU.git && \
cd WFA-GPU && \
./build.sh
The build.sh
script notifies if there is any missing necessary software for compiling the library and the tools.
WFA-GPU comes with a tool to test its functionality, it is compiled (with the instruction in the section "Build") to the bin/wfa.affine.gpu
binary.
Some usage examples are the following, note that depending on the amount of memory available on the GPU, batch size may be increased or decreased:
.seq
file, banded:./bin/wfa.affine.gpu -i PacBioHiFi.seq -b 100000 -e 3000 -t 512 -x -B auto -o hifi.out
.seq
file, exact:./bin/wfa.affine.gpu -i PacBioHiFi.seq -b 100000 -e 3000 -t 512 -x -o hifi.out
.fasta
files, exact:./bin/wfa.affine.gpu -Q query.PacBioHiFi.fasta -T target.PacBioHiFi.fasta -b 100000 -e 3000 -t 512 -x -o hifi.out
Running the binary without any arguments lists the help menu:
[Input/Output]
-i, --input-seq (string) Input sequences file in .seq format: File containing the sequences to align in .seq format.
-Q, --input-fasta-query (string) Input query file in .fasta format: File containing the query sequences to align (if not using a .seq file).
-T, --input-fasta-target (string) Input target file in .fasta format: File containing the target sequences to align (if not using a .seq file).
-n, --num-alignments (int) Number of alignments: Number of alignments to read from the file (default=all alignments)
-o, --output-file (string) Output File: File where alignment output is saved.
-p, --print-output Print: Print output to stderr
-O, --output-verbose Verbose output: Add the query/target information on the output
[Alignment Options]
-g, --affine-penalties (string) Affine penalties: Gap-affine penalties for the alignment, in format x,o,e
-x, --compute-cigar Compute CIGAR: Compute the optimal alignment path (CIGAR) of all the alignments, otherwise, only the distance is computed.
-e, --max-distance (int) Maximum error allowed: Maximum error that the kernel will be able to compute (default = maximum possible error of first alignment)
-b, --batch-size (int) Batch size: Number of alignments per batch.
-B, --band (int) Banded execution: If this parameter is present, a banded approach is used (heuristic).The parameter tells how many steps to wait until the band is re-centered. Use "auto" to use an automatically generated band.
[System]
-c, --check Check: Check for alignment correctness
-t, --threads-per-block (int) Number of CUDA threads per alginment: Number of CUDA threads per block, each block computes one or multiple alignment
-w, --workers (int) GPU workers: Number of blocks ('workers') to be running on the GPU.
[Examples]
./bin/wfa.affine.gpu -i sequences.seq -b <batch_size> -o scores.out
./bin/wfa.affine.gpu -i sequences.seq -b <batch_size> -B auto -o scores-banded.out
./bin/wfa.affine.gpu -Q queries.fasta -T targets.fasta -b <batch_size> -o scores.out
./bin/wfa.affine.gpu -Q queries.fasta -T targets.fasta -b <batch_size> -x -o cigars.out
Choosing the correct alignment and system options is key for performance. The tool tries to automatically choose adequate parameters, but the user
may have additional information to make a better choice. It is especially important to limit the maximum error supported by the kernel as much as
possible (-e
parameter), this constrains the memory used per alignment and helps the program to choose better block and grid sizes. Keep in mind that any alignment having an error higher than the specified with the -e
argument will be computed on the CPU, so, if this argument is too small, performance can decrease.
For big alignments, setting a band (i.e. limiting maximum wavefront size) with the -B
argument can give significant
speedups, at the expense of potentially losing some accuracy in corner cases.
This is a simple example of how to use WFA-GPU on your projects, for extended examples, see the files in examples/
.
#include "include/wfa_gpu.h"
// ...
// Create and initialize the aligner structure
wfagpu_aligner_t aligner = {0};
wfagpu_initialize_aligner(&aligner);
// Add the sequences to align: wfagpu_add_sequences(&aligner, query, target)
wfagpu_add_sequences(&aligner, "GAATA", "GATACA");
wfagpu_add_sequences(&aligner, "CATTAATCTT", "CAGTAAT");
// ... Add as many sequence pairs as you need
// Initialize the alignment parameters (*after* the sequences have been
// added)
affine_penalties_t penalties = {.x = 2, .o = 3, .e = 1};
wfagpu_initialize_parameters(&aligner, penalties);
// Optionally set batch size (number of sequences aligned in parallel)
wfagpu_set_batch_size(&aligner, 10);
// Compute the backtrace and generate the alignment CIGAR, if this is set
// to false (default), only the affine distance is computed.
aligner.alignment_options.compute_cigar = true;
// Align all sequence pairs
wfagpu_align(&aligner);
// Get error (score) of first alignment
aligner.results[0].error
// Get ASCII CIGAR of first alignment (char*)
aligner.results[0].cigar.buffer
To compile, use the following options, where $WFAGPU_PATH
is the path of this repository.
gcc test_wfagpu.c -o test-wfagpu -I $WFAGPU_PATH/lib/ -I $WFAGPU_PATH -L $WFAGPU_PATH/build/ -L $WFAGPU_PATH/external/WFA/lib/ -lwfagpu -lwfa -lm -fopenmp
Then, to execute the generated binary, the OS needs to be able to find the dynamic library $WFAGPU_PATH/build/libwfagpu.so
. A fast and easy way of doing so is adding it into the LD_LIBRAY_PATH
environment variable (export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$WFAGPU_PATH/build/
).
When a screen is connected, the maximum kernel time is 5 seconds. Disable the GUI or choose a smaller batch size to reduce kernel execution time.
On Ubuntu-based systems, the GUI can be disabled with the command sudo systemctl isolate multi-user.target
(keep in mind that this will close all applications on your desktop environment such as browsers, text editors... etc).
The program is trying to use too much memory. Decrease batch size or maximum error supported by the kernel. The aligner tool stores all sequences in the main memory before starting the alignment on the GPU, if your machine does not have enough memory, it can also raise an out-of-memory error on the CPU side.
There is a tradeoff between the (β,λ) parameters and the recall and execution time of the method. When using a large bandwidth (β), a λ=100 is sufficient and, when using a smaller β, setting λ around 50 is recommended. In any case, the accuracy drop is usually small (<3%). The following table shows an example of how time and recall are affected by (β,λ) changes. Unless optimal accuracy is paramount, default (β,λ) parameters will yield a good balance between execution time and minimal loss in recall.
Time (in seconds) |
Recall (%) |
Time and recall achieved with different heuristic parameters for the Nanopore dataset.
Open an issue on GitHub or contact the main developer: Quim Aguado-Puig (quim.aguado.p@gmail.com)
WFA-GPU is distributed under the MIT license.
Quim Aguado-Puig, Max Doblas, Christos Matzoros, Antonio Espinosa, Juan Carlos Moure, Santiago Marco-Sola, Miquel Moreto. WFA-GPU: gap-affine pairwise read-alignment using GPUs. Bioinformatics (2023). DOI bioinformatics/btad701