dbghaplo is a method that separates long reads (Nanopore or PacBio) of a mixture of sequences into groups with similar alleles. This is called "phasing" or "haplotyping".
dbghaplo is a "local haplotyping" method, so it works best when the sequence of interest is approximately the size of the reads. For genome-scale haplotyping, consider another tool such as floria.
Example use cases:
- mixed viral long-read samples (e.g. co-infections)
- amplicon/enriched sequencing of specific genes
- haplotyping small sections of multi-strain bacterial communities
Figure: High-depth, heterogeneous sequencing that spans a 1 kb gene.
Figure: Separated groups ("haplotypes") after running dbghaplo.
Other tools exist for detecting similar haplotypes in mixtures. dbghaplo was developed to fill the following gaps:
- Speed and low memory - dbghaplo scales approximately linearly with sequencing depth and the number of SNPs. Genes with > 30,000x coverage can be haplotyped in minutes.
- High heterogeneity and coverage - dbghaplo uses a de Bruijn graph approach, which works with very diverse samples (> 10 haplotypes).
- Ease of use and interpretable outputs - conda-installable, written in Rust, with a simple command line. Outputs are easy to interpret (haplotagged BAM or MSA).
mamba install -c bioconda dbghaplo
dbghaplo -h
See the installation instructions on the wiki if you want to compile from source or want a static binary. One of these routes is necessary if you are not on an x86 architecture.
git clone https://github.com/bluenote-1577/dbghaplo
cd dbghaplo
dbghaplo -b hiv_test/3000_95_3.bam -v hiv_test/3000_95_3.vcf.gz -r hiv_test/OR483991.1.fasta
# results folder
ls dbghaplo_output
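To get a quick feel for the haplotagged BAM, one option is to count how many reads landed in each haplotype. The sketch below is not part of dbghaplo: `count_haplotags` is a hypothetical helper, and it assumes the haplotype label is stored in an `HP:i:` optional tag, as is common for haplotagged BAMs. Check the Output format page for the tag and file names your version actually writes.

```shell
# Hypothetical helper (not shipped with dbghaplo): count reads per haplotype
# from SAM text on stdin. ASSUMPTION: the haplotype is stored as HP:i:<n>.
count_haplotags() {
    awk -F '\t' '
        /^@/ { next }                        # skip SAM header lines
        {
            hp = "untagged"
            for (i = 12; i <= NF; i++)       # optional tags start at column 12
                if ($i ~ /^HP:i:/) hp = substr($i, 6)
            n[hp]++
        }
        END { for (h in n) print h "\t" n[h] }
    ' | sort
}

# Usage (the BAM name is a placeholder; look in dbghaplo_output for the real file):
# samtools view -h dbghaplo_output/<haplotagged>.bam | count_haplotags
```

Each output line is a haplotype label and its read count, separated by a tab; reads without the tag are reported as `untagged`.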
git clone https://github.com/bluenote-1577/dbghaplo
cd dbghaplo
run_dbghaplo_pipeline -i hiv_test/3000_95_3.fastq.gz -r hiv_test/OR483991.1.fasta --overwrite -o pipeline_output/
# results folder
ls pipeline_output
# intermediate files (bam + vcf files)
ls pipeline_output/pipeline_files
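If you have several read sets against the same reference, the pipeline command above can be wrapped in a loop. This is a minimal sketch that uses only the `-i`, `-r`, and `-o` flags shown above; `run_batch` and the per-sample output directory naming are assumptions made for illustration, not part of dbghaplo.

```shell
# Hypothetical wrapper: run the pipeline once per FASTQ file, writing each
# sample to its own output directory (naming scheme is an assumption).
run_batch() {
    ref=$1
    shift
    for fq in "$@"; do
        # Strip the directory and the .fastq.gz suffix to get a sample name
        sample=$(basename "$fq" .fastq.gz)
        run_dbghaplo_pipeline -i "$fq" -r "$ref" -o "pipeline_output_${sample}"
    done
}

# Usage:
# run_batch hiv_test/OR483991.1.fasta sample1.fastq.gz sample2.fastq.gz
```

Separate output directories keep the intermediate BAM and VCF files of one sample from being mixed with or overwritten by another.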
Note
If you did not install dbghaplo via conda, install the pipeline's dependencies and run the script from a clone of the repository instead.
mamba install -c bioconda tabix samtools lofreq minimap2
git clone https://github.com/bluenote-1577/dbghaplo
./dbghaplo/scripts/run_dbghaplo_pipeline -i reads.fq.gz -r reference.fa -o pipeline_output
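When running the script from a clone, it can help to first confirm that the four tools installed above are on your `PATH`. The check below is a small sketch; `missing_tools` is a hypothetical helper, not something shipped with dbghaplo.

```shell
# Hypothetical helper: print each given tool that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command is available
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools taken from the mamba install command above
missing=$(missing_tools tabix samtools lofreq minimap2)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
else
    echo "All pipeline dependencies found." >&2
fi
```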
- Output format - how to interpret the outputs.
- Cookbook - usage examples.
- Forthcoming.
Forthcoming.