dbghaplo - long-read haplotypes from mixtures of "small" sequences

dbghaplo is a method that separates long reads (Nanopore or PacBio) from a mixture of sequences into groups with similar alleles. This is called "phasing" or "haplotyping".

dbghaplo is a "local haplotyping" method, so it works best when the sequence of interest is approximately the size of the reads. For genome-scale haplotyping, consider another tool such as floria.

Example use cases:

mixed viral long-read samples (e.g. co-infections)
amplicon/enriched sequencing of specific genes
haplotyping small sections of multi-strain bacterial communities

Figure: High-depth, heterogeneous sequencing that spans a 1 kb gene.

Figure: Separated groups ("haplotypes") after running dbghaplo.

Why dbghaplo?

Other tools exist for detecting similar haplotypes in mixtures. dbghaplo was developed to fill the following gaps:

Speed and low memory - dbghaplo scales approximately linearly with sequencing depth and number of SNPs. Genes with > 30,000x coverage can be haplotyped in minutes.
High heterogeneity and coverage - dbghaplo uses a de Bruijn graph approach, which works with very diverse samples (> 10 haplotypes).
Ease of use and interpretable outputs - conda-installable, written in Rust, with a simple command line. Outputs are easy to interpret (haplotagged BAM or MSA).

Install

mamba install -c bioconda dbghaplo
dbghaplo -h

See the installation instructions on the wiki if you want to compile directly or want a static binary. This is necessary if you are not on an x86 architecture.

Quick Start after install

Option 1 (more flexible): Running dbghaplo with VCF + BAM

git clone https://github.com/bluenote-1577/dbghaplo
cd dbghaplo
dbghaplo -b hiv_test/3000_95_3.bam -v hiv_test/3000_95_3.vcf.gz -r hiv_test/OR483991.1.fasta

# results folder
ls dbghaplo_output
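Before running Option 1 on your own data, it can help to confirm that all three inputs are present. The sketch below assumes nothing about dbghaplo beyond the `-b`, `-v`, and `-r` flags shown above; `check_inputs` is a hypothetical helper, not part of dbghaplo.

```shell
# Sketch: verify that every input file exists and is non-empty.
# check_inputs is a hypothetical helper, not shipped with dbghaplo.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage with the repository's test data (run from the repo root):
# check_inputs hiv_test/3000_95_3.bam hiv_test/3000_95_3.vcf.gz hiv_test/OR483991.1.fasta \
#   && dbghaplo -b hiv_test/3000_95_3.bam -v hiv_test/3000_95_3.vcf.gz -r hiv_test/OR483991.1.fasta
```

The helper returns a non-zero status if any file is missing, so chaining it with `&&` prevents dbghaplo from starting on incomplete inputs.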

Option 2 (easier): Running dbghaplo with reads

git clone https://github.com/bluenote-1577/dbghaplo
cd dbghaplo
run_dbghaplo_pipeline -i hiv_test/3000_95_3.fastq.gz -r hiv_test/OR483991.1.fasta --overwrite -o pipeline_output/

# results folder
ls pipeline_output

# intermediate files (bam + vcf files)
ls pipeline_output/pipeline_files
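If you have several read files against the same reference, one way to run the pipeline on each is a small loop. This is a minimal sketch that uses only the `-i`, `-r`, and `-o` flags shown above; `pipeline_cmd` is a hypothetical helper, and `reads/` and `reference.fa` are placeholder paths.

```shell
# Sketch: build one run_dbghaplo_pipeline command per read file.
# pipeline_cmd is a hypothetical helper; reads/ and reference.fa are placeholders.
pipeline_cmd() {
    reads=$1
    ref=$2
    # Name each output folder after the read file, minus its .fastq.gz suffix.
    sample=$(basename "$reads" .fastq.gz)
    printf 'run_dbghaplo_pipeline -i %s -r %s -o %s\n' "$reads" "$ref" "pipeline_output_${sample}"
}

# Dry run: print the commands for review instead of executing them.
for fq in reads/*.fastq.gz; do
    [ -e "$fq" ] || continue   # skip when the glob matches nothing
    pipeline_cmd "$fq" reference.fa
done
```

Printing the commands first makes it easy to check the output folder names before committing to a long run.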

Note

If you did not install via conda, install the pipeline's dependencies and run the pipeline script from the cloned repository instead.

mamba install -c bioconda tabix samtools lofreq minimap2
git clone https://github.com/bluenote-1577/dbghaplo
./dbghaplo/scripts/run_dbghaplo_pipeline -i reads.fq.gz -r reference.fa -o pipeline_output
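For a manual install, a quick check that the pipeline's dependencies are reachable can save a failed run. The sketch below only tests whether each tool from the install command above is on your PATH; `missing_tools` is a hypothetical helper, and no specific tool versions are assumed.

```shell
# Sketch: report which pipeline dependencies are missing from PATH.
# missing_tools is a hypothetical helper; the tool list mirrors the install command above.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools tabix samtools lofreq minimap2)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "not found on PATH:" $missing
else
    echo "all pipeline dependencies found"
fi
```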

Manuals, tutorials, and cookbook

How to use dbghaplo

Output format - how to interpret dbghaplo's outputs.
Cookbook - usage examples.

Tutorials

Forthcoming.

Citation

Forthcoming.
