This repo is for backup only. Please check the parent repo for details.

hlilab/longcallD

 
 

Updates (pre-release v0.0.3)

  • Fix a couple of corner cases

Getting Started

# Build from source (alternative to the pre-built executables below)
# git clone --recursive https://github.com/yangao07/longcallD
# cd longcallD && make

# Download pre-built executables and test data (recommended)
# Linux
wget https://github.com/yangao07/longcallD/releases/download/v0.0.3/longcallD-v0.0.3_x64-linux.tar.gz
tar -zxvf longcallD-v0.0.3_x64-linux.tar.gz && cd longcallD-v0.0.3_x64-linux
# macOS (arm64)
wget https://github.com/yangao07/longcallD/releases/download/v0.0.3/longcallD-v0.0.3_arm64-macos.tar.gz
tar -zxvf longcallD-v0.0.3_arm64-macos.tar.gz && cd longcallD-v0.0.3_arm64-macos

# PacBio HiFi reads
./longcallD call ./test_data/chr11_2M.fa ./test_data/HG002_chr11_hifi_test.bam --hifi > HG002_hifi_test.vcf
# Oxford Nanopore reads
./longcallD call ./test_data/chr11_2M.fa ./test_data/HG002_chr11_ont_test.bam --ont > HG002_ont_test.vcf

Introduction

LongcallD is a local-haplotagging-based variant caller designed for detecting small variants and structural variants (SVs) using long-read sequencing data. It supports both PacBio HiFi and Oxford Nanopore reads.

LongcallD phases long reads into haplotypes using SNPs and small indels before calling SVs. It outputs phased variant calls in VCF format, covering SNPs, small indels, and large SVs (currently insertions and deletions only).

Installation

Pre-built executables (recommended)

For Linux:

wget https://github.com/yangao07/longcallD/releases/download/v0.0.3/longcallD-v0.0.3_x64-linux.tar.gz
tar -zxvf longcallD-v0.0.3_x64-linux.tar.gz

For macOS:

wget https://github.com/yangao07/longcallD/releases/download/v0.0.3/longcallD-v0.0.3_arm64-macos.tar.gz
tar -zxvf longcallD-v0.0.3_arm64-macos.tar.gz

Build from source

To compile longcallD from source, you need GCC or Clang (9.0+) and zlib. Building from the latest release tarball is recommended.

wget https://github.com/yangao07/longcallD/releases/download/v0.0.3/longcallD-v0.0.3.tar.gz
tar -zxvf longcallD-v0.0.3.tar.gz
cd longcallD-v0.0.3 && make

Usage

LongcallD requires a reference genome (FASTA) and a long-read SAM/BAM/CRAM file as inputs. It outputs phased variant calls in VCF format.

Variant calling with HiFi/Nanopore long reads

longcallD call -t16 ref.fa hifi.bam > hifi.vcf         # default for PacBio HiFi reads (--hifi)
longcallD call -t16 ref.fa ont.bam --ont > ont.vcf     # for ONT reads
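Because the output VCF mixes SNPs, small indels, and large insertions/deletions, a quick tally by type can help sanity-check a run. The following is a minimal sketch, not part of longcallD: `classify_vcf` is a hypothetical helper, the 50 bp SV cutoff is an assumed convention rather than a documented longcallD setting, and it assumes REF and ALT are written as explicit sequences with a single ALT allele per record.

```shell
#!/usr/bin/env bash
# Tally VCF records by the length difference between REF (column 4) and ALT (column 5).
# Assumptions (not taken from the longcallD documentation):
#   - alleles are explicit sequences, one ALT allele per record
#   - variants with a length change of 50 bp or more are counted as SVs
classify_vcf() {
  awk -F'\t' -v sv_min=50 '
    /^#/ { next }                          # skip header lines
    {
      d = length($5) - length($4)          # ALT length minus REF length
      a = (d < 0) ? -d : d                 # absolute length change
      if (d == 0 && length($4) == 1) type = "SNP"
      else if (a >= sv_min)          type = (d > 0) ? "SV_INS" : "SV_DEL"
      else if (d != 0)               type = "INDEL"
      else                           type = "OTHER"   # e.g. multi-base substitutions
      n[type]++
    }
    END { for (t in n) print t "\t" n[t] }
  ' "$1" | LC_ALL=C sort
}

# Example: classify_vcf hifi.vcf
```

Records with several comma-separated ALT alleles would be misclassified by this sketch and would need to be split first.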

Region-specific variant calling

LongcallD supports region-based variant calling. Regions are listed after the input BAM, in the same style as samtools view, and more than one region may be given.

longcallD call -t16 ref.fa hifi.bam chr11:10,229,956-10,256,221 > hifi_reg.vcf
longcallD call -t16 ref.fa hifi.bam chr11:10,229,956-10,256,221 chr12:10,576,356-10,583,438 > hifi_regs.vcf
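When target regions live in a BED file, they need converting to region strings like the ones above. A minimal sketch follows, assuming longcallD regions use the samtools convention of 1-based inclusive coordinates (the README only says the syntax is similar to samtools view); `bed_to_regions` and `targets.bed` are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Convert BED intervals (0-based, half-open) into samtools-style
# region strings (assumed 1-based, inclusive): chrom:start+1-end
bed_to_regions() {
  awk -F'\t' '
    /^(#|track|browser)/ || NF < 3 { next }     # skip comments, BED headers, short lines
    { printf "%s:%d-%d\n", $1, $2 + 1, $3 }     # shift start by one, keep end
  ' "$1"
}

# Example (targets.bed is a hypothetical file):
# longcallD call -t16 ref.fa hifi.bam $(bed_to_regions targets.bed) > hifi_regs.vcf
```

The command substitution is left unquoted on purpose so that each region becomes a separate argument.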

Variant calling with phased long-read output

longcallD call -t16 ref.fa hifi.bam --hifi -b hifi_phased.bam > hifi.vcf  # also write phased HiFi reads (BAM tags: HP and PS)
longcallD call -t16 ref.fa ont.bam --ont -b ont_phased.bam > ont.vcf      # also write phased ONT reads (BAM tags: HP and PS)
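To see how many reads ended up on each haplotype, the HP tag in the phased BAM can be counted. This is a rough sketch rather than a longcallD feature: `count_haplotags` is a hypothetical helper that reads SAM text on standard input, and it assumes the tag is written as an integer field (`HP:i:N`) and that samtools is available to decode the BAM.

```shell
#!/usr/bin/env bash
# Count reads per haplotype from SAM text on stdin.
# Assumption: the haplotype is stored as an integer tag of the form HP:i:N.
count_haplotags() {
  awk -F'\t' '
    /^@/ { next }                              # skip SAM header lines, if any
    {
      hp = "untagged"                          # default when no HP tag is present
      for (i = 12; i <= NF; i++)               # optional fields start at column 12
        if ($i ~ /^HP:i:/) { hp = "HP" substr($i, 6); break }
      n[hp]++
    }
    END { for (h in n) print h "\t" n[h] }
  ' | LC_ALL=C sort
}

# Example: samtools view hifi_phased.bam | count_haplotags
```

Reads counted as `untagged` are those for which no HP field was found on the alignment line.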

Variant calling from remote files

ref=https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh38/GRCh38_GIABv3_no_alt_analysis_set_maskedGRC_decoys_MAP2K3_KMT2C_KCNJ18.fasta.gz
bam=https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_HiFi-Revio_20231031/HG002_PacBio-HiFi-Revio_20231031_48x_GRCh38-GIABv3.bam
longcallD call -t16 "$ref" "$bam" chr11:10,229,956-10,256,221 chr12:10,576,356-10,583,438 > hifi_regs.vcf

Contact

For any questions or support, please contact:
