rel5 (genomic DNA)

rel5 is a merger of the NA12878 DNA sequencing data from rel3 (regular sequencing protocols) and rel4 (ultra-long read set), re-basecalled with the latest generation of basecallers (Albacore 2.1 and Guppy 0.3).

Notes on chunk size

Mike Schatz (Johns Hopkins) and Fritz Sedlazeck (CSHL) noticed that the Albacore 2.1 basecalls had a high frequency of long false-positive deletions that were confounding SV prediction. With the help of Chris Wright and Tim Massingham at ONT, this was traced to the "chunk size" setting and the computation of signal scaling. Setting the chunk size to 10000 should remove the problem, and this was done for the Guppy calls.

Reference

GRCh38 with decoys was used as the reference file: GRCh38_full_analysis_set_plus_decoy_hla.fa.

Guppy

Data were downloaded from the ENA raw submission. Guppy was run on the GridION X5. Basecalling took approximately 48 hours on dual GPUs (1080 Ti), giving a basecalling speed of ~2.4 Gb/hour.
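The run time and speed quoted above together imply roughly how much sequence was basecalled. The sketch below multiplies the two figures. Both are approximate in the text, so the result is an estimate derived from them, not a reported total.

```shell
# Figures quoted in the Guppy section (both approximate).
hours=48
gb_per_hour=2.4

# awk does the floating-point multiplication that plain shell arithmetic cannot.
total_gb=$(awk -v h="$hours" -v r="$gb_per_hour" 'BEGIN { printf "%.1f", h * r }')

echo "Implied basecalled yield: ~${total_gb} Gb"
```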

Downloads

Minimap2 alignments, generated with the new -L flag (minimap2 -t 12 -ax map-ont -L GRCh38_full_analysis_set_plus_decoy_hla.fa) and processed with samtools 1.6:
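For anyone wanting to regenerate alignments from the basecalls, the following is a minimal sketch of how the command above might be run end to end. The minimap2 options and reference name come from this document. The reads file name, the output name, and the sort and index steps are assumptions for illustration, since the exact post-processing is not described here. By default the script only prints the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reference file named in this document.
REF="GRCh38_full_analysis_set_plus_decoy_hla.fa"
# Hypothetical input and output names, not taken from the release.
READS="${READS:-reads.fastq.gz}"
OUT="${OUT:-rel5.sorted.bam}"

# Build the pipeline as text so it can be inspected before running.
align_cmd() {
  printf 'minimap2 -t 12 -ax map-ont -L %s %s | samtools sort -o %s -' \
    "$REF" "$READS" "$OUT"
}

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: show what would be run.
  align_cmd
  echo
else
  # -L stores CIGARs with more than 65535 operations in the CG tag,
  # which keeps very long alignments representable in BAM.
  minimap2 -t 12 -ax map-ont -L "$REF" "$READS" | samtools sort -o "$OUT" -
  samtools index "$OUT"
fi
```

Running with `DRY_RUN=0` executes the pipeline; this needs minimap2 and samtools on the `PATH` and the reference and reads files in the working directory.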

Albacore 2.1

These basecalls are not recommended because of the chunk size problem described above, but are included for completeness.

Downloads

Minimap2 alignments (minimap2 -t 12 -ax map-ont -L GRCh38_full_analysis_set_plus_decoy_hla.fa):

Assembly

Adam Phillippy and Sergey Koren have posted a new Canu 1.7 + WTDBG + Nanopolish assembly on their blog, using a dataset equivalent to the Albacore 2.1 reads above.