If you use this pipeline or extract and reuse original code/rules from this repository, please cite the following two papers:
Porubsky and Ebert et al.
"Fully Phased Human Genome Assembly without Parental Data Using Single-Cell Strand Sequencing and Long Reads."
Nature Biotechnology, December 2020
DOI: 10.1038/s41587-020-0719-5
Ebert, Audano, Zhu and Rodriguez-Martin et al.
"Haplotype-resolved diverse human genomes and integrated analysis of structural variation"
Science, February 2021
DOI: 10.1126/science.abf7117
Please no longer cite the preprints (10.1101/855049 and 10.1101/2020.12.16.423102).
This repository contains the Snakemake pipeline code plus some auxiliary scripts to go from raw input data to polished haploid assemblies. Any self-contained, general-purpose software tool used in the pipeline is available either via conda/bioconda or via GitHub. In either case, the pipeline implementation covers the entire software setup required for a complete pipeline run.
In particular, the code for the SaaRclust, StrandPhaseR and breakpointR R packages is available in David Porubsky's GitHub.
Several step-by-step manuals describe all use cases currently supported by this pipeline. First-time users should start by reading the tutorial. If you encounter any problems or "strange behaviour" during pipeline execution, please check the FAQ for explanations and solutions. If this does not help, please open a GitHub issue.