Pipeline code for creating a fully haplotype-resolved assembly from a combination of PacBio/ONT long reads and Illumina Strand-seq data

ptrebert/project-diploid-assembly

Project repository: Phased Genome Assembly using Strand-seq (PGAS)

Citation

If you use this pipeline or extract and reuse original code/rules from this repository, please cite the following two papers:

Porubsky and Ebert et al.
"Fully Phased Human Genome Assembly without Parental Data Using Single-Cell Strand Sequencing and Long Reads."
Nature Biotechnology, December 2020
DOI: 10.1038/s41587-020-0719-5

Ebert, Audano, Zhu and Rodriguez-Martin et al.
"Haplotype-resolved diverse human genomes and integrated analysis of structural variation"
Science, February 2021
DOI: 10.1126/science.abf7117

Deprecated citations

Please do not cite the preprints (10.1101/855049 and 10.1101/2020.12.16.423102) anymore.

Scope of this repository

This repository contains the Snakemake pipeline code and auxiliary scripts needed to go from raw input data to polished haploid assemblies. Every self-contained, general-purpose software tool used in the pipeline is available either via conda/bioconda or via GitHub, and the pipeline implementation covers the entire software setup required for a complete pipeline run.

In particular, the code for the R packages SaaRclust, StrandPhaseR and breakpointR is available on David Porubsky's GitHub.

Documentation

Several step-by-step manuals describe all use cases currently supported by this pipeline. First-time users should start by reading the tutorial. If you encounter any problems or "strange behaviour" during pipeline execution, please check the FAQ for explanations and solutions. If that does not help, please open a GitHub issue.
