ConsHMM provides tools for parsing a multiple-species alignment and training a Hidden Markov Model (HMM) to learn a conservation state annotation of the alignment's reference genome at single-nucleotide resolution. The HMM is learned using an updated version of the ChromHMM software, which is included in this repository. Tools for visualizing and interpreting ConsHMM output are also provided.
The segmentation and browser files mentioned in the paper are available here. The link provides the intermediate files produced by the pipeline using the hg19 Multiz 100-way alignment.
For pre-generated ConsHMM annotations in multiple species, visit the ConsHMM Atlas.
Files from the analysis of bases prioritized by various variant scores in the paper are available here.
v1.1 updates:
- Allele-specific annotations
- parseMAF can now work on MAF files split by the chromosomes of a different species than the target one
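To illustrate why the chromosome a MAF file is split by matters, here is a minimal sketch of reading MAF alignment blocks and collecting the intervals covered by one species. It is not the parseMAF implementation: the function names `iter_maf_blocks` and `reference_intervals`, the `SAMPLE` data, and the choice of `hg19` as the target are all assumptions made for illustration. The sketch relies only on the standard MAF layout, in which an `a` line opens a block and each `s` line holds `src start size strand srcSize text`.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample: a file split by mm10 chromosomes (everything here is chr2
# in mm10), while the target species hg19 appears on different chromosomes.
SAMPLE = """##maf version=1
a score=10
s mm10.chr2 100 4 + 182113224 ACGT
s hg19.chr1 500 4 + 249250621 ACGA

a score=7
s mm10.chr2 200 3 + 182113224 TTG
s hg19.chr7 900 3 + 159138663 TTG
"""


def iter_maf_blocks(lines: Iterable[str]) -> Iterator[List[dict]]:
    """Yield one list of sequence records per alignment block."""
    block: List[dict] = []
    for raw in lines:
        line = raw.strip()
        if line == "a" or line.startswith("a "):
            # An "a" line opens a new block; emit the previous one if any.
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            _, src, start, size, strand, src_size, text = line.split()
            # Assumes the common "species.chromosome" naming for the source.
            species, _, chrom = src.partition(".")
            block.append({
                "species": species,
                "chrom": chrom,
                "start": int(start),
                "size": int(size),
                "strand": strand,
                "text": text,
            })
    if block:
        yield block


def reference_intervals(lines: Iterable[str], reference: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each chromosome of `reference` to the (start, size) pairs it covers."""
    intervals: Dict[str, List[Tuple[int, int]]] = {}
    for block in iter_maf_blocks(lines):
        for rec in block:
            if rec["species"] == reference:
                intervals.setdefault(rec["chrom"], []).append((rec["start"], rec["size"]))
    return intervals


if __name__ == "__main__":
    print(reference_intervals(SAMPLE.splitlines(), "hg19"))
```

In this sample, a single file keyed by an mm10 chromosome contributes positions to two different hg19 chromosomes, so the target chromosome has to be read from each block rather than inferred from the file name.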
If you are in a conda environment, the following commands will install the necessary Python libraries:
conda install -c conda-forge biopython
conda install -c anaconda numpy
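After installing, one way to confirm that both libraries are visible to the active interpreter is a short check like the one below. This is only a sketch: the helper name `missing_modules` is hypothetical, and it assumes the usual import names `Bio` (for biopython) and `numpy`.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "Bio" is the import name of the biopython package.
    missing = missing_modules(["Bio", "numpy"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```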
The Wiki contains useful tutorials, including how to reproduce the model and segmentation from the original ConsHMM paper or create your own based on a different reference species and/or multiple-sequence alignment.
For any use of the ConsHMM software or ConsHMM state annotations, please cite:
Arneson A, Ernst J. Systematic discovery of conservation states for single-nucleotide annotation of the human genome. Communications Biology, 2:248, 2019. doi: https://doi.org/10.1038/s42003-019-0488-1
Adriana Arneson (University of California, Los Angeles)
Jason Ernst (University of California, Los Angeles)
Bruins In Genomics students Brooke Felsheim (Washington University in St. Louis) and Jennifer Chien (Wellesley College) helped test the pipeline during the summer of 2018 and implemented several additional features.