We developed a variational expectation-maximization (EM) algorithm for a hierarchical Bayesian model to identify rare variants in heterogeneous next-generation sequencing data. The algorithm identifies variants across a broad range of read depths and non-reference allele frequencies with high sensitivity and specificity. In an analysis of a longitudinal directed-evolution yeast data set, we identify a time-series trend in non-reference allele frequency and detect variants that have not previously been reported. Our model also detects the emergence of a beneficial variant earlier than previously shown, as well as a pair of concomitant variants.
- Zhang, F. and Flaherty, P. Variational inference for rare variant detection in deep, heterogeneous next-generation sequencing data. BMC Bioinformatics, 18(1), 45, 2017
- He, Y., Zhang, F., and Flaherty, P. RVD2: an ultra-sensitive variant detection model for low-depth heterogeneous next-generation sequencing data. Bioinformatics, 31(17), 2785-2793, 2015