Dmitry Penzar, Daria Nogina et al., LegNet: a best-in-class deep learning model for short DNA regulatory regions, Bioinformatics, 2023; doi: 10.1093/bioinformatics/btad457
[Paper] [Preprint] [Application to the human data]
Here we present a convolutional network for predicting gene expression and sequence variant effects based on data obtained by massively parallel reporter assays.
Our approach secured first place in the recent DREAM 2022 challenge on predicting gene expression from millions of promoter sequences. To achieve top performance, we drew inspiration from EfficientNetV2, a recent state-of-the-art architecture in image analysis, and reformulated the initial sequence-to-expression regression problem as a soft-classification task. In the framework of the DREAM challenge, our model outperformed both attention transformers and recurrent neural networks.
Furthermore, we demonstrate how LegNet can be used in diffusion generative modeling as a step toward the rational design of gene regulatory sequences.
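The soft-classification reformulation mentioned above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names `soft_labels` and `expected_expression`, the number of bins (`num_bins=18`), and the smoothing width (`sigma=0.5`) are assumptions chosen for illustration. The idea shown is that a scalar expression value is replaced by a probability vector over integer bins, and a scalar prediction can be recovered as the expected bin index.

```python
import math


def soft_labels(expression, num_bins=18, sigma=0.5):
    """Convert a scalar expression value into soft class probabilities.

    Bin k receives the mass of a normal distribution N(expression, sigma^2)
    on [k - 0.5, k + 0.5). The first and last bins absorb the tails, so the
    probabilities always sum to 1. num_bins and sigma are illustrative
    assumptions, not values taken from this README.
    """
    def cdf(x):
        # Cumulative distribution function of N(expression, sigma^2).
        return 0.5 * (1.0 + math.erf((x - expression) / (sigma * math.sqrt(2.0))))

    probs = []
    for k in range(num_bins):
        lo = -math.inf if k == 0 else k - 0.5
        hi = math.inf if k == num_bins - 1 else k + 0.5
        probs.append(cdf(hi) - cdf(lo))
    return probs


def expected_expression(probs):
    """Map a probability vector over bins back to a scalar (expected bin index)."""
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    target = soft_labels(11.0)
    print([round(p, 3) for p in target])
    print(expected_expression(target))
```

In a training loop, a network would output one score per bin, and a divergence between its softmax output and the soft target would serve as the loss. That wiring is omitted here to keep the sketch free of framework dependencies.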
- A tutorial Jupyter notebook demonstrating how to use LegNet in practice with data from yeast gigantic parallel reporter assays (GPRA).
- A tutorial Jupyter notebook demonstrating the changes made in the optimized LegNet.
- Code for diffusion generative modeling.
- Scripts to reproduce the analysis presented in the LegNet manuscript based on the public GPRA data of Vaishnav et al., Zenodo.
- Scripts to reproduce the autosome.org solution for the DREAM 2022 promoter expression challenge.