
Accurate prediction of promoter activity and variant effects from massively parallel reporter assays


autosome-ru/LegNet


LegNet: solving the sequence-to-expression problem with SOTA convolutional networks

Dmitry Penzar, Daria Nogina et al., LegNet: a best-in-class deep learning model for short DNA regulatory regions, Bioinformatics, 2023; doi: 10.1093/bioinformatics/btad457

[Paper] [Preprint] [Application to the human data]

Here we present a convolutional network that predicts gene expression and the effects of sequence variants, trained on data from massively parallel reporter assays (MPRAs).
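Convolutional models of DNA typically receive the sequence as a one-hot matrix with one channel per nucleotide. The sketch below shows that standard encoding plus a reverse-complement helper. It is an illustrative assumption, not LegNet's actual preprocessing: the real model input may add extra channels, and the function names `one_hot` and `reverse_complement` are hypothetical.

```python
def one_hot(seq: str) -> list[list[int]]:
    """Encode DNA as a 4 x L channels-first matrix (rows: A, C, G, T).

    Hypothetical helper: unknown bases such as 'N' become all-zero columns.
    """
    alphabet = "ACGT"
    rows = [[0] * len(seq) for _ in alphabet]
    for j, base in enumerate(seq.upper()):
        i = alphabet.find(base)
        if i >= 0:  # skip ambiguous bases
            rows[i][j] = 1
    return rows


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[b] for b in reversed(seq.upper()))


print(one_hot("ACGTN"))
print(reverse_complement("AACG"))
```

A channels-first layout matches the (channels, length) convention that 1D convolution layers in common deep learning frameworks expect.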

Our approach secured 1st place in the DREAM 2022 challenge on predicting gene expression from millions of promoter sequences. To achieve this performance, we drew inspiration from EfficientNetV2, a recent state-of-the-art architecture in image analysis, and reformulated the original sequence-to-expression regression problem as a soft-classification task. Within the DREAM challenge, our model outperformed both transformer-based and recurrent neural network models.
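The idea of turning regression into soft classification can be illustrated with a small sketch. Here, under stated assumptions, a scalar expression value is spread over integer-centered bins with Gaussian weights, the network's logits are turned into probabilities with a softmax, and the predicted expression is read out as the expected bin index. The number of bins (`n_bins=18`), the width `sigma=0.5`, the cross-entropy loss, and all function names are hypothetical choices for illustration, not necessarily LegNet's exact scheme.

```python
import math


def soft_labels(expression: float, n_bins: int = 18, sigma: float = 0.5) -> list[float]:
    """Spread a scalar target over bins centered at 0..n_bins-1 (assumed scheme)."""
    weights = [math.exp(-0.5 * ((c - expression) / sigma) ** 2) for c in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over the model's per-bin logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(target: list[float], probs: list[float], eps: float = 1e-12) -> float:
    """Cross-entropy between soft targets and predicted bin probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def expected_expression(probs: list[float]) -> float:
    """Convert bin probabilities back to a scalar: the expected bin center."""
    return sum(i * p for i, p in enumerate(probs))


target = soft_labels(7.3)
uniform = softmax([0.0] * 18)
print(expected_expression(target))
print(soft_cross_entropy(target, target) < soft_cross_entropy(target, uniform))
```

Compared with plain regression, the per-bin probabilities give the model a richer training signal and a natural way to express uncertainty, while the expectation step still yields a single expression estimate.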

Furthermore, we demonstrate how LegNet can be used in diffusion generative modeling as a step toward the rational design of gene regulatory sequences.

This repository provides several resources: