Effective gene expression prediction from sequence by integrating long-range interactions
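This paper introduces Enformer, a deep learning model that predicts gene expression and chromatin states directly from DNA sequence by integrating regulatory information from up to ~100 kb away. Enformer combines a convolutional front end with transformer (self-attention) layers. Self-attention lets every position in the input exchange information with every other position within a single layer, which allows the model to capture long-range interactions such as those between enhancers and promoters. As a result, it substantially improves prediction accuracy over prior purely convolutional models such as Basenji2. The approach also improves prediction of the effects of non-coding genetic variants on gene expression.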
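The reason self-attention reaches distant regulatory elements can be shown with a minimal sketch. It assumes NumPy and a single attention head with random weights, and it uses plain scaled dot-product self-attention rather than Enformer's actual layer, which is multi-headed, uses relative positional encodings, and runs on top of a convolutional tower. The sizes are illustrative: 1,536 hypothetical positions of 128 bp each, covering about 196 kb. In one layer, every position receives a nonzero weight from every other position, including the farthest one. A single convolution, by contrast, only mixes positions inside its kernel window.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (L, d) array, one row per sequence position (e.g. a binned DNA segment).
    Returns the attended outputs (L, d) and the attention weights (L, L).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (L, L): each position scored against all others
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # each row of weights sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
L, d = 1536, 16  # hypothetical: 1536 bins x 128 bp ~ 196 kb; d is an arbitrary feature size
x = rng.normal(size=(L, d))                        # stand-in for per-bin sequence features
# Random projection matrices scaled by 1/sqrt(d) to keep attention scores moderate
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)

# Bin 0 already draws on the farthest bin (~196 kb away) after one layer.
print(out.shape, attn.shape, attn[0, -1] > 0)
```

Stacking convolutions also widens the receptive field, but only step by step with depth. Attention connects any two positions directly, at a cost that grows quadratically with sequence length. This trade-off helps explain why a model of this kind applies attention only after the convolutional layers have shortened the sequence into coarse bins.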