Effective gene expression prediction from sequence by integrating long-range interactions
Žiga Avsec, Vikram Agarwal, David R. Kelley, et al. · DeepMind/Calico · model known as Enformer
Summary
This paper introduces Enformer, a transformer-based deep learning model that predicts gene expression and chromatin states directly from DNA sequence by integrating regulatory information from up to ~100 kb away. Self-attention lets every position in the input exchange information with every other position, so the model can capture long-range interactions such as those between distal enhancers and promoters. This substantially improves prediction accuracy over prior convolutional models such as Basenji2. The approach also improves prediction of how non-coding genetic variants affect expression.
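The mechanism behind the long receptive field is that self-attention computes a weight between every pair of positions, so distance alone does not limit which positions can interact. The snippet below is a minimal, single-head sketch of scaled dot-product self-attention in plain Python, assuming toy 2-dimensional features for four sequence bins and identity projection matrices. It is not Enformer's implementation: the real model applies multi-head attention with relative positional encodings to 128-bp bins produced by convolutional pooling, and the names `self_attention`, `bins` and `identity` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix):
    """Single-head scaled dot-product attention; returns (outputs, weights)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wq[0])  # key/query dimension used for scaling
    scores = matmul(q, transpose(k))  # every bin scored against every other bin
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy input: bins 0 and 3 share a feature pattern despite being farthest apart.
bins = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(bins, identity, identity, identity)
```

In this toy input, bin 0 gives the distant bin 3 as much weight as it gives itself, and more than the adjacent bin 1. A convolution with a narrow kernel could only mix bins 0 and 3 after stacking several layers.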
Key findings
- Uses transformer self-attention to incorporate long-range DNA interactions (up to ~100 kb away), roughly five times the distance reachable by earlier CNN models such as Basenji2 (~20 kb).
- Substantially improves the accuracy of predicting gene expression (CAGE) and epigenetic tracks (e.g., DNase/ATAC-seq and ChIP-seq) from sequence in both human and mouse.
- Improves prediction of non-coding variant effects, including better prioritization of likely causal regulatory variants such as fine-mapped eQTLs.
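A sequence model like this scores a variant by running it twice, once on the reference sequence and once with the alternative allele substituted, and then comparing the two sets of predicted tracks. The sketch below assumes one common scoring scheme: the per-track difference of predictions summed over all output bins (alt minus ref). The summary does not say which exact score the paper uses. `toy_predict` is a hypothetical stand-in for a trained model, returning tracks × bins, and all function names here are illustrative only.

```python
from typing import Callable, List

BASES = "ACGT"
Prediction = List[List[float]]  # tracks x bins


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode DNA; each row is [A, C, G, T]."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return seq with a single-nucleotide substitution at 0-based pos."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position outside sequence")
    if alt not in BASES:
        raise ValueError("alt allele must be one of A, C, G, T")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(predict: Callable[[List[List[int]]], Prediction],
                   seq: str, pos: int, alt: str) -> List[float]:
    """Per-track effect: summed alt prediction minus summed ref prediction."""
    ref_pred = predict(one_hot(seq))
    alt_pred = predict(one_hot(substitute(seq, pos, alt)))
    return [sum(a) - sum(r) for r, a in zip(ref_pred, alt_pred)]


def toy_predict(x: List[List[int]]) -> Prediction:
    """Hypothetical model: 2-bp bins; track 0 = G/C count, track 1 = A count."""
    bin_size = 2
    gc, a_count = [], []
    for start in range(0, len(x), bin_size):
        window = x[start:start + bin_size]
        gc.append(sum(row[1] + row[2] for row in window))
        a_count.append(sum(row[0] for row in window))
    return [gc, a_count]
```

For example, `variant_effect(toy_predict, "ACGTACGT", 0, "G")` gives `[1, -1]`. The A-to-G change adds one G/C base to track 0 and removes one A from track 1. With a real model, the sign and size of such per-track differences are what allow variants to be ranked.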
Cite this paper
Žiga Avsec, Vikram Agarwal, David R. Kelley, et al. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods. https://doi.org/10.1038/s41592-021-01252-x
@article{avsec2021effective,
author = {Žiga Avsec and Vikram Agarwal and David R. Kelley and others},
title = {Effective gene expression prediction from sequence by integrating long-range interactions},
journal = {Nature Methods},
year = {2021},
doi = {10.1038/s41592-021-01252-x},
url = {https://doi.org/10.1038/s41592-021-01252-x}
}