An open index of research

A status.lu publication

Biology & Genetics

Effective gene expression prediction from sequence by integrating long-range interactions

Žiga Avsec, Vikram Agarwal, David R. Kelley et al. (DeepMind/Calico) · Model known as Enformer

Published 4 October 2021 · Nature Methods · Journal article

Summary

This paper introduces Enformer, a transformer-based deep learning model that predicts gene expression and chromatin states directly from DNA sequence by integrating regulatory information from up to ~100 kb away. By using self-attention to capture long-range interactions, it substantially improves prediction accuracy over prior convolutional models. The approach also improves prediction of the effects of non-coding genetic variants on expression.
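To make the long-range point concrete, the sketch below compares how many input positions a stack of dilated convolutions can see with what a single self-attention layer covers. The kernel size, dilation schedule, and sequence length are illustrative assumptions, not the published configuration of Enformer or of any earlier model.

```python
def dilated_conv_receptive_field(kernel_size, dilations):
    """Receptive field, in input positions, of stacked stride-1 dilated convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation positions.
        field += (kernel_size - 1) * dilation
    return field


def attention_receptive_field(sequence_length):
    """A single self-attention layer lets every position attend to all positions."""
    return sequence_length


# Hypothetical settings, chosen only for illustration.
dilations = [2 ** i for i in range(6)]  # 1, 2, 4, 8, 16, 32
conv_field = dilated_conv_receptive_field(kernel_size=3, dilations=dilations)
attn_field = attention_receptive_field(sequence_length=1024)
print(conv_field, attn_field)
```

Under these assumed settings the convolutional stack sees 127 positions, while one attention layer spans all 1,024. That gap is the general reason attention is attractive for distal regulatory elements.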

Key findings

  • Uses transformer self-attention to incorporate long-range DNA interactions (up to ~100 kb), greatly increasing the receptive field over earlier CNN models.
  • Substantially improves accuracy of predicting gene expression and epigenetic tracks from sequence across human and mouse genomes.
  • Improves prediction of non-coding variant effects, including better prioritization of causal regulatory variants and fine-mapped eQTLs.
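A common way to score a variant with a sequence-to-expression model is to predict on the reference and the alternate allele and take the difference. The sketch below illustrates that idea only. `toy_predict` is a hypothetical stand-in (it just measures GC content), and neither it nor `variant_effect` is part of the paper's code or of any released API.

```python
def toy_predict(sequence):
    """Hypothetical stand-in for a trained model: returns the GC fraction."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def variant_effect(predict, reference, position, alt_base):
    """Score a single-nucleotide variant as predict(alternate) - predict(reference)."""
    if not 0 <= position < len(reference):
        raise ValueError("position outside the sequence")
    # Build the alternate allele by substituting one base in the reference.
    alternate = reference[:position] + alt_base + reference[position + 1:]
    return predict(alternate) - predict(reference)


reference = "ACGTACGTAA"
effect = variant_effect(toy_predict, reference, position=0, alt_base="G")
print(round(effect, 3))
```

With a real model, `predict` would return one value per output track, and the per-track differences would then be summarized before variants are ranked.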

Cite this paper

APA

Avsec, Ž., Agarwal, V., Kelley, D. R., et al. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods. https://doi.org/10.1038/s41592-021-01252-x

BibTeX
@article{avsec2021effective,
  author    = {Žiga Avsec and Vikram Agarwal and David R. Kelley and others},
  title     = {Effective gene expression prediction from sequence by integrating long-range interactions},
  journal   = {Nature Methods},
  year      = {2021},
  doi       = {10.1038/s41592-021-01252-x},
  url       = {https://doi.org/10.1038/s41592-021-01252-x}
}

Related in Biology & Genetics

Accurate structure prediction of biomolecular interactions with AlphaFold 3

Josh Abramson, Jonas Adler and John M. Jumper

This paper introduces AlphaFold 3, a unified deep learning model that predicts the joint structure of complexes containing proteins, nucleic acids, small-molecule ligands, ions, and modified residues. It replaces much of the prior architecture with a diffusion-based module that directly generates atomic coordinates. The model achieves substantially higher accuracy than specialized tools across many interaction types, including protein-ligand and protein-nucleic acid complexes.

Nature · Open access

De novo design of protein structure and function with RFdiffusion

Joseph L. Watson, David Juergens and David Baker

This paper introduces RFdiffusion, a generative diffusion model built on the RoseTTAFold network for de novo protein design. It supports a range of design tasks, including unconditional generation, symmetric oligomer design, functional motif scaffolding, and binder design. Many designs were experimentally validated, with solved structures closely matching the intended models.

Nature · Open access

A draft human pangenome reference

Wen-Wei Liao, Mobin Asri and Jana Ebler

The Human Pangenome Reference Consortium presents a first draft human pangenome built from 47 phased, diploid genome assemblies of genetically diverse individuals. The assemblies cover more than 99% of the expected sequence per genome at over 99% base-level and structural accuracy, and are combined into a graph-based reference. Relative to GRCh38, the pangenome adds about 119 million base pairs of euchromatic polymorphic sequence and 1,115 gene duplications, improving representation of variation at structurally complex loci.

Nature · Open access