Biology & Genetics

Highly accurate protein structure prediction for the human proteome

Kathryn Tunyasuvunakool, John Jumper, Demis Hassabis, et al. · 33 authors from DeepMind and EMBL-EBI. Senior and corresponding authors include John Jumper, Demis Hassabis, and Pushmeet Kohli; other authors include Jonas Adler, Zachary Wu, Tim Green, Sameer Velankar, and Ewan Birney.

Published 22 July 2021 · Nature · Journal article

Summary

This companion to the AlphaFold methods paper applied AlphaFold to predict structures for nearly the entire human proteome and for 20 other key organisms, producing a large public database of predicted models. It assessed coverage and confidence across the human proteome, showing that more than half of all residues could be modeled with high or very high confidence. The work created the AlphaFold Protein Structure Database, greatly expanding structural coverage beyond experimentally determined structures.

Key findings

  • Predicted structures were generated for 98.5% of human proteins, roughly doubling the proportion of the proteome with confident structural annotation.
  • About 58% of residues received a confident prediction (pLDDT > 70), including about 36% of all residues at very high confidence (pLDDT > 90); the confidence metric correlated with disorder and with experimental coverage.
  • Results were released as the openly accessible AlphaFold Protein Structure Database covering humans and 20 additional organisms.
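The coverage percentages above are simple fractions of residues whose per-residue confidence score (pLDDT, on a 0 to 100 scale) exceeds a cutoff. The sketch below illustrates that arithmetic only; it is not the authors' pipeline. The cutoffs of 70 and 90 are the bands conventionally used with AlphaFold's pLDDT and are treated here as assumptions, and the function name and the example scores are hypothetical.

```python
from typing import Dict, Sequence

# Assumed cutoffs (not stated on this page): pLDDT > 70 counts as
# "confident", pLDDT > 90 as "very high confidence".
CONFIDENT = 70.0
VERY_HIGH = 90.0


def confidence_coverage(plddt: Sequence[float]) -> Dict[str, float]:
    """Return the fraction of residues above each assumed pLDDT cutoff."""
    n = len(plddt)
    if n == 0:
        raise ValueError("no residues supplied")
    confident = sum(1 for score in plddt if score > CONFIDENT)
    very_high = sum(1 for score in plddt if score > VERY_HIGH)
    # "very_high" is a subset of "confident", so it can never be larger.
    return {"confident": confident / n, "very_high": very_high / n}


# Made-up scores for a hypothetical 10-residue protein.
scores = [95.2, 91.0, 88.4, 72.5, 69.9, 45.0, 30.1, 93.3, 71.0, 50.0]
coverage = confidence_coverage(scores)
print(coverage)
```

Applied to every residue in a proteome rather than to a single protein, the same two fractions correspond to the "confident" and "very high confidence" figures quoted in the findings.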

Cite this paper

APA

Tunyasuvunakool, K., Jumper, J., Hassabis, D., et al. (2021). Highly accurate protein structure prediction for the human proteome. Nature. https://doi.org/10.1038/s41586-021-03828-1

BibTeX
@article{tunyasuvunakool2021highly,
  author    = {Tunyasuvunakool, Kathryn and Jumper, John and Hassabis, Demis and others},
  title     = {Highly accurate protein structure prediction for the human proteome},
  journal   = {Nature},
  year      = {2021},
  doi       = {10.1038/s41586-021-03828-1},
  url       = {https://doi.org/10.1038/s41586-021-03828-1}
}

Related in Biology & Genetics

Accurate structure prediction of biomolecular interactions with AlphaFold 3

Josh Abramson, Jonas Adler and John M. Jumper

This paper introduced AlphaFold 3, a unified deep learning model that predicts the joint structure of complexes containing proteins, nucleic acids, small-molecule ligands, ions, and modified residues. It replaces much of the prior architecture with a diffusion-based module that directly generates atomic coordinates. The model achieved substantially improved accuracy over specialized tools across many interaction types, including protein-ligand and protein-nucleic acid complexes.

Nature · Open access

De novo design of protein structure and function with RFdiffusion

Joseph L. Watson, David Juergens and David Baker

This paper introduced RFdiffusion, a generative diffusion model built on the RoseTTAFold network for de novo protein design. It enables a range of design tasks, including unconditional generation, symmetric oligomer design, functional motif scaffolding, and binder design. Many designs were experimentally validated, with solved structures closely matching the intended models.

Nature · Open access

A draft human pangenome reference

Wen-Wei Liao, Mobin Asri and Jana Ebler

The Human Pangenome Reference Consortium presents a first draft human pangenome built from 47 phased, diploid genome assemblies of genetically diverse individuals. The assemblies cover more than 99% of the expected sequence per genome at over 99% base-level and structural accuracy, and are combined into a graph-based reference. Relative to GRCh38, the pangenome adds about 119 million base pairs of euchromatic polymorphic sequence and 1,115 gene duplications, improving representation of variation at structurally complex loci.

Nature · Open access