The Human Pangenome Reference Consortium presents a first draft human pangenome built from 47 phased, diploid genome assemblies of genetically diverse individuals. The assemblies cover more than 99% of the expected sequence per genome and are more than 99% accurate at both the base-pair and structural levels. They are combined into a graph-based reference. Relative to GRCh38, the pangenome adds about 119 million base pairs of euchromatic polymorphic sequence and 1,115 gene duplications, improving representation of variation at structurally complex loci.
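To illustrate what a "graph-based reference" means, here is a minimal sketch assuming a simple sequence-graph model: nodes hold DNA segments, and each haplotype is a path (an ordered list of node IDs) through the graph. The class `SequenceGraph` and its methods are hypothetical names for illustration only. They are not the consortium's data model or tooling; real pangenome graphs are typically stored in formats such as GFA and built with dedicated software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Hypothetical toy sequence graph: segments as nodes, haplotypes as paths."""

    nodes: dict[int, str] = field(default_factory=dict)        # node ID -> DNA segment
    edges: set[tuple[int, int]] = field(default_factory=set)   # directed links between nodes
    paths: dict[str, list[int]] = field(default_factory=dict)  # haplotype name -> node IDs

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # Record the haplotype and every edge it traverses.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges.add((a, b))
        self.paths[name] = node_ids

    def spell(self, name: str) -> str:
        # Concatenate node sequences along a path to recover that haplotype.
        return "".join(self.nodes[n] for n in self.paths[name])

    def nodes_unique_to(self, name: str) -> set[int]:
        # Nodes visited by this path but by no other path (sequence absent elsewhere).
        others = {n for p, ids in self.paths.items() if p != name for n in ids}
        return set(self.paths[name]) - others


# Toy example: a SNP "bubble" (node 2 vs. node 3) plus an insertion (node 5).
g = SequenceGraph()
for node_id, seq in [(1, "ACGT"), (2, "A"), (3, "G"), (4, "TTAG"), (5, "CCC")]:
    g.add_node(node_id, seq)
g.add_path("ref", [1, 2, 4])
g.add_path("hapA", [1, 3, 4, 5])
print(g.spell("ref"))              # ACGTATTAG
print(g.spell("hapA"))             # ACGTGTTAGCCC
print(g.nodes_unique_to("hapA"))   # {3, 5}
```

The shared nodes (1 and 4) are stored once, while sequence carried by only some haplotypes (nodes 3 and 5 here) is still represented, which is the intuition behind a pangenome adding polymorphic sequence missing from a single linear reference.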
The Telomere-to-Telomere (T2T) Consortium reports T2T-CHM13, the first essentially gapless assembly of a human genome (all chromosomes except Y), totaling about 3.055 billion base pairs. The assembly resolves previously unfinished heterochromatic and repetitive regions, including centromeric satellite arrays, segmental duplications, and the short arms of the acrocentric chromosomes. It adds nearly 200 million base pairs of new sequence and corrects errors in prior reference assemblies.
This paper introduces Enformer, a transformer-based deep learning model that predicts gene expression and chromatin states directly from DNA sequence by integrating regulatory information from up to ~100 kb away. By using self-attention to capture long-range interactions, it substantially improves prediction accuracy over prior convolutional models. The approach also improves prediction of the effects of non-coding genetic variants on expression.
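The key idea behind self-attention is that every position scores every other position directly, so distant regulatory elements can influence a prediction without passing through many local convolution layers. The sketch below shows only single-head scaled dot-product attention over toy "sequence-bin" embeddings, assuming identity query/key/value projections and no positional encoding. Enformer itself uses multi-head attention with learned projections and relative positional encodings after a convolutional stem; the function names here are hypothetical.

```python
from __future__ import annotations

import math


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(bins: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Single-head scaled dot-product self-attention over bin embeddings.

    Simplification: queries, keys and values are the embeddings themselves.
    Returns (attended outputs, attention weight matrix).
    """
    d = len(bins[0])
    outputs, weights = [], []
    for q in bins:
        # Each bin is compared with every bin, regardless of genomic distance.
        scores = [dot(q, k) / math.sqrt(d) for k in bins]
        w = softmax(scores)
        weights.append(w)
        # Output = attention-weighted sum of the value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, bins)) for i in range(d)])
    return outputs, weights


# Bins 0 and 3 carry similar features despite being far apart.
bins = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
outputs, weights = self_attention(bins)
print([round(w, 3) for w in weights[0]])
```

In this toy input, bin 0 attends equally to itself and to the distant bin 3 and less to the unrelated bins in between, illustrating how attention can link far-apart positions in one step.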
Grace X. Y. Zheng, Jessica M. Terry, Phillip Belgrader and Jason H. Bielas
This paper introduced a droplet-based microfluidic platform (the 10x Genomics GemCode/Chromium system) for high-throughput single-cell RNA sequencing using barcoded gel beads. The method enables digital 3' mRNA counting in thousands of cells per run at low cost. The authors profiled tens of thousands of cells, including ~68,000 peripheral blood mononuclear cells (PBMCs), resolving immune cell subpopulations and detecting rare cell types.
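"Digital" counting here means tallying distinct molecules rather than reads: each read carries a cell barcode (from its gel bead) and a unique molecular identifier (UMI), so reads sharing the same barcode, UMI and gene are collapsed into a single molecule. The sketch below assumes reads have already been aligned and parsed into `(cell_barcode, umi, gene)` tuples and ignores barcode/UMI error correction. The function `count_umis` and the toy data are hypothetical and are not 10x Genomics' Cell Ranger pipeline.

```python
from __future__ import annotations

from collections import defaultdict


def count_umis(reads: list[tuple[str, str, str]]) -> dict[str, dict[str, int]]:
    """Build a cell x gene count table from (cell_barcode, umi, gene) reads.

    Reads with the same barcode, UMI and gene are treated as PCR duplicates
    of one captured molecule, so each distinct triple is counted once.
    """
    molecules = set(reads)  # collapse duplicate reads into unique molecules
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cell, _umi, gene in molecules:
        counts[cell][gene] += 1
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {cell: dict(genes) for cell, genes in counts.items()}


# Toy reads (made-up barcodes and gene names): the first two are duplicates.
reads = [
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI2", "GeneA"),
    ("AAAC", "UMI3", "GeneB"),
    ("TTTG", "UMI1", "GeneC"),
]
print(count_umis(reads))
```

Here cell `AAAC` gets 2 molecules of `GeneA` (not 3 reads) and 1 of `GeneB`, which is why UMI counts are less sensitive to amplification bias than raw read counts.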
Patrik L. Ståhl, Fredrik Salmén, Joakim Lundeberg and Jonas Frisén
The authors introduce 'spatial transcriptomics,' a method that places thin histological tissue sections onto a glass surface arrayed with barcoded reverse-transcription primers, so that mRNA captured at each position retains its two-dimensional spatial coordinates. Sequencing the barcoded cDNA reconstructs genome-wide expression maps directly on the tissue image. They demonstrate the approach on mouse brain and human breast cancer sections, recovering spatially resolved transcriptomes that align with tissue morphology.
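Because each array position has its own barcode, mapping a sequenced barcode back to its known (x, y) location is what turns counts into an expression map over the tissue. The sketch below assumes a hypothetical lookup table from barcode to grid coordinates and per-spot gene counts (such as a UMI count table); `spatial_expression_grid` and the toy barcodes and gene names are illustrative only and do not reproduce the authors' processing pipeline.

```python
from __future__ import annotations


def spatial_expression_grid(
    barcode_positions: dict[str, tuple[int, int]],
    counts: dict[str, dict[str, int]],
    gene: str,
    width: int,
    height: int,
) -> list[list[int]]:
    """Place one gene's counts on the array grid using each spot's barcode."""
    grid = [[0] * width for _ in range(height)]  # grid[y][x]
    for barcode, gene_counts in counts.items():
        if barcode not in barcode_positions:
            continue  # barcode not in the array layout (e.g. a sequencing error)
        x, y = barcode_positions[barcode]
        grid[y][x] += gene_counts.get(gene, 0)
    return grid


# Toy 2 x 2 array layout and per-spot counts (made-up values).
positions = {"BC01": (0, 0), "BC02": (1, 0), "BC03": (0, 1), "BC04": (1, 1)}
spot_counts = {
    "BC01": {"GeneA": 5},
    "BC02": {"GeneA": 0, "GeneB": 3},
    "BC04": {"GeneA": 2},
    "BCXX": {"GeneA": 9},  # unknown barcode, ignored
}
for row in spatial_expression_grid(positions, spot_counts, "GeneA", width=2, height=2):
    print(row)
```

The resulting grid can then be overlaid on the histology image of the same section, which is how expression is related back to tissue morphology.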
Eric S. Lander, Lauren M. Linton, Bruce Birren, Chad Nusbaum and Michael C. Zody
This paper reported the results of the publicly funded Human Genome Project, presenting and making freely available a draft sequence covering the great majority of the human genome, along with an initial analysis. The consortium described the broad genomic landscape (gene content, repeat elements, GC content, and recombination rates) and estimated a surprisingly low number of protein-coding genes, roughly 30,000–40,000. The work provided a foundational reference for human biology, medicine, and evolutionary studies.