Subject

Artificial Intelligence

Machine learning and AI — the architectures, models and methods behind modern deep learning, from vision and language to protein folding and game-play.

25 papers in this field

AI · IEEE/CVF International Conference on Computer Vision (ICCV) · Apr 2023 · Open access

Segment Anything

Alexander Kirillov, Eric Mintun and Nikhila Ravi

This paper introduces the Segment Anything project: a promptable image segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset. SAM combines an image encoder, a flexible prompt encoder (points, boxes, masks, text), and a fast mask decoder to produce valid segmentation masks from arbitrary prompts. Trained on over 1 billion masks across 11 million images, SAM shows strong zero-shot transfer to many segmentation tasks without additional training.

computer vision image segmentation foundation model sam

AI · arXiv · Mar 2023 · Open access

GPT-4 Technical Report

OpenAI

This technical report describes GPT-4, a large-scale multimodal Transformer model that accepts image and text inputs and produces text outputs. The report emphasizes that GPT-4 achieves human-level performance on a range of professional and academic benchmarks, and details infrastructure and optimization methods that allowed performance to be predicted from much smaller models. For competitive and safety reasons, the report withholds architecture, dataset, and training details.

llm multimodal gpt-4 foundation model

AI · arXiv preprint (arXiv:2302.13971) · Feb 2023 · Open access

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril and Gautier Izacard

The paper presents LLaMA, a family of foundation language models ranging from 7B to 65B parameters trained exclusively on publicly available datasets. It argues that strong performance can be reached without proprietary data and at smaller parameter counts than prior models. LLaMA-13B outperforms the much larger GPT-3 175B on most benchmarks, and LLaMA-65B is competitive with the best contemporary models such as Chinchilla-70B and PaLM-540B.

foundation models large language models open models efficient training

AI · International Conference on Learning Representations (ICLR 2022) · Apr 2022 · Open access

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, et al.

The paper introduces LoRA, a parameter-efficient fine-tuning method that keeps the pretrained model weights frozen and instead learns small trainable low-rank decomposition matrices injected into the Transformer layers. This drastically cuts the number of trainable parameters and optimizer memory needed to adapt very large models to downstream tasks. The authors show LoRA matches or exceeds full fine-tuning quality across several models including GPT-3 175B while adding no extra inference latency.

parameter-efficient fine-tuning large language models low-rank adaptation transformers
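The low-rank update in the LoRA summary above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names (`matmul`, `add`, `lora_param_counts`) and the tiny dimensions are assumptions, the scaling factor from the paper is omitted, and both factors are drawn at random here to stand in for an already trained adapter.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_param_counts(d, k, r):
    """Return (full fine-tuning parameters, LoRA parameters) for one d x k weight."""
    return d * k, r * (d + k)

random.seed(0)
d, k, r = 6, 4, 2                                                  # illustrative sizes
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen pretrained weight
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d)]     # trainable factor, d x r
A = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]     # trainable factor, r x k
x = [[random.gauss(0, 1)] for _ in range(k)]                       # input as a column vector

# Training-time path: frozen weight plus the low-rank branch.
h_adapter = add(matmul(W, x), matmul(B, matmul(A, x)))

# Deployment path: fold B @ A into W once, then use a single matmul.
W_merged = add(W, matmul(B, A))
h_merged = matmul(W_merged, x)
```

Both paths give the same output up to rounding, which is why merging the factors leaves inference cost unchanged. For a 4096 x 4096 weight with rank 8, `lora_param_counts` reports 65,536 trainable parameters against 16,777,216 for full fine-tuning.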

AI · Advances in Neural Information Processing Systems (NeurIPS) · Mar 2022 · Open access

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud and Arthur Mensch

This paper (the 'Chinchilla' paper) investigates the compute-optimal trade-off between model size and training-token count for large language models. By training over 400 models from 70M to 16B parameters on 5B to 500B tokens, the authors find that model size and training data should be scaled in roughly equal proportion, which implies that prior large models were significantly undertrained. Their 70B-parameter Chinchilla model, trained on far more data under the same compute budget as Gopher, outperformed much larger models such as Gopher (280B) and GPT-3 (175B).

llm scaling laws compute-optimal chinchilla
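The equal-proportion scaling described in the Chinchilla summary above can be turned into a back-of-the-envelope calculator. This sketch rests on two assumptions that are not stated in the summary: the common approximation that training costs about 6 FLOPs per parameter per token, and the rule of thumb of roughly 20 training tokens per parameter. The function name `compute_optimal` is hypothetical.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0   # assumed approximation: C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0       # assumed rule of thumb, not an exact constant

def compute_optimal(flops):
    """Split a training FLOP budget into (parameters, tokens) with tokens = 20 * parameters."""
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params

# Illustrative budget: a 70B-parameter model trained on 1.4T tokens.
budget = FLOPS_PER_PARAM_TOKEN * 70e9 * 1.4e12
params, tokens = compute_optimal(budget)

# Quadrupling compute doubles both the model size and the token count.
params_4x, tokens_4x = compute_optimal(4 * budget)
```

Under these assumptions, parameters and tokens each grow with the square root of compute, which is one way to read "scaled in roughly equal proportion".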

AI · Advances in Neural Information Processing Systems 35 (NeurIPS 2022) · Mar 2022 · Open access

Training language models to follow instructions with human feedback

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, et al.

The paper (InstructGPT) shows how to align language models with user intent by fine-tuning GPT-3 on human-written demonstrations and then optimizing against a learned reward model with reinforcement learning from human feedback (RLHF). Human evaluators preferred outputs from a 1.3B-parameter InstructGPT model over the 175B GPT-3 model, despite the large size difference. The approach improves truthfulness and reduces toxic generations while causing only minimal regressions on standard NLP benchmarks.

instructgpt rlhf language models instruction following
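The learned reward model mentioned in the InstructGPT summary above is trained on human comparisons between pairs of outputs. A minimal sketch of the pairwise objective, assuming scalar rewards have already been computed for a preferred and a rejected response; the function name is hypothetical and batching over comparisons is left out.

```python
import math

def pairwise_reward_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(r_preferred - r_rejected)), computed without overflow."""
    diff = reward_preferred - reward_rejected
    if diff >= 0:
        # log(1 + exp(-diff)) is safe because exp(-diff) <= 1.
        return math.log1p(math.exp(-diff))
    # For negative diff, rewrite as -diff + log(1 + exp(diff)).
    return -diff + math.log1p(math.exp(diff))

agree = pairwise_reward_loss(3.0, 0.0)      # model ranks the pair like the labeler
disagree = pairwise_reward_loss(0.0, 3.0)   # model ranks the pair the other way
```

The loss shrinks as the reward model scores the human-preferred output higher than the rejected one, so `agree` is smaller than `disagree`.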

AI · Advances in Neural Information Processing Systems 35 (NeurIPS 2022) · Jan 2022 · Open access

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, et al.

The paper shows that prompting a large language model with a few exemplars that include intermediate reasoning steps (a 'chain of thought') substantially improves its ability to solve multi-step reasoning problems. This reasoning ability emerges only in sufficiently large models and requires no fine-tuning. Across arithmetic, commonsense, and symbolic reasoning tasks, chain-of-thought prompting produces large gains, including a new state of the art on the GSM8K math word-problem benchmark.

chain-of-thought prompting reasoning large language models
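Assembling a chain-of-thought prompt, as described in the summary above, is mostly string formatting. The sketch below assumes a simple "Q:/A:" layout; the helper name, the exemplar, and the new question are illustrative choices rather than a prescribed format.

```python
def build_cot_prompt(exemplars, question):
    """Join worked exemplars (question, reasoning, answer) and append the new question."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")   # the model continues from here
    return "\n\n".join(blocks)

# Illustrative exemplar whose answer spells out intermediate steps.
EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11.",
        "11",
    ),
]

prompt = build_cot_prompt(
    EXEMPLARS,
    "A baker has 3 trays of 12 rolls and sells 10. How many rolls are left?",
)
```

The only difference from standard few-shot prompting is that each exemplar answer includes its reasoning before the final answer; no model weights change.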

AI · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022 · Dec 2021 · Open access

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer

The paper proposes latent diffusion models (LDMs), which apply the diffusion process in the compressed latent space of a pretrained autoencoder rather than directly in pixel space, greatly reducing compute. A cross-attention conditioning mechanism enables flexible inputs such as text and bounding boxes for tasks including text-to-image generation, inpainting, and super-resolution. LDMs achieve strong or state-of-the-art results across these tasks while being far more efficient to train and sample, and this architecture underlies Stable Diffusion.

latent diffusion image synthesis generative models computer vision

AI · Nature · Jul 2021 · Open access

Highly accurate protein structure prediction with AlphaFold

John Jumper, Richard Evans, Alexander Pritzel, David Silver, Oriol Vinyals and Demis Hassabis

The paper introduces AlphaFold2, a deep-learning system that predicts three-dimensional protein structures directly from amino-acid sequence with near-experimental accuracy. It combines a novel attention-based Evoformer over multiple sequence alignments and pairwise representations with an end-to-end structure module that produces atomic coordinates. AlphaFold won the CASP14 assessment by a wide margin, delivering atomic-level accuracy for the majority of targets.

alphafold protein structure prediction deep learning structural biology

AI · ICLR 2021 (9th International Conference on Learning Representations) · May 2021 · Open access

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov and Neil Houlsby

This paper introduced the Vision Transformer (ViT), applying a standard Transformer encoder directly to sequences of image patches treated as tokens, with minimal vision-specific inductive biases. When pre-trained on large datasets and transferred to downstream tasks, ViT matched or exceeded state-of-the-art convolutional networks while requiring fewer computational resources to train. It demonstrated that convolutions are not necessary for strong image recognition at scale.

vision transformer image classification transformers computer vision
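The "image patches treated as tokens" step in the ViT summary above can be illustrated directly. This sketch assumes a single-channel image stored as a list of rows; the function name is hypothetical, and the learned linear projection and position embeddings that follow patch extraction are not shown.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch tokens, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                image[top + dy][left + dx]
                for dy in range(patch)
                for dx in range(patch)
            ])
    return tokens

# A 224 x 224 image whose pixel value encodes its own position.
image = [[row * 224 + col for col in range(224)] for row in range(224)]
tokens = patchify(image, 16)   # 14 x 14 = 196 tokens of 256 values each
```

A 224 x 224 input with 16 x 16 patches therefore becomes a sequence of 196 tokens, which a standard Transformer encoder can consume like a sentence.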

AI · Proceedings of the 38th International Conference on Machine Learning (ICML 2021) · Feb 2021 · Open access

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, et al.

The paper presents CLIP, which learns visual representations by contrastively matching images to their natural-language captions over a 400-million-pair web dataset. The pretrained model can be applied zero-shot to many downstream vision tasks by framing class labels as text prompts, without task-specific fine-tuning. Used zero-shot, it matches the ImageNet accuracy of the original supervised ResNet-50 and transfers robustly across a broad benchmark suite.

clip vision-language zero-shot learning contrastive learning
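The contrastive matching in the CLIP summary above can be written as a symmetric cross-entropy over an image-by-text similarity matrix. This is a sketch under assumptions: embeddings are plain lists, the temperature is fixed at 0.07 rather than learned, and all names are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy(logits, target):
    """Softmax cross-entropy for one row of logits, using the log-sum-exp trick."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_z - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image losses; pair i matches pair i."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    n = len(logits)
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [list(col) for col in zip(*logits)]
    text_to_image = sum(cross_entropy(columns[i], i) for i in range(n)) / n
    return (image_to_text + text_to_image) / 2

aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Correctly paired embeddings give a much lower loss than mismatched ones, which is the signal that pulls each image toward its own caption and away from the others in the batch.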

AI · Advances in Neural Information Processing Systems 33 (NeurIPS 2020) · Dec 2020 · Open access

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann and Nick Ryder

This paper presented GPT-3, an autoregressive language model with 175 billion parameters, and studied its ability to perform tasks from natural-language descriptions and a few examples without gradient updates (in-context learning). Scaling the model dramatically improved few-shot performance across many NLP benchmarks, sometimes approaching fine-tuned systems. The authors also examined limitations, data contamination, and broader societal impacts of large language models.

gpt-3 language model few-shot learning in-context learning

AI · Advances in Neural Information Processing Systems 33 (NeurIPS 2020) · Jun 2020 · Open access

Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain and Pieter Abbeel

The paper introduces denoising diffusion probabilistic models (DDPMs), a class of latent-variable generative models trained to reverse a fixed Gaussian noising process. It establishes a connection between diffusion models and denoising score matching with Langevin dynamics, and proposes a simplified, reweighted training objective. The resulting models produce high-quality image samples, reaching an Inception score of 9.46 and a then state-of-the-art FID of 3.17 on unconditional CIFAR-10, although their log-likelihoods are not competitive with other likelihood-based models.

diffusion models generative models image synthesis deep learning
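The fixed Gaussian noising process and the simplified objective from the DDPM summary above can be sketched as follows. The linear schedule from 1e-4 to 0.02 over 1000 steps is assumed here, the function names are hypothetical, and the noise-prediction network is left out.

```python
import math
import random

def linear_betas(steps=1000, start=1e-4, end=0.02):
    """Variance schedule beta_1..beta_T, linearly spaced (assumed setting)."""
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t); the signal fraction left at each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps in one shot."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [signal * x + noise * e for x, e in zip(x0, eps)], eps

def simple_loss(eps, eps_pred):
    """Mean squared error between the true noise and a model's prediction of it."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred)) / len(eps)

abar = alpha_bars(linear_betas())
x_t, eps = q_sample([0.5, -0.5], 500, abar, random.Random(0))
```

Training would pick a random step `t`, call `q_sample`, and regress a network's output on `eps` with `simple_loss`; by the last step almost none of the original signal remains.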

AI · arXiv · Jan 2020 · Open access

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish and Tom Henighan

This paper establishes empirical scaling laws showing that the cross-entropy loss of Transformer language models follows smooth power-law relationships with model size, dataset size, and the amount of training compute. The relationships hold across many orders of magnitude, while architectural details such as width and depth have comparatively minor effects. The work provided a quantitative framework for predicting model performance and allocating compute budgets.

llm scaling laws language modeling compute

AI · Journal of Machine Learning Research (JMLR) · Oct 2019 · Open access

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer and Adam Roberts

This paper introduces T5 (Text-to-Text Transfer Transformer), a framework that casts every NLP problem—translation, classification, question answering, summarization—as a text-to-text task with a unified model, objective, and decoding procedure. The authors conduct a large-scale empirical study comparing pre-training objectives, architectures, datasets, and transfer strategies, and release the C4 corpus. Scaling the model up to 11 billion parameters achieved state-of-the-art results on many benchmarks.

llm transfer learning text-to-text t5

AI · Proceedings of NAACL-HLT 2019 · Jun 2019 · Open access

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova

BERT is a language representation model pre-trained on large unlabeled corpora using masked language modeling and next-sentence prediction, yielding deeply bidirectional contextual representations. The pre-trained model can be fine-tuned with a single additional output layer to achieve strong performance across diverse downstream tasks. It set new state-of-the-art results on eleven NLP benchmarks at the time of publication.

bert nlp language model pre-training
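The masked language modeling objective in the BERT summary above starts by corrupting the input. A minimal sketch, assuming the commonly cited recipe of selecting 15% of tokens and then replacing 80% of those with `[MASK]`, 10% with a random token, and leaving 10% unchanged; the function name and toy vocabulary are hypothetical.

```python
import random

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (corrupted inputs, labels); labels are None where nothing is predicted."""
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # position not selected for prediction
        labels[i] = token                     # the model must recover the original
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = "[MASK]"              # usual case: hide the token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)     # sometimes: swap in a random token
        # otherwise: keep the original token in place
    return inputs, labels

tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, random.Random(0))
```

Because the model sees context on both sides of every selected position, it can learn bidirectional representations, unlike a left-to-right next-token objective.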

AI · OpenAI Technical Report · Feb 2019 · Open access

Language Models are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever

This paper introduces GPT-2, a 1.5-billion-parameter Transformer language model trained on a large web-text corpus (WebText) with a simple next-token prediction objective. It demonstrates that a sufficiently large language model can perform many NLP tasks in a zero-shot setting, without task-specific training data or fine-tuning. The work argued that unsupervised language modeling at scale implicitly learns to perform downstream tasks from naturally occurring demonstrations.

llm language modeling unsupervised learning zero-shot

AI · Nature · Oct 2017

Mastering the game of Go without human knowledge

David Silver, Julian Schrittwieser, Karen Simonyan and Demis Hassabis

This paper presented AlphaGo Zero, which learned to play Go solely through self-play reinforcement learning without any human game data or handcrafted features, using a single neural network and a simpler tree search. Starting from random play, it discovered Go knowledge and novel strategies on its own. AlphaGo Zero surpassed all previous versions of AlphaGo, including the one that beat Lee Sedol.

reinforcement learning self-play monte carlo tree search alphago zero

AI · Advances in Neural Information Processing Systems 30 (NeurIPS 2017) · Jun 2017 · Open access

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al.

The paper introduced the Transformer, a sequence-transduction architecture based entirely on attention mechanisms, dispensing with the recurrence and convolutions used by prior state-of-the-art models. By relying on multi-head self-attention, the model is more parallelizable and trains substantially faster, while achieving new state-of-the-art results on machine translation. The architecture became the foundation for subsequent large language models and much of modern deep learning.

deep learning transformers attention natural language processing
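The attention mechanism at the core of the Transformer summary above is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below shows a single head on plain lists; the function names are hypothetical, and the multi-head projections, masking, and batching are omitted.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
mixed = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 4.0]])
```

Every query is processed independently of the others, which is what lets the computation run in parallel across sequence positions instead of step by step as in a recurrent network.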

AI · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 2016 · Open access

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun

The authors introduced a residual learning framework that reformulates network layers to learn residual functions with reference to their inputs (via identity 'shortcut' connections), making very deep networks substantially easier to optimize. They showed that such residual networks gain accuracy from greatly increased depth, evaluating models up to 152 layers deep on ImageNet at lower complexity than VGG networks. The approach won first place in the ILSVRC 2015 classification task and yielded large improvements on detection and localization benchmarks.

deep learning computer vision convolutional neural networks image classification
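The identity shortcut in the ResNet summary above amounts to computing F(x) + x instead of F(x). A toy sketch using two small fully connected layers in place of convolutions; the function names and weights are hypothetical, and batch normalization is left out.

```python
def relu(vec):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in vec]

def linear(weights, vec):
    """Matrix-vector product with the matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    residual = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(residual, x)])

zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

passthrough = residual_block([1.0, 2.0], zeros, zeros)     # F(x) = 0, so the input survives
doubled = residual_block([1.0, 2.0], identity, identity)   # F(x) = x, so the output is 2x
```

With all-zero weights the block passes a non-negative input through unchanged, so adding more blocks cannot make the identity mapping harder to represent. That is the intuition for why very deep residual networks remain easy to optimize.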

AI · Nature · Jan 2016

Mastering the game of Go with deep neural networks and tree search

David Silver, Aja Huang, Chris J. Maddison and Demis Hassabis

This paper introduced AlphaGo, a system combining deep convolutional neural networks (policy and value networks) trained by supervised learning from human games and reinforcement learning by self-play, integrated with Monte Carlo tree search. The networks reduce the breadth and depth of the search needed to evaluate Go positions. AlphaGo defeated other Go programs and became the first program to beat a professional human Go player (Fan Hui) on a full-size board.

deep learning reinforcement learning monte carlo tree search game of go

AI · Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), LNCS vol. 9351, pp. 234-241 · Oct 2015 · Open access

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, Philipp Fischer and Thomas Brox

The paper introduces U-Net, an encoder-decoder convolutional network with a contracting path to capture context and a symmetric expanding path with skip connections for precise localization. Combined with heavy data augmentation, the architecture trains end-to-end from very few annotated images. It won the ISBI cell-tracking and neuronal-structure segmentation challenges and segments a 512x512 image in under a second on a GPU.

image segmentation convolutional neural networks biomedical imaging deep learning

AI · ICML 2015 (32nd International Conference on Machine Learning) · Jul 2015 · Open access

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe and Christian Szegedy

This paper introduced batch normalization, a technique that normalizes layer inputs using mini-batch statistics to reduce internal covariate shift during training. It allows higher learning rates and less careful initialization, accelerates convergence, and acts as a regularizer. Applied to a state-of-the-art image classification model, it reached the same accuracy with 14 times fewer training steps, and an ensemble of batch-normalized networks improved on the best published ImageNet result.

batch normalization deep learning neural network training regularization
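The normalization step from the batch normalization summary above, in training mode, can be sketched on a small mini-batch. The function name is hypothetical, `gamma` and `beta` stand for the learned scale and shift, and the running statistics used at inference time are not shown.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then apply a learned scale and shift."""
    n = len(batch)
    features = list(zip(*batch))                      # one tuple of values per feature
    means = [sum(f) / n for f in features]
    variances = [
        sum((x - m) ** 2 for x in f) / n              # biased (population) variance
        for f, m in zip(features, means)
    ]
    return [
        [
            g * (x - m) / math.sqrt(v + eps) + b
            for x, m, v, g, b in zip(row, means, variances, gamma, beta)
        ]
        for row in batch
    ]

# Two features on very different scales end up with comparable statistics.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` at 1 and `beta` at 0, each output feature has mean close to 0 and variance close to 1 regardless of the input scale, which is what makes higher learning rates tolerable.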

AI · ICLR 2015 (3rd International Conference on Learning Representations) · May 2015 · Open access

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba

This paper introduced Adam, a first-order gradient-based optimization algorithm for stochastic objective functions that computes adaptive per-parameter learning rates from estimates of the first and second moments of the gradients. The method is computationally efficient, has low memory requirements, and is well suited to large-scale and noisy/sparse-gradient problems. It became one of the most widely used optimizers in deep learning.

optimization stochastic gradient descent deep learning adaptive learning rate
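The first- and second-moment estimates in the Adam summary above translate into a short update rule. A minimal sketch for a flat list of parameters, assuming the commonly used default hyperparameters; the function name is hypothetical.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count. Returns (params, m, v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g           # running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g       # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)                # correct the bias toward zero
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# One step on a parameter with a very large gradient.
params, m, v = adam_step([1.0], [100.0], [0.0], [0.0], t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by about `lr` even though the raw gradient is 100. This per-parameter rescaling is what the summary means by adaptive learning rates.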

AI · Nature · Feb 2015

Human-level control through deep reinforcement learning

Volodymyr Mnih, Koray Kavukcuoglu and David Silver

The paper introduced the Deep Q-Network (DQN), which combines Q-learning with deep convolutional networks and stabilizing techniques such as experience replay and a target network. Trained end-to-end from raw pixels and game scores, a single architecture and hyperparameter set learned to play 49 Atari 2600 games. It reached or exceeded the level of a professional human games tester on the majority of titles, demonstrating a general agent learning directly from high-dimensional sensory input.

deep reinforcement learning dqn atari q-learning
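Two of the stabilizing techniques named in the DQN summary above, experience replay and a target network, can be sketched without any neural network code. The class and function names are hypothetical, the discount factor of 0.99 is an assumed default, and the Q-values are passed in as plain lists.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def __len__(self):
        return len(self.items)

    def add(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        """Uniformly sample past transitions, which breaks up temporal correlation."""
        return rng.sample(list(self.items), batch_size)

def td_target(reward, next_q_values, done, gamma=0.99):
    """Regression target: r if terminal, else r + gamma * max over target-network Q-values."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

buffer = ReplayBuffer(capacity=2)
buffer.add("s0", 0, 0.0, "s1", False)
buffer.add("s1", 1, 1.0, "s2", False)
buffer.add("s2", 0, 0.0, "s3", True)     # evicts the oldest transition
batch = buffer.sample(2, random.Random(0))
```

In a full agent, `next_q_values` would come from a periodically updated copy of the network rather than the one being trained, which keeps the regression target from shifting on every gradient step.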