Language Models are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever · OpenAI

Published 14 February 2019 · OpenAI Technical Report · Preprint

Summary

This paper introduces GPT-2, a 1.5-billion-parameter Transformer language model trained on a large corpus of web text (WebText) with a simple next-token prediction objective. It demonstrates that a sufficiently large language model can perform many NLP tasks in a zero-shot setting, without task-specific training data or fine-tuning. The authors argue that unsupervised language modeling at scale implicitly learns to perform downstream tasks from naturally occurring demonstrations in the training text.
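
As a rough illustration of the next-token prediction objective mentioned above, the sketch below scores a token sequence by its average negative log-likelihood and the corresponding perplexity. The toy bigram table, its probabilities, and the function names are assumptions made up for this example; GPT-2 itself is a Transformer conditioned on the full preceding context, not a bigram model.

```python
import math

# Hypothetical toy "model": P(next token | previous token).
# The probabilities are invented for illustration only.
TOY_BIGRAM = {
    "<s>": {"the": 0.5, "a": 0.5},
    "the": {"cat": 0.25, "dog": 0.75},
    "cat": {"sat": 1.0},
}


def next_token_nll(tokens, model=TOY_BIGRAM):
    """Average negative log-likelihood (in nats) of each token given the previous one."""
    context = "<s>"  # start-of-sequence marker
    total = 0.0
    for tok in tokens:
        total += -math.log(model[context][tok])
        context = tok
    return total / len(tokens)


def perplexity(tokens, model=TOY_BIGRAM):
    """Perplexity is the exponential of the average per-token loss."""
    return math.exp(next_token_nll(tokens, model))


sequence = ["the", "cat", "sat"]
print(next_token_nll(sequence), perplexity(sequence))
```

Training a language model minimizes this average loss over a corpus; lower perplexity on held-out text is the metric behind language modeling benchmark comparisons.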

Key findings

GPT-2 (1.5B parameters) achieved state-of-the-art results on 7 of 8 tested language modeling datasets in a zero-shot setting.
Performance on language modeling and downstream tasks improved log-linearly with model capacity, motivating further scaling.
A single unsupervised model could perform tasks such as reading comprehension, summarization, translation, and question answering without explicit supervision.
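
The zero-shot setup in the findings above amounts to phrasing each task as text for the model to continue. The sketch below builds two such prompts. The "TL;DR:" summarization cue and the "english sentence = french sentence" translation pattern follow the paper's description; the helper name, task labels, and exact string layout are assumptions for illustration, and no model is called.

```python
def build_zero_shot_prompt(task, text, examples=()):
    """Return a plain-text prompt for a language model to continue.

    Hypothetical helper: task names and layout are illustrative only.
    """
    if task == "summarize":
        # Append the summarization cue after the article text.
        return f"{text}\nTL;DR:"
    if task == "translate":
        # Condition on "source = target" pairs, then leave the last target open.
        lines = [f"{src} = {tgt}" for src, tgt in examples]
        lines.append(f"{text} =")
        return "\n".join(lines)
    raise ValueError(f"unknown task: {task}")


print(build_zero_shot_prompt("summarize", "Some article text."))
print(build_zero_shot_prompt(
    "translate", "thank you", examples=[("good morning", "bonjour")]
))
```

In both cases the model's weights are unchanged; the task is specified only through the text it is asked to continue.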

Subjects & keywords

Artificial Intelligence · llm · language modeling · unsupervised learning · zero-shot · gpt-2

Cite this paper

APA

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners [Technical report]. OpenAI. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

BibTeX

@techreport{radford2019language,
  author      = {Alec Radford and Jeffrey Wu and Rewon Child and David Luan and Dario Amodei and Ilya Sutskever},
  title       = {Language Models are Unsupervised Multitask Learners},
  institution = {OpenAI},
  year        = {2019},
  url         = {https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf}
}

Related in Artificial Intelligence

AI · 2023

Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, et al.

This paper introduces the Segment Anything project: a promptable image segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset. SAM combines an image encoder, a flexible prompt encoder (points, boxes, masks, text), and a fast mask decoder to produce valid segmentation masks from arbitrary prompts. Trained on over 1 billion masks across 11 million images, SAM shows strong zero-shot transfer to many segmentation tasks without additional training.

IEEE/CVF International Conference on Computer Vision (ICCV) · Open access

AI · 2023

GPT-4 Technical Report

OpenAI

This technical report describes GPT-4, a large-scale multimodal Transformer model that accepts image and text inputs and produces text outputs. The report emphasizes that GPT-4 achieves human-level performance on a range of professional and academic benchmarks, and details infrastructure and optimization methods that allowed performance to be predicted from much smaller models. For competitive and safety reasons, the report withholds architecture, dataset, and training details.

arXiv · Open access

AI · 2023

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al.

The paper presents LLaMA, a family of foundation language models ranging from 7B to 65B parameters trained exclusively on publicly available datasets. It argues that strong performance can be reached without proprietary data and at smaller parameter counts than prior models. LLaMA-13B outperforms the much larger GPT-3 175B on most benchmarks, and LLaMA-65B is competitive with the best contemporary models such as Chinchilla-70B and PaLM-540B.

arXiv preprint (arXiv:2302.13971) · Open access