Language Models are Unsupervised Multitask Learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever · OpenAI
Summary
This paper introduces GPT-2, a 1.5-billion-parameter Transformer language model trained on WebText, a large corpus of web pages, with a simple next-token prediction objective. It demonstrates that a sufficiently large language model can perform many NLP tasks in a zero-shot setting, without task-specific training data or fine-tuning. The authors argue that unsupervised language modeling at scale implicitly learns to perform downstream tasks from demonstrations that occur naturally in the training text.
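The next-token prediction objective mentioned above can be illustrated with a minimal sketch. It is not the authors' training code: the function name `next_token_loss`, the three-token toy vocabulary, and the probability values are all assumptions chosen for illustration. The sketch computes the average negative log-likelihood of the tokens that actually came next, which is the quantity a language model of this kind is trained to minimize.

```python
import math


def next_token_loss(probs_per_step, target_ids):
    """Average negative log-likelihood of the observed next tokens.

    probs_per_step[t] is the model's probability distribution over the
    vocabulary at position t; target_ids[t] is the token that actually
    followed. Both names are hypothetical, used only for this sketch.
    """
    if len(probs_per_step) != len(target_ids):
        raise ValueError("need exactly one distribution per target token")
    total = 0.0
    for probs, target in zip(probs_per_step, target_ids):
        # Penalize the model by -log of the probability it gave the true token.
        total += -math.log(probs[target])
    return total / len(target_ids)


# Toy example with a 3-token vocabulary; the numbers are made up.
toy_probs = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
toy_targets = [0, 1]

loss = next_token_loss(toy_probs, toy_targets)
perplexity = math.exp(loss)  # perplexity is the exponential of the average loss
```

A lower loss means the model assigned higher probability to the text that actually occurred. Perplexity, the metric commonly reported on language modeling benchmarks, is the exponential of this average.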
Key findings
- GPT-2 (1.5B parameters) achieved state-of-the-art results on 7 of 8 tested language modeling datasets in a zero-shot setting.
- Performance on language modeling and downstream tasks improved log-linearly with model capacity, motivating further scaling.
- A single unsupervised model could perform tasks such as reading comprehension, summarization, translation, and question answering without explicit supervision.
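To make the zero-shot setting in the last finding concrete, the sketch below shows how a task can be specified purely through the text the model is conditioned on. As the paper describes it, summarization was induced by appending `TL;DR:` after an article, and translation by conditioning on example pairs in the form `english sentence = french sentence` followed by a final `english sentence =`. The helper names and the newline separators are assumptions made for this illustration, and no model is actually called.

```python
def summarization_prompt(article: str) -> str:
    """Build a summarization prompt by appending the 'TL;DR:' cue.

    The function name and the newline before the cue are assumptions.
    """
    return article.strip() + "\nTL;DR:"


def translation_prompt(example_pairs, source_sentence: str) -> str:
    """Build an English-to-French prompt from example pairs.

    Each pair is rendered as 'english = french'; the final line leaves the
    right-hand side empty for the model to continue. The helper name and
    the one-pair-per-line layout are assumptions.
    """
    lines = [f"{en} = {fr}" for en, fr in example_pairs]
    lines.append(f"{source_sentence} =")
    return "\n".join(lines)


# Hypothetical usage: the resulting strings would be passed to a language
# model, which continues the text; no task-specific training is involved.
summary_text = summarization_prompt("  Some long news article.  ")
translate_text = translation_prompt([("hello", "bonjour")], "thank you")
```

The point of the design is that the task lives entirely in the prompt: the same model weights handle both cases, and only the conditioning text changes.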
Cite this paper
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, & Ilya Sutskever [OpenAI] (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
@misc{radford2019language,
author = {Alec Radford and Jeffrey Wu and Rewon Child and David Luan and Dario Amodei and Ilya Sutskever},
title = {Language Models are Unsupervised Multitask Learners},
howpublished = {OpenAI Technical Report},
year = {2019},
url = {https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf}
}