Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (OpenAI; 31 authors).
Summary
This paper presented GPT-3, an autoregressive language model with 175 billion parameters, and studied its ability to perform tasks from natural-language descriptions and a few examples without gradient updates (in-context learning). Scaling the model dramatically improved few-shot performance across many NLP benchmarks, sometimes approaching fine-tuned systems. The authors also examined limitations, data contamination, and broader societal impacts of large language models.
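To make the idea of in-context learning concrete, here is a minimal sketch of how a zero-shot and a few-shot prompt might be assembled as plain text. The function name `build_prompt`, the `=>` separator, and the translation pairs are illustrative assumptions, not the authors' code or an API from the paper. The sketch only builds the prompt string; sending it to a model is out of scope. The point it shows is that the "learning" signal lives entirely in the prompt, with no gradient updates.

```python
from typing import List, Tuple


def build_prompt(
    task_description: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a text prompt: description, K solved examples, then the open query.

    The layout (one example per line, "=>" as separator) is a hypothetical
    format chosen for illustration.
    """
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    # The final line is left incomplete so the model's continuation is the answer.
    lines.append(f"{query} =>")
    return "\n".join(lines)


# Illustrative demonstration pairs (assumed, not taken from any dataset).
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

# Zero-shot: task description only, no solved examples.
zero_shot = build_prompt("Translate English to French:", [], "peppermint")

# Few-shot: the same task with K = 2 solved examples placed in the context.
few_shot = build_prompt("Translate English to French:", demos, "peppermint")

print(few_shot)
```

The zero-shot and few-shot settings differ only in how many solved examples appear in the context. The model weights are identical in both cases.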
Key findings
- Demonstrated that scaling to 175B parameters enables strong few-shot/zero-shot in-context learning without task-specific fine-tuning
- Achieved competitive or state-of-the-art results on numerous benchmarks via prompting alone
- Documented limitations including arithmetic, contamination concerns, and bias/societal risks of large LMs
Cite this paper
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, & Dario Amodei (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2005.14165
@inproceedings{brown2020language,
author = {Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert-Voss and Gretchen Krueger and Tom Henighan and Rewon Child and Aditya Ramesh and Daniel M. Ziegler and Jeffrey Wu and Clemens Winter and Christopher Hesse and Mark Chen and Eric Sigler and Mateusz Litwin and Scott Gray and Benjamin Chess and Jack Clark and Christopher Berner and Sam McCandlish and Alec Radford and Ilya Sutskever and Dario Amodei},
title = {Language Models are Few-Shot Learners},
booktitle = {Advances in Neural Information Processing Systems 33 (NeurIPS 2020)},
year = {2020},
url = {https://arxiv.org/abs/2005.14165}
}