language modeling — papers

AIarXiv · Jan 2020 Open access

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish and Tom Henighan

This paper establishes empirical scaling laws showing that the cross-entropy loss of Transformer language models follows smooth power-law relationships with model size, dataset size, and the amount of training compute. The relationships hold across many orders of magnitude, while architectural details such as width and depth have comparatively minor effects. The work provided a quantitative framework for predicting model performance and allocating compute budgets.

llm scaling laws language modeling compute

AIOpenAI Technical Report · Feb 2019 Open access

Language Models are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever

This paper introduces GPT-2, a 1.5-billion-parameter Transformer language model trained on a large web-text corpus (WebText) with a simple next-token prediction objective. It demonstrates that a sufficiently large language model can perform many NLP tasks in a zero-shot setting, without task-specific training data or fine-tuning. The work argued that unsupervised language modeling at scale implicitly learns to perform downstream tasks from naturally occurring demonstrations.

llm language modeling unsupervised learning zero-shot