
AI · arXiv · Jan 2020 · Open access

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, et al.

This paper establishes empirical scaling laws showing that the cross-entropy loss of Transformer language models follows smooth power-law relationships with model size, dataset size, and training compute. Some of these trends span more than seven orders of magnitude, while architectural details such as width and depth have comparatively minor effects. The work provides a quantitative framework for predicting model performance and allocating compute budgets.
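As a rough illustration of what a power-law relationship between loss and model size means, the sketch below assumes the form L(N) = (N_c / N)^alpha, which becomes a straight line in log-log space and can be fitted with ordinary least squares. The constants, the synthetic data, and the helper names (`power_law_loss`, `fit_power_law`) are assumptions made for this example and are not taken from the paper.

```python
import math


def power_law_loss(n, n_c, alpha):
    """Loss under the assumed form L(N) = (N_c / N) ** alpha."""
    return (n_c / n) ** alpha


def fit_power_law(sizes, losses):
    """Fit log L = alpha * log N_c - alpha * log N by least squares.

    Returns the estimated (n_c, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    alpha = -slope                        # slope of the log-log line is -alpha
    n_c = math.exp(intercept / alpha)     # intercept equals alpha * log N_c
    return n_c, alpha


# Illustrative constants only (hypothetical, not the paper's fitted values).
TRUE_N_C = 1e14
TRUE_ALPHA = 0.08

# Synthetic "measurements" at model sizes from 1e5 to 1e9 parameters.
sizes = [10 ** k for k in range(5, 10)]
losses = [power_law_loss(n, TRUE_N_C, TRUE_ALPHA) for n in sizes]

n_c_hat, alpha_hat = fit_power_law(sizes, losses)

# Extrapolate the fitted line to a model 10x larger than any in the data.
predicted = power_law_loss(1e10, n_c_hat, alpha_hat)
print(f"alpha = {alpha_hat:.4f}, N_c = {n_c_hat:.3e}, L(1e10) = {predicted:.4f}")
```

Because the synthetic data is noise-free, the fit recovers the generating constants. With real training runs the points would scatter around the line, and extrapolating far beyond the measured range would carry more uncertainty.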

llm · scaling laws · language modeling · compute