An open index of research

A status.lu publication

Keyword

llm

5 papers tagged “llm”

AI · arXiv · Mar 2023 · Open access

GPT-4 Technical Report

OpenAI

This technical report describes GPT-4, a large-scale multimodal Transformer model that accepts image and text inputs and produces text outputs. The report states that GPT-4 exhibits human-level performance on a range of professional and academic benchmarks, including a simulated bar exam, while remaining less capable than humans in many real-world scenarios. It also details infrastructure and optimization methods that allowed some aspects of performance to be predicted from models trained with no more than 1/1,000th of GPT-4's compute. For competitive and safety reasons, the report withholds architecture, dataset, and training details.

AI · Advances in Neural Information Processing Systems (NeurIPS) · Mar 2022 · Open access

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud and Arthur Mensch

This paper (the 'Chinchilla' paper) investigates the compute-optimal trade-off between model size and training-token count for large language models. By training over 400 models ranging from 70M to over 16B parameters on 5B to 500B tokens, the authors find that model size and training data should be scaled in roughly equal proportion, which implies that prior large models were significantly undertrained. Their 70B-parameter Chinchilla model, trained on about four times more data than Gopher under the same compute budget, outperformed much larger models, including Gopher (280B) and GPT-3 (175B).
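The equal-proportion scaling rule can be illustrated with a minimal sketch. It assumes two things that the summary above does not state: the common approximation that training compute is about 6 FLOPs per parameter per token (C ~ 6 * N * D), and a fixed tokens-per-parameter ratio of 20, which matches Chinchilla's 70B parameters and roughly 1.4T tokens. The function name and defaults are hypothetical.

```python
import math


def compute_optimal_allocation(flops, tokens_per_param=20.0, flops_per_param_token=6.0):
    """Split a compute budget so that parameters and tokens grow together.

    Assumes C ~ k * N * D with D = r * N. Both k (6.0) and r (20.0) are
    illustrative approximations, not values taken from the summary above.
    """
    # Substituting D = r * N into C = k * N * D gives N = sqrt(C / (k * r)).
    n_params = math.sqrt(flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


base = compute_optimal_allocation(1e21)
bigger = compute_optimal_allocation(4e21)  # four times the compute budget

# Ratios of parameters and tokens between the two budgets.
param_ratio = bigger[0] / base[0]
token_ratio = bigger[1] / base[1]
print(param_ratio, token_ratio)
```

Under these assumptions, quadrupling the compute budget doubles both the parameter count and the token count, which is what scaling the two "in roughly equal proportion" means in practice.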

AI · arXiv · Jan 2020 · Open access

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish and Tom Henighan

This paper establishes empirical scaling laws showing that the cross-entropy loss of Transformer language models follows smooth power-law relationships with model size, dataset size, and the amount of training compute. The relationships hold across many orders of magnitude, while architectural details such as width and depth have comparatively minor effects. The work provided a quantitative framework for predicting model performance and allocating compute budgets.
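A power-law relationship of this kind appears as a straight line on log-log axes, so its exponent can be recovered with ordinary least squares on the logarithms. The sketch below is a generic illustration, not the paper's fitting procedure; the data are synthetic, and the constants `TRUE_A` and `TRUE_ALPHA` are made-up values chosen only for the example.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-alpha) by least squares on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx                    # equals -alpha for a decaying power law
    intercept = mean_y - slope * mean_x  # equals log(a)
    return math.exp(intercept), -slope


# Synthetic, noise-free data (illustrative constants, not fitted values from the paper).
TRUE_A, TRUE_ALPHA = 10.0, 0.08
model_sizes = [10 ** k for k in range(6, 11)]  # 1e6 to 1e10 parameters
losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in model_sizes]

a_hat, alpha_hat = fit_power_law(model_sizes, losses)
print(a_hat, alpha_hat)
```

With real measurements the points would scatter around the line, and the fitted exponent would then support extrapolation to larger model sizes, within the range where the trend holds.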

AI · Journal of Machine Learning Research (JMLR) · Oct 2019 · Open access

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer and Adam Roberts

This paper introduces T5 (Text-to-Text Transfer Transformer), a framework that casts every NLP problem—translation, classification, question answering, summarization—as a text-to-text task with a unified model, objective, and decoding procedure. The authors conduct a large-scale empirical study comparing pre-training objectives, architectures, datasets, and transfer strategies, and release the C4 corpus. Scaling the model up to 11 billion parameters achieved state-of-the-art results on many benchmarks.
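The text-to-text framing can be sketched as a small formatting step: each task becomes an input string with a task prefix, and each label becomes an output string. The helper below is hypothetical, and the prefix strings are in the style of those used with T5; treat the exact strings as assumptions rather than a specification.

```python
from typing import Dict, Tuple

# Task prefixes in the style of T5 (exact strings are assumptions).
TASK_PREFIXES: Dict[str, str] = {
    "translation": "translate English to German: ",
    "summarization": "summarize: ",
    "classification": "cola sentence: ",
}


def to_text_pair(task: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one labeled example into an (input text, target text) pair."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    # Every task shares the same shape: prefixed input text, plain target text.
    return TASK_PREFIXES[task] + source, target


examples = [
    to_text_pair("translation", "That is good.", "Das ist gut."),
    to_text_pair("classification", "The course is jumping well.", "not acceptable"),
]
for model_input, model_target in examples:
    print(repr(model_input), "->", repr(model_target))
```

Because even class labels are emitted as text, a single sequence-to-sequence model with one training objective and one decoding procedure can serve all of these tasks.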

AI · OpenAI Technical Report · Feb 2019 · Open access

Language Models are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever

This paper introduces GPT-2, a 1.5-billion-parameter Transformer language model trained on a large web-text corpus (WebText) with a simple next-token prediction objective. It demonstrates that a sufficiently large language model can perform many NLP tasks in a zero-shot setting, without task-specific training data or fine-tuning. The work argued that unsupervised language modeling at scale implicitly learns to perform downstream tasks from naturally occurring demonstrations.