Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou

Published 28 January 2022 · Advances in Neural Information Processing Systems 35 (NeurIPS 2022) · Conference paper

Summary

The paper shows that prompting a large language model with a few exemplars that include intermediate reasoning steps (a 'chain of thought') substantially improves its ability to solve multi-step reasoning problems. This reasoning ability emerges only in sufficiently large models and requires no fine-tuning. Across arithmetic, commonsense, and symbolic reasoning tasks, chain-of-thought prompting produces large gains, including a new state of the art on the GSM8K math word-problem benchmark.
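As an illustration of the prompt format described above, here is a minimal sketch of how a few-shot chain-of-thought prompt might be assembled and how a final answer could be read back from a completion. The exemplar and target question are the tennis-ball and cafeteria examples from the paper's first figure. The names `Exemplar`, `build_cot_prompt`, and `extract_answer` are hypothetical helpers introduced here, not code from the paper, and the sketch assumes the "Q:/A:" layout and the closing phrase "The answer is". No model call is included, since that depends on whichever API is in use.

```python
import re
from dataclasses import dataclass


@dataclass
class Exemplar:
    # One worked example: question, natural-language reasoning, final answer.
    question: str
    rationale: str
    answer: str


def build_cot_prompt(exemplars, question):
    """Concatenate worked exemplars, then append the unanswered target question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The trailing "A:" leaves the model to produce its own reasoning and answer.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion):
    """Return the text after the last 'The answer is', or None if the phrase is absent."""
    matches = re.findall(r"The answer is\s+(.+?)\.(?:\s|$)", completion)
    return matches[-1] if matches else None


exemplars = [
    Exemplar(
        question=(
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        rationale=(
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11."
        ),
        answer="11",
    )
]

prompt = build_cot_prompt(
    exemplars,
    "The cafeteria had 23 apples. If they used 20 to make lunch and bought "
    "6 more, how many apples do they have?",
)
```

The resulting `prompt` string would be sent to a language model as-is. The only difference from standard few-shot prompting is that each exemplar's answer spells out the intermediate steps before stating the result, which is the change the paper credits for the improvement.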

Key findings

A handful of chain-of-thought exemplars unlocks strong multi-step reasoning without any model fine-tuning.
The benefit is an emergent property of model scale, appearing only in sufficiently large models (roughly 100B parameters or more).
With the 540B-parameter PaLM model, chain-of-thought prompting set a new state of the art on GSM8K and improved accuracy on commonsense and symbolic reasoning tasks.

Subjects & keywords

Artificial Intelligence · chain-of-thought prompting · reasoning · large language models · in-context learning

Cite this paper

APA

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). https://arxiv.org/abs/2201.11903

BibTeX

@inproceedings{wei2022chainofthought,
  author    = {Jason Wei and Xuezhi Wang and Dale Schuurmans and Maarten Bosma and Brian Ichter and Fei Xia and Ed H. Chi and Quoc V. Le and Denny Zhou},
  title     = {Chain-of-Thought Prompting Elicits Reasoning in Large Language Models},
  booktitle = {Advances in Neural Information Processing Systems 35 (NeurIPS 2022)},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.11903}
}

Related in Artificial Intelligence

AI · 2023

Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, et al.

This paper introduces the Segment Anything project: a promptable image segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset. SAM combines an image encoder, a flexible prompt encoder (points, boxes, masks, text), and a fast mask decoder to produce valid segmentation masks from arbitrary prompts. Trained on over 1 billion masks across 11 million images, SAM shows strong zero-shot transfer to many segmentation tasks without additional training.

IEEE/CVF International Conference on Computer Vision (ICCV) · Open access

AI · 2023

GPT-4 Technical Report

OpenAI

This technical report describes GPT-4, a large-scale multimodal Transformer model that accepts image and text inputs and produces text outputs. The report emphasizes that GPT-4 achieves human-level performance on a range of professional and academic benchmarks, and details infrastructure and optimization methods that allowed performance to be predicted from much smaller models. For competitive and safety reasons, the report withholds architecture, dataset, and training details.

arXiv · Open access

AI · 2023

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al.

The paper presents LLaMA, a family of foundation language models ranging from 7B to 65B parameters trained exclusively on publicly available datasets. It argues that strong performance can be reached without proprietary data and at smaller parameter counts than prior models. LLaMA-13B outperforms the much larger GPT-3 175B on most benchmarks, and LLaMA-65B is competitive with the best contemporary models such as Chinchilla-70B and PaLM-540B.

arXiv preprint (arXiv:2302.13971) · Open access