Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
The paper shows that few-shot prompting with exemplars that include intermediate reasoning steps (a 'chain of thought') substantially improves a large language model's ability to solve multi-step reasoning problems. This ability emerges only in sufficiently large models (on the order of 100B parameters in the paper's experiments) and requires no fine-tuning. Across arithmetic, commonsense, and symbolic reasoning tasks, chain-of-thought prompting produces large gains, including a new state of the art on the GSM8K math word-problem benchmark, reached by PaLM 540B with only eight exemplars.
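To make the prompt format concrete, here is a minimal sketch of how a few-shot chain-of-thought prompt could be assembled. The exemplar and the test question are the tennis-ball and cafeteria examples from the paper's first figure; the names `Exemplar`, `build_cot_prompt`, and `extract_answer` are hypothetical helpers introduced for illustration, not code from the paper. No model is called here: the sketch only builds the prompt text and parses a final answer from a completion, assuming the completion ends with the phrase "The answer is N."

```python
import re
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example: a question, its reasoning steps, and the final answer."""
    question: str
    rationale: str  # the intermediate reasoning steps (the "chain of thought")
    answer: str


def build_cot_prompt(exemplars, question):
    """Concatenate worked exemplars, then append the unanswered target question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The trailing "A:" invites the model to continue with its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion):
    """Return the number after 'The answer is' as a string, or None if absent.

    Assumption: the model imitates the exemplar's closing phrase.
    """
    match = re.search(r"The answer is\s+(-?\d[\d,]*(?:\.\d+)?)", completion)
    return match.group(1).replace(",", "") if match else None


EXEMPLARS = [
    Exemplar(
        question=(
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        rationale=(
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11."
        ),
        answer="11",
    ),
]

TARGET = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)

if __name__ == "__main__":
    print(build_cot_prompt(EXEMPLARS, TARGET))
```

The resulting string would be sent to a language model as-is. A standard few-shot prompt differs only in that each exemplar's `rationale` is omitted, which leaves the model no reasoning pattern to imitate.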