Keyword

large language models

3 papers tagged “large language models”

AI · arXiv preprint (arXiv:2302.13971) · Feb 2023 · Open access

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al.

The paper presents LLaMA, a family of foundation language models ranging from 7B to 65B parameters trained exclusively on publicly available datasets. It argues that strong performance can be reached without proprietary data and at smaller parameter counts than prior models. LLaMA-13B outperforms the much larger GPT-3 175B on most benchmarks, and LLaMA-65B is competitive with the best contemporary models such as Chinchilla-70B and PaLM-540B.

foundation models · large language models · open models · efficient training

AI · International Conference on Learning Representations (ICLR 2022) · Apr 2022 · Open access

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, et al.

The paper introduces LoRA, a parameter-efficient fine-tuning method that keeps the pretrained model weights frozen and instead learns small trainable low-rank decomposition matrices injected into the Transformer layers. This drastically cuts the number of trainable parameters and optimizer memory needed to adapt very large models to downstream tasks. The authors show LoRA matches or exceeds full fine-tuning quality across several models including GPT-3 175B while adding no extra inference latency.
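The low-rank update described in this summary can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the layer sizes, the rank `r`, the scaling value `alpha`, and the helper names `lora_forward` and `merge` are all assumptions chosen for the example. It shows the frozen weight `W0`, the two small trainable factors `A` and `B`, and how the factors can be folded back into a single matrix, which is why no extra inference latency is incurred.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for one linear layer; real models are far larger.
d_out, d_in, r = 64, 64, 4
alpha = 8.0  # hypothetical scaling hyperparameter

W0 = rng.normal(size=(d_out, d_in))          # pretrained weight, kept frozen
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable factor, small random init
B = np.zeros((d_out, r))                     # trainable factor, zero init (update starts at 0)


def lora_forward(x, W0, A, B, alpha, r):
    """Frozen path plus the scaled low-rank path, for a batch of row vectors x."""
    return x @ W0.T + (alpha / r) * (x @ A.T) @ B.T


def merge(W0, A, B, alpha, r):
    """Fold the low-rank update into one matrix for deployment."""
    return W0 + (alpha / r) * (B @ A)


x = rng.normal(size=(2, d_in))
y = lora_forward(x, W0, A, B, alpha, r)

full_params = W0.size            # parameters updated by full fine-tuning of this layer
lora_params = A.size + B.size    # parameters updated by the low-rank factors
print(f"trainable: {lora_params} of {full_params}")
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly; only `A` and `B` would receive gradient updates during training.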

parameter-efficient fine-tuning · large language models · low-rank adaptation · transformers

AI · Advances in Neural Information Processing Systems 35 (NeurIPS 2022) · Jan 2022 · Open access

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, et al.

The paper shows that prompting a large language model with a few exemplars that include intermediate reasoning steps (a 'chain of thought') substantially improves its ability to solve multi-step reasoning problems. This reasoning ability emerges only in sufficiently large models and requires no fine-tuning. Across arithmetic, commonsense, and symbolic reasoning tasks, chain-of-thought prompting produces large gains, including a new state of the art on the GSM8K math word-problem benchmark.
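The prompt format this summary describes can be sketched as plain string assembly. The example below is a minimal sketch under stated assumptions: the helper `build_cot_prompt`, the dictionary keys, and the `Q:`/`A:` layout are hypothetical choices for illustration, and the exemplar text is illustrative rather than quoted from the paper. No model is called; the code only builds a few-shot prompt in which each exemplar answer spells out its intermediate steps before the final answer.

```python
def build_cot_prompt(exemplars, question):
    """Assemble a few-shot prompt whose exemplar answers include reasoning steps.

    `exemplars` is a list of dicts with the (hypothetical) keys
    'question', 'reasoning', and 'answer'.
    """
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['reasoning']} The answer is {ex['answer']}."
        )
    # The new question ends with an open "A:" so the model continues with its own steps.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative exemplar with intermediate arithmetic written out.
exemplars = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                    "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "reasoning": "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
                     "6 tennis balls. 5 + 6 = 11.",
        "answer": "11",
    }
]

prompt = build_cot_prompt(
    exemplars,
    "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?",
)
print(prompt)
```

A standard few-shot prompt would differ only in dropping the `reasoning` text, so each exemplar answer would contain the final answer alone.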

chain-of-thought prompting · reasoning · large language models