low-rank adaptation — papers

AI · International Conference on Learning Representations (ICLR 2022) · Apr 2022 · Open access

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, et al.

The paper introduces LoRA, a parameter-efficient fine-tuning method that keeps the pretrained model weights frozen and instead learns small trainable low-rank decomposition matrices injected into each Transformer layer. This sharply reduces the number of trainable parameters and the optimizer memory needed to adapt very large models to downstream tasks: for GPT-3 175B, the authors report about 10,000 times fewer trainable parameters and about 3 times less GPU memory than full fine-tuning with Adam. They show that LoRA matches or exceeds full fine-tuning quality on RoBERTa, DeBERTa, GPT-2, and GPT-3 175B, and because the learned matrices can be merged into the frozen weights after training, it adds no extra inference latency.
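As a rough illustration of the idea summarized above, the following NumPy sketch wraps a single frozen weight matrix with a low-rank update. It is not the authors' code: the class name `LoRALinear`, the parameters `rank`, `alpha`, and `seed`, and the layer sizes are hypothetical choices for this example. The sketch assumes the usual convention of initializing `A` with small Gaussian values and `B` with zeros, and scaling the update by `alpha / rank`.

```python
import numpy as np


class LoRALinear:
    """Hypothetical single-layer sketch: frozen weight plus a low-rank update."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # pretrained weight, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable, small Gaussian init
        self.B = np.zeros((d_out, rank))                    # trainable, zero init
        self.scale = alpha / rank                           # assumed scaling convention

    def forward(self, x):
        # Frozen path plus the low-rank path: x W^T + scale * (x A^T) B^T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the update into one matrix so inference needs a single matmul.
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 128))   # stand-in for a pretrained weight matrix
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(3, 128))    # batch of 3 input vectors

print("frozen parameters:   ", W.size)
print("trainable parameters:", layer.trainable_params())
```

With `B` starting at zero, the wrapped layer initially reproduces the frozen layer exactly. Once `A` and `B` have been trained, `merged_weight()` gives a single matrix of the original shape, which is why a layer adapted this way needs no extra computation at inference time.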

parameter-efficient fine-tuning · large language models · low-rank adaptation · transformers