LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Summary
The paper introduces LoRA, a parameter-efficient fine-tuning method that keeps the pretrained model weights frozen and instead learns small, trainable low-rank decomposition matrices injected into the Transformer layers. This drastically cuts the number of trainable parameters and the optimizer memory needed to adapt very large models to downstream tasks. The authors show that LoRA matches or exceeds full fine-tuning quality across several models, including GPT-3 175B, while adding no extra inference latency.
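The low-rank update can be written compactly. This is a sketch of the formulation for a single weight matrix; the symbols W_0 (frozen pretrained weight), A and B (trainable factors), r (rank), and alpha (scaling constant) are not defined in this summary and are introduced here for illustration.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only A and B receive gradient updates, so the trainable parameter count for this matrix drops from d*k to r*(d + k). If B is initialized to zero, the adapted layer starts out identical to the pretrained one.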
Key findings
- Reduces the number of trainable parameters by roughly 10,000x and the GPU memory requirement by roughly 3x compared with full fine-tuning of GPT-3 175B with Adam.
- Achieves accuracy on par with or better than full fine-tuning across RoBERTa, DeBERTa, GPT-2, and GPT-3 benchmarks.
- Introduces no additional inference latency because the low-rank updates can be merged back into the frozen weights.
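The parameter savings and the zero-latency merge in the findings above can be illustrated with a minimal NumPy sketch for one linear layer. It is not the authors' implementation: the function names (lora_forward, merge, trainable_params), the toy dimensions, and the alpha / r scaling convention are assumptions chosen for illustration.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha):
    """Adapter-style forward pass: frozen path plus scaled low-rank path."""
    r = A.shape[0]  # rank of the update
    return x @ W0.T + (alpha / r) * (x @ A.T) @ B.T

def merge(W0, A, B, alpha):
    """Fold the low-rank update into a single dense weight for inference."""
    r = A.shape[0]
    return W0 + (alpha / r) * (B @ A)

def trainable_params(d, k, r):
    """Number of trainable entries in B (d x r) and A (r x k)."""
    return r * (d + k)

# Toy sizes (hypothetical); real Transformer layers are far larger.
rng = np.random.default_rng(0)
d, k, r, alpha = 64, 48, 4, 8.0

W0 = rng.standard_normal((d, k))        # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable factor
B = rng.standard_normal((d, r))         # trainable factor; nonzero here to mimic a trained state
x = rng.standard_normal((5, k))         # batch of 5 input vectors

y_adapter = lora_forward(x, W0, A, B, alpha)
y_merged = x @ merge(W0, A, B, alpha).T  # one matmul, same cost as the original layer

print("outputs match:", np.allclose(y_adapter, y_merged))
print("trainable:", trainable_params(d, k, r), "of", d * k, "dense entries")
```

After merging, the layer has the same shape as the original weight, which is why inference costs nothing extra. Subtracting the same scaled product B @ A recovers W0, so one frozen base model can be switched between task-specific adapters.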
Cite this paper
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, & Weizhu Chen (2022). LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations (ICLR 2022). https://arxiv.org/abs/2106.09685
@inproceedings{hu2022lora,
author = {Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
title = {LoRA: Low-Rank Adaptation of Large Language Models},
booktitle = {International Conference on Learning Representations (ICLR 2022)},
year = {2022},
url = {https://arxiv.org/abs/2106.09685}
}