Keyword

transformers

4 papers tagged “transformers”

AI · International Conference on Learning Representations (ICLR 2022) · Apr 2022 · Open access

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, et al.

The paper introduces LoRA, a parameter-efficient fine-tuning method that freezes the pretrained model weights and injects small trainable low-rank decomposition matrices into each Transformer layer. This sharply reduces the trainable parameters and optimizer memory needed to adapt very large models to downstream tasks: compared with GPT-3 175B fine-tuned with Adam, the authors report about 10,000 times fewer trainable parameters and about 3 times lower GPU memory use. LoRA matches or exceeds full fine-tuning quality on RoBERTa, DeBERTa, GPT-2, and GPT-3 while adding no extra inference latency.

parameter-efficient fine-tuning · large language models · low-rank adaptation · transformers
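The low-rank update summarized in this entry can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names (`matvec`, `lora_forward`, `trainable_params`), the toy dimensions, and the plain-list matrices are all assumptions, and `scale` stands in for the scaling factor the paper applies to the update.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(x, W, A, B, scale=1.0):
    """Return W x + scale * B (A x); only A and B would be trained."""
    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_in, d_out, r):
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

# Hypothetical toy sizes; real models use far larger d_in and d_out.
d_in, d_out, r = 8, 6, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]   # zero init: the update starts at zero
x = [rng.gauss(0, 1) for _ in range(d_in)]

print(trainable_params(d_in, d_out, r), "trainable vs", d_in * d_out, "in W")
print(lora_forward(x, W, A, B) == matvec(W, x))  # True while B is all zeros
```

Because B A has the same shape as W, the product can be added into W after training, which is consistent with the entry's note that no extra inference latency is introduced.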

AI · ICLR 2021 (9th International Conference on Learning Representations) · May 2021 · Open access

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, et al.

This paper introduced the Vision Transformer (ViT), applying a standard Transformer encoder directly to sequences of image patches treated as tokens, with minimal vision-specific inductive biases. When pre-trained on large datasets and transferred to downstream tasks, ViT matched or exceeded state-of-the-art convolutional networks while requiring fewer computational resources to train. It demonstrated that convolutions are not necessary for strong image recognition at scale.

vision transformer · image classification · transformers · computer vision
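The idea of treating image patches as tokens can be illustrated with a short sketch. It assumes a single-channel image stored as a list of rows and a hypothetical helper named `image_to_patches`. The learned linear projection, position embeddings, and class token that the model adds on top are omitted here.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image into flattened, non-overlapping square patches.

    Patches are returned in row-major order, one flat list per patch,
    so the result can be read as a sequence of tokens.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches

# A 4 x 4 toy image with pixel values 0..15, cut into 2 x 2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, 2)
print(len(patches))   # 4 patches, each with 4 values
print(patches[0])     # [0, 1, 4, 5]
```

With 16 x 16 patches, as in the paper's title, a 224 x 224 image would yield 14 x 14 = 196 such tokens per channel-stacked patch.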

AI · Proceedings of NAACL-HLT 2019 · Jun 2019 · Open access

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova

BERT is a language representation model pre-trained on large unlabeled corpora using masked language modeling and next-sentence prediction, yielding deeply bidirectional contextual representations. The pre-trained model can be fine-tuned with a single additional output layer to achieve strong performance across diverse downstream tasks. It set new state-of-the-art results on eleven NLP benchmarks at the time of publication.

bert · nlp · language model · pre-training
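The masked language modeling objective mentioned in this entry can be sketched as below. This is a simplified illustration under stated assumptions: the function name `mask_tokens` and its parameters are hypothetical, and every selected token is replaced with `[MASK]`, whereas the paper replaces only most selected tokens with the mask token and leaves the rest as random or unchanged tokens.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens and record what the model must predict.

    Returns (masked, labels): labels holds the original token at masked
    positions and None elsewhere, so a loss would only cover hidden tokens.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(token)     # target for prediction
        else:
            masked.append(token)
            labels.append(None)      # position ignored by the loss
    return masked, labels

sentence = "the model reads context on both sides of each gap".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Since the surrounding tokens on both sides stay visible, predicting a hidden token requires using left and right context together, which is the bidirectional property the entry describes.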

AI · Advances in Neural Information Processing Systems 30 (NeurIPS 2017) · Dec 2017 · Open access

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al.

The paper introduced the Transformer, a sequence-transduction architecture based entirely on attention mechanisms, dispensing with the recurrence and convolutions used by prior state-of-the-art models. By relying on multi-head self-attention, the model is more parallelizable and trains substantially faster, while achieving new state-of-the-art results on machine translation. The architecture became the foundation for subsequent large language models and much of modern deep learning.

deep learning · transformers · attention · natural language processing
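The attention mechanism at the core of this entry can be sketched for a single head as scaled dot-product attention. The function names and the plain-list representation are assumptions for illustration. Multi-head attention, as described in the paper, runs several such heads on learned projections of the inputs and concatenates the results, which this sketch leaves out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Two identical keys receive equal weight, so the output is the mean value.
out = attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 0.0], [1.0, 0.0]],
                values=[[0.0, 2.0], [4.0, 6.0]])
print(out)
```

Each query's output depends only on the full set of keys and values, not on the result for the previous position, which is why all positions can be computed in parallel.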