BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Published 2 June 2019 · Proceedings of NAACL-HLT 2019 · Conference paper

Read the original paper Open-access full text Cite

Summary

BERT is a language representation model pre-trained on large unlabeled corpora using masked language modeling and next-sentence prediction, yielding deeply bidirectional contextual representations. The pre-trained model can be fine-tuned with a single additional output layer to achieve strong performance across diverse downstream tasks. It set new state-of-the-art results on eleven NLP benchmarks at the time of publication.

Key findings

Introduced masked language modeling to enable jointly conditioning on left and right context (true bidirectionality)
A single pre-trained model fine-tuned per task achieved state-of-the-art on 11 NLP tasks including GLUE, SQuAD, and MultiNLI
Pushed the GLUE score to 80.5% and SQuAD v1.1 test F1 to 93.2, large gains over prior systems

Subjects & keywords

Artificial Intelligence bert nlp language model pre-training transformers

Cite this paper

APA

Jacob Devlin, Ming-Wei Chang, Kenton Lee, & Kristina Toutanova (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019. https://doi.org/10.18653/v1/N19-1423

BibTeX

@inproceedings{devlin2019bert,
  author    = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  title     = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  booktitle = {Proceedings of NAACL-HLT 2019},
  year      = {2019},
  doi       = {10.18653/v1/N19-1423},
  url       = {https://doi.org/10.18653/v1/N19-1423}
}

Related in Artificial Intelligence

AI2023

Segment Anything

Alexander Kirillov, Eric Mintun and Nikhila Ravi

This paper introduces the Segment Anything project: a promptable image segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset. SAM combines an image encoder, a flexible prompt encoder (points, boxes, masks, text), and a fast mask decoder to produce valid segmentation masks from arbitrary prompts. Trained on over 1 billion masks across 11 million images, SAM shows strong zero-shot transfer to many segmentation tasks without additional training.

IEEE/CVF International Conference on Computer Vision (ICCV) Open access

AI2023

GPT-4 Technical Report

OpenAI

This technical report describes GPT-4, a large-scale multimodal Transformer model that accepts image and text inputs and produces text outputs. The report emphasizes that GPT-4 achieves human-level performance on a range of professional and academic benchmarks, and details infrastructure and optimization methods that allowed performance to be predicted from much smaller models. For competitive and safety reasons, the report withholds architecture, dataset, and training details.

arXiv Open access

AI2023

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril and Gautier Izacard

The paper presents LLaMA, a family of foundation language models ranging from 7B to 65B parameters trained exclusively on publicly available datasets. It argues that strong performance can be reached without proprietary data and at smaller parameter counts than prior models. LLaMA-13B outperforms the much larger GPT-3 175B on most benchmarks, and LLaMA-65B is competitive with the best contemporary models such as Chinchilla-70B and PaLM-540B.

arXiv preprint (arXiv:2302.13971) Open access