AI · Proceedings of NAACL-HLT 2019 · Jun 2019 · Open access

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova

BERT is a language representation model pre-trained on large unlabeled corpora using masked language modeling and next-sentence prediction, yielding deeply bidirectional contextual representations. The pre-trained model can be fine-tuned with a single additional output layer to achieve strong performance across diverse downstream tasks. It set new state-of-the-art results on eleven NLP benchmarks at the time of publication.
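
As a rough illustration of the masked language modeling objective mentioned above, the sketch below corrupts a token sequence and records which original tokens a model would be asked to recover. The function name `mask_tokens`, the toy vocabulary, and the per-token sampling are assumptions made for illustration. The 15% selection rate and the 80/10/10 replacement split follow the commonly described BERT recipe but are not stated in this summary, and the published procedure differs in details such as operating on WordPiece tokens.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) for a masked-LM training example.

    labels[i] holds the original token at positions selected for prediction
    and None elsewhere. Hypothetical helper, not taken from the paper's code.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Assumption: select each position independently with mask_prob.
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # mostly: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # sometimes: random token
        # otherwise: keep the original token unchanged
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the model reads context on both sides of a gap".split()
    toy_vocab = ["tree", "runs", "blue", "seven"]
    corrupted, labels = mask_tokens(sentence, toy_vocab, mask_prob=0.3, seed=3)
    print(corrupted)
    print(labels)
```

Because the model sees the uncorrupted tokens on both sides of each selected position, predicting the label at that position requires using left and right context together, which is the sense in which the resulting representations are bidirectional.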

bert nlp language model pre-training