Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, and 6 others (Google)
Summary
This paper introduces T5 (Text-to-Text Transfer Transformer), a framework that casts every NLP problem (translation, classification, question answering, summarization) as a text-to-text task: the model reads text as input and produces text as output, so the same model, training objective, and decoding procedure apply to every task. The authors conduct a large-scale empirical study comparing pre-training objectives, architectures, datasets, and transfer strategies, and release the Colossal Clean Crawled Corpus (C4). Their largest model, with 11 billion parameters, achieved state-of-the-art results on many of the benchmarks considered.
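To make the text-to-text idea concrete, the sketch below renders three different tasks as plain (input, target) string pairs. The helper name `to_text_to_text`, its keyword arguments, and the exact prefix wording are assumptions made for illustration, modeled on the style of task prefixes described in the paper; this is not the authors' released code.

```python
# Minimal sketch: every task becomes "prefix + input text" -> "target text".
# The helper name, its arguments, and the prefix strings are hypothetical.

def to_text_to_text(task: str, text: str, **options: str) -> str:
    """Render one task instance as a single input string with a task prefix."""
    if task == "translate":
        return f"translate {options['source']} to {options['target']}: {text}"
    if task == "summarize":
        return f"summarize: {text}"
    if task == "classify":
        # Classification labels are emitted as words, so the prefix only
        # needs to name the dataset the sentence comes from.
        return f"{options['dataset']} sentence: {text}"
    raise ValueError(f"unknown task: {task}")


# Each pair is (model input, expected model output); both sides are plain text.
pairs = [
    (to_text_to_text("translate", "That is good.",
                     source="English", target="German"),
     "Das ist gut."),
    (to_text_to_text("classify", "The course is jumping well.",
                     dataset="cola"),
     "not acceptable"),
    (to_text_to_text("summarize", "<article text goes here>"),
     "<summary text goes here>"),
]

for model_input, model_target in pairs:
    print(f"{model_input!r} -> {model_target!r}")
```

Because a class label such as "not acceptable" is itself just a string, classification needs no task-specific output layer in this setup; the same decoder that produces a translation also produces the label.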
Key findings
- A unified text-to-text format allows a single model and training objective to be applied across diverse NLP tasks.
- A systematic comparison identified effective choices for the pre-training objective (span corruption), the architecture (encoder-decoder), and the scale of the pre-training data.
- Combining the framework with scale and the new C4 dataset yielded state-of-the-art performance on benchmarks including GLUE, SuperGLUE, and SQuAD.
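The span-corruption objective named in the findings above can be illustrated with a short sketch: contiguous spans of the input are replaced by sentinel tokens, and the target lists each sentinel followed by the text it replaced. Several things here are assumptions made for illustration: the function name `span_corrupt`, whitespace tokenization, the `<extra_id_N>` sentinel spelling, and the hand-picked spans. In actual pre-training the spans are sampled at random, which this deterministic sketch does not attempt.

```python
# Minimal sketch of span corruption with fixed, hand-picked spans.
# Function name, sentinel spelling, and whitespace tokens are assumptions.

def span_corrupt(tokens, spans):
    """Return (inputs, targets) for the given spans.

    `spans` must be sorted, non-overlapping, half-open (start, end) index
    ranges into `tokens`.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel)              # the span collapses to one sentinel
        targets.append(sentinel)             # the target names the sentinel...
        targets.extend(tokens[start:end])    # ...then the tokens it replaced
        cursor = end
    inputs.extend(tokens[cursor:])           # keep the text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, spans=[(2, 4), (8, 9)])

print(" ".join(inputs))
print(" ".join(targets))
```

The target contains only the dropped spans rather than the full original sentence, which keeps target sequences short relative to reconstructing the whole input.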
Cite this paper
Colin Raffel, Noam Shazeer, Adam Roberts, and 6 others (Google) (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research (JMLR). https://doi.org/10.48550/arXiv.1910.10683
@article{raffel2020exploring,
author = {Colin Raffel and Noam Shazeer and Adam Roberts and others},
title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
journal = {Journal of Machine Learning Research (JMLR)},
year = {2020},
doi = {10.48550/arXiv.1910.10683},
url = {https://arxiv.org/abs/1910.10683}
}