Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
Summary
This paper introduced Adam, a first-order gradient-based optimization algorithm for stochastic objective functions that computes adaptive per-parameter learning rates from estimates of the first and second moments of the gradients. The method is computationally efficient, has low memory requirements, and is well suited to problems that are large in data or parameters and to problems with noisy or sparse gradients. It became one of the most widely used optimizers in deep learning.
Key findings
- Proposed an adaptive optimizer combining momentum (first moment) and RMSProp-style second-moment scaling with bias correction.
- Its hyperparameters have intuitive interpretations and typically require little tuning, and the method works well across a wide range of machine learning problems.
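- Provided a theoretical convergence analysis, including a regret bound in the online convex optimization framework, and empirical results showing that Adam compares favorably with other stochastic optimization methods.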
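
The update summarized above (moving averages of the gradient and the squared gradient, bias correction, then a per-parameter scaled step) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' reference implementation: the function names `adam_step` and `minimize_quadratic`, the list-based parameter representation, and the toy quadratic objective are assumptions made for this example. The default values for the step size (0.001), beta1 (0.9), beta2 (0.999), and epsilon (1e-8) follow the settings suggested in the paper.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    `t` is the 1-based step count. Returns new (params, m, v) lists.
    Hypothetical helper written for illustration only.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and the squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero and are biased toward it.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-parameter step, scaled by the root of the second-moment estimate.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Minimize the toy objective f(x) = (x0 - 3)^2 + (x1 + 1)^2 with Adam."""
    params = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2.0 * (params[0] - 3.0), 2.0 * (params[1] + 1.0)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the first step the bias-corrected estimates reduce to the gradient and its square, so each parameter moves by roughly `lr` in the direction opposite its gradient, regardless of the gradient's magnitude. The larger step size in `minimize_quadratic` is chosen only to make the toy problem converge quickly.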
Cite this paper
Diederik P. Kingma & Jimmy Ba (2015). Adam: A Method for Stochastic Optimization. ICLR 2015 (3rd International Conference on Learning Representations). https://arxiv.org/abs/1412.6980
@inproceedings{kingma2015adam,
author = {Diederik P. Kingma and Jimmy Ba},
title = {Adam: A Method for Stochastic Optimization},
booktitle = {ICLR 2015 (3rd International Conference on Learning Representations)},
year = {2015},
url = {https://arxiv.org/abs/1412.6980}
}