stochastic gradient descent — papers

ICLR 2015 (3rd International Conference on Learning Representations) · May 2015 · Open access

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba

This paper introduced Adam, a first-order gradient-based optimization algorithm for stochastic objective functions that computes adaptive per-parameter learning rates from estimates of the first and second moments of the gradients. The method is computationally efficient, has low memory requirements, and is well suited to problems that are large in data or parameters, as well as to problems with noisy or sparse gradients. It became one of the most widely used optimizers in deep learning.
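
The update described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' reference code: the function names adam_step and minimize_quadratic and the toy quadratic objective are hypothetical, chosen only for this example. The default hyperparameters (step size 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the ones suggested in the paper.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    m and v hold the running first- and second-moment estimates,
    and t is the 1-based step count used for bias correction.
    Returns new (params, m, v) lists; the inputs are not modified.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero, so early
        # estimates are scaled up to compensate.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-parameter step: each coordinate is scaled by its own
        # second-moment estimate.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=3000, lr=0.01):
    """Toy example (an assumption for illustration only):
    minimize f(x, y) = (x - 3)^2 + 10 * (y + 1)^2 starting from (0, 0)."""
    params = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        x, y = params
        grads = [2.0 * (x - 3.0), 20.0 * (y + 1.0)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so every parameter moves by roughly lr regardless of gradient magnitude. The optimizer state is two extra values per parameter (m and v), which is the source of the low memory overhead noted in the summary.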

optimization · stochastic gradient descent · deep learning · adaptive learning rate