Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
Summary
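This paper introduces batch normalization, a technique that normalizes each layer's inputs using the mean and variance of the current mini-batch. The aim is to reduce internal covariate shift, the change in the distribution of a layer's inputs as the parameters of earlier layers are updated during training. Batch normalization permits higher learning rates and less careful initialization, accelerates convergence, and acts as a regularizer, in some cases removing the need for Dropout. Applied to a state-of-the-art image classification model, it matched the original accuracy with 14 times fewer training steps and went on to exceed it.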
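For reference, the normalization described in this summary can be written out for a single activation x over a mini-batch of size m. The notation follows the paper's batch normalizing transform: \epsilon is a small constant added for numerical stability, and \gamma and \beta are learned scale and shift parameters.

```latex
\begin{align}
\mu_B      &= \frac{1}{m} \sum_{i=1}^{m} x_i \\
\sigma_B^2 &= \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \\
\hat{x}_i  &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \\
y_i        &= \gamma \hat{x}_i + \beta
\end{align}
```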
Key findings
- Normalizing layer inputs per mini-batch stabilizes and speeds up deep network training.
- Batch normalization enables much higher learning rates and reduces sensitivity to initialization.
- It reached state-of-the-art ImageNet accuracy with far fewer training steps, and it also acts as a regularizer.
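The per-mini-batch normalization summarized above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `batch_norm_train` and `batch_norm_infer`, the default `eps=1e-5`, and the toy input are all assumptions made for the example, and the backward pass is omitted.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch, features), feature by feature.

    Returns the output plus the batch mean and variance, which a caller
    could accumulate for use at inference time.
    """
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature (biased) mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, roughly unit variance
    return gamma * x_hat + beta, mu, var     # learned scale and shift

def batch_norm_infer(x, gamma, beta, mean, var, eps=1e-5):
    """Apply the same transform with fixed statistics instead of batch statistics."""
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy data (hypothetical): 64 examples, 4 features, deliberately off-center.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
gamma = np.ones(4)
beta = np.zeros(4)

y, mu, var = batch_norm_train(x, gamma, beta)
y_fixed = batch_norm_infer(x, gamma, beta, mu, var)
```

With `gamma` set to ones and `beta` to zeros, each column of `y` has mean close to 0 and variance close to 1 regardless of the scale of `x`, which is the property that makes training less sensitive to initialization. At inference time the statistics are held fixed so that the output for one example does not depend on the rest of the batch.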
Cite this paper
Sergey Ioffe & Christian Szegedy (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML 2015 (32nd International Conference on Machine Learning). https://arxiv.org/abs/1502.03167
@inproceedings{ioffe2015batch,
author = {Sergey Ioffe and Christian Szegedy},
title = {Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift},
booktitle = {ICML 2015 (32nd International Conference on Machine Learning)},
year = {2015},
url = {https://arxiv.org/abs/1502.03167}
}