Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
This paper introduced batch normalization, a technique that normalizes layer inputs using mini-batch statistics to reduce internal covariate shift during training. It allows higher learning rates and less careful initialization, accelerates convergence, and acts as a regularizer. Applied to image classification networks, it dramatically reduced training steps and improved accuracy.