Document worth reading: “Understanding Regularization in Batch Normalization”

Batch Normalization (BN) makes the output of each hidden neuron have zero mean and unit variance, improving convergence and generalization when training neural networks. This work studies these phenomena theoretically. We analyze BN by using a basic building block of neural networks, which consists of a weight layer, a BN layer, and a nonlinear activation function. This simple block helps us understand the characteristics of BN, and the results are generalized to deep models in numerical studies. We explore BN in three aspects. First, by viewing BN as a stochastic process, an analytical form of the regularization inherent in BN is derived. Second, the optimization dynamics with this regularization show that BN enables training to converge with large maximum and effective learning rates. Third, BN's generalization with regularization is explored by using random matrix theory and statistical mechanics. Both simulations and experiments support our analyses.
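
For concreteness, below is a minimal NumPy sketch (not from the paper) of the building block the abstract describes: a weight layer, a BN layer that normalizes each neuron's pre-activation to zero mean and unit variance over the batch, and a nonlinear activation. The function name `bn_block`, the ReLU choice, and the shapes are illustrative assumptions.

```python
import numpy as np

def bn_block(x, W, gamma, beta, eps=1e-5):
    """One building block: weight layer -> batch normalization -> ReLU.

    x: (batch_size, in_features) input batch
    W: (in_features, out_features) weight matrix
    gamma, beta: (out_features,) learnable scale and shift
    """
    h = x @ W                                  # weight layer (pre-activation)
    mu = h.mean(axis=0)                        # per-neuron mean over the batch
    var = h.var(axis=0)                        # per-neuron variance over the batch
    h_hat = (h - mu) / np.sqrt(var + eps)      # zero mean, unit variance per neuron
    y = gamma * h_hat + beta                   # learnable affine transform
    return np.maximum(y, 0.0)                  # nonlinear activation (ReLU)

# Usage: push a random batch through one block.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 32))
W = rng.normal(size=(32, 16)) * 0.1
out = bn_block(x, W, gamma=np.ones(16), beta=np.zeros(16))
print(out.shape)  # (64, 16)
```

Because the batch statistics `mu` and `var` change with each sampled mini-batch, the normalization can be viewed as a stochastic process, which is the perspective the paper uses to derive its regularization analysis.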