Document worth reading: “Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks”

We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows.
* Built on the formalism of probability coupling theory, we derive an algorithm framework, named Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, that is designed to learn the complex hierarchical, statistical dependency within the physical world.
* We show that NNs are special cases of S-System when the probability kernels assume certain exponential family distributions. Activation functions are derived formally. We further endow geometry on NNs through information geometry, show that the intermediate feature spaces of NNs are stochastic manifolds, and show that the 'distance' between samples is contracted as layers stack up.
* S-System shows that NNs are inherently stochastic, and under a set of realistic boundedness and diversity conditions, it enables us to prove that for large-size nonlinear deep NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss errors, and the regions around the minima are flat basins where all eigenvalues of the Hessians are concentrated around zero, using tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations.
* S-System, the information-geometry structure, and the optimization behaviors combined complete the analogy between the Renormalization Group (RG) and NNs. It shows that a NN is a complex adaptive system that estimates the statistical dependency of microscopic objects, e.g., pixels, at multiple scales. Unlike the clear-cut physical quantities produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds, emerging through learning/optimization, that divide the sample space into highly semantically meaningful groups dictated by supervised labels (in supervised NNs).
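The claim that the distance between samples contracts as layers stack up can be illustrated with a small numerical sketch. This is not the paper's construction, just a hedged toy setup: random fully connected layers with tanh activations (a bounded, 1-Lipschitz nonlinearity), where the mean pairwise Euclidean distance between a batch of random inputs tends to shrink layer by layer. The layer width, weight scale, and depth below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_tanh_layer(dim):
    # Gaussian weights scaled by 1/sqrt(dim) keep pre-activations O(1);
    # tanh then squashes them, making the composite map contractive
    # on typical inputs.
    W = rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, dim))
    return lambda x: np.tanh(x @ W.T)

def mean_pairwise_distance(z):
    # Average Euclidean distance over all unordered sample pairs.
    diffs = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    return d[np.triu_indices(z.shape[0], k=1)].mean()

dim, n_samples, n_layers = 100, 32, 8
x = rng.normal(size=(n_samples, dim))

dists = [mean_pairwise_distance(x)]
for _ in range(n_layers):
    x = random_tanh_layer(dim)(x)
    dists.append(mean_pairwise_distance(x))

# Mean pairwise distance after each layer: typically monotonically shrinking.
print([round(d, 2) for d in dists])
```

Running this prints a sequence of mean pairwise distances that decreases with depth, a toy analogue of the contraction-on-stochastic-manifolds statement in the abstract.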