Document worth reading: “An Information-Theoretic View for Deep Learning”

Deep learning has transformed computer vision, natural language processing, and speech recognition. However, two important questions remain obscure: (1) why do deep neural networks generalize better than shallow networks, and (2) does it always hold that a deeper network leads to better performance? Specifically, letting $L$ be the number of convolutional and pooling layers in a deep neural network, and $n$ be the size of the training sample, we derive an upper bound on the expected generalization error for this network, i.e.,
\begin{eqnarray*}
\mathbb{E}[R(W)-R_S(W)] \leq \exp\left(-\frac{L}{2}\log\frac{1}{\eta}\right)\sqrt{\frac{2\sigma^2}{n} I(S,W)}
\end{eqnarray*}
where $\sigma > 0$ is a constant depending on the loss function, $0 < \eta < 1$ is a constant depending on the information loss of each convolutional or pooling layer, and $I(S,W)$ is the mutual information between the training sample $S$ and the output hypothesis $W$.

An Information-Theoretic View for Deep Learning
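
As a rough illustration of how the bound behaves, the sketch below evaluates its right-hand side for a few depths $L$. The values chosen for $\sigma$, $\eta$, $n$, and $I(S,W)$ are purely illustrative assumptions, not taken from the paper; the point is only that the factor $\exp(-\frac{L}{2}\log\frac{1}{\eta})$ shrinks the bound exponentially as depth grows.

```python
import math

def generalization_bound(L, n, sigma, eta, mutual_info):
    """Evaluate the stated upper bound on E[R(W) - R_S(W)]:
    exp(-(L/2) * log(1/eta)) * sqrt(2 * sigma**2 / n * I(S, W))."""
    contraction = math.exp(-(L / 2) * math.log(1 / eta))
    return contraction * math.sqrt(2 * sigma**2 / n * mutual_info)

# Illustrative (assumed) values: sigma, eta, n, and I(S, W) are not from the paper.
sigma, eta, n, mutual_info = 1.0, 0.8, 10_000, 50.0
for L in (1, 5, 10, 20):
    print(f"L = {L:2d}  bound = {generalization_bound(L, n, sigma, eta, mutual_info):.4f}")
```

With these assumed values the printed bound drops from roughly 0.089 at $L=1$ to roughly 0.011 at $L=20$, mirroring the exponential decrease in $L$ that the bound predicts.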