Document worth reading: “A Theoretical Connection Between Statistical Physics and Reinforcement Learning”

Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the partition function $\mathcal{Z}$, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and $Q$-functions can be derived from this partition function and interpreted via average energies, the $\mathcal{Z}$-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for $\mathcal{Z}$ is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these $\mathcal{Z}$-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward, as Boltzmann policies would, these policies take entropy into account, favoring states from which many outcomes are possible.

A Theoretical Connection Between Statistical Physics and Reinforcement Learning
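To make the linearity of the $\mathcal{Z}$ Bellman equation concrete, here is a minimal sketch, not the paper's reference implementation, assuming a finite-horizon deterministic MDP in which $\mathcal{Z}(s, t) = \sum_a e^{\beta r(s,a)}\,\mathcal{Z}(s'(s,a), t+1)$ and the policy samples actions proportionally to $e^{\beta r(s,a)}\,\mathcal{Z}(s'(s,a), t+1)$. The specific recursion, the toy chain environment, and names such as `beta`, `step`, and `horizon` are illustrative assumptions, not taken from the paper.

```python
# Sketch: backward induction on an assumed linear Bellman equation for Z
# in a small deterministic chain MDP, plus a Boltzmann-like policy readout.
import numpy as np

n_states, n_actions, horizon, beta = 5, 2, 4, 1.0
rng = np.random.default_rng(0)
rewards = rng.normal(size=(n_states, n_actions))   # r(s, a), random for illustration

def step(s, a):
    """Deterministic transition: action 0 stays put, action 1 moves right (clipped)."""
    return min(s + a, n_states - 1)

# Because the dynamics are deterministic, each update is a plain weighted sum of
# successor Z-values: the recursion is linear in Z.
Z = np.ones((horizon + 1, n_states))                # boundary condition: Z = 1 at the horizon
for t in reversed(range(horizon)):
    for s in range(n_states):
        Z[t, s] = sum(np.exp(beta * rewards[s, a]) * Z[t + 1, step(s, a)]
                      for a in range(n_actions))

def policy(s, t):
    """Boltzmann-like policy: exponentiated immediate reward weighted by the
    successor's Z, which credits states with many promising continuations."""
    w = np.array([np.exp(beta * rewards[s, a]) * Z[t + 1, step(s, a)]
                  for a in range(n_actions)])
    return w / w.sum()

print(policy(0, 0))   # action probabilities at the initial state and time step
```

In this sketch the entropy effect described in the abstract shows up through the successor term $\mathcal{Z}(s', t+1)$: an action leading to a state with many viable continuations receives extra probability mass beyond what its immediate reward alone would warrant.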