Gradient descent learning in and out of equilibrium
Relations between the off-thermal-equilibrium dynamical process of on-line learning and thermally equilibrated off-line learning are studied for potential gradient descent learning. The approach of Opper to study on-line Bayesian algorithms is extended to potential-based or maximum likelihood learning. We look at the on-line learning algorithm that best approximates the off-line algorithm in the sense of least Kullback-Leibler information loss. It works by updating the weights along the gradient of an effective potential that differs from the parent off-line potential. The interpretation of this off-equilibrium dynamics bears some similarities to the cavity approach of Griniasty. We are able to analyze networks with non-smooth transfer functions and transfer the smoothness requirement to the potential.