A Closer Look at Double Backpropagation
pith:7QGH23A4 Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{7QGH23A4}
Prints a linked pith:7QGH23A4 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
read the original abstract
In recent years, an increasing number of neural network models have included derivatives with respect to inputs in their loss functions, resulting in so-called double backpropagation for first-order optimization. However, so far no general description of the involved derivatives exists. Here, we cover a wide array of special cases in a very general Hilbert space framework, which allows us to provide optimized backpropagation rules for many real-world scenarios. This includes the reduction of calculations for Frobenius-norm-penalties on Jacobians by roughly a third for locally linear activation functions. Furthermore, we provide a description of the discontinuous loss surface of ReLU networks both in the inputs and the parameters and demonstrate why the discontinuities do not pose a big problem in reality.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Layer-wise Derivative Controlled Networks
ChainzRule with DREG regularization claims 15.5x fewer parameters than standard models, 23.1% lower peak gradient volatility on MNIST, and 70.17% accuracy on Yelp Full ordinal regression.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.