A Closer Look at Double Backpropagation

2019 · cs.LG · arXiv 1906.06637

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

In recent years, an increasing number of neural network models have included derivatives with respect to inputs in their loss functions, resulting in so-called double backpropagation for first-order optimization. However, so far no general description of the involved derivatives exists. Here, we cover a wide array of special cases in a very general Hilbert space framework, which allows us to provide optimized backpropagation rules for many real-world scenarios. This includes the reduction of calculations for Frobenius-norm-penalties on Jacobians by roughly a third for locally linear activation functions. Furthermore, we provide a description of the discontinuous loss surface of ReLU networks both in the inputs and the parameters and demonstrate why the discontinuities do not pose a big problem in reality.
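To make the idea of a loss that contains an input derivative concrete, here is a minimal hand-derived sketch in plain Python. It is not the paper's optimized rule set. It assumes a hypothetical scalar-output network f(x) = v · relu(W x), penalizes the squared norm of the input gradient, and differentiates that penalty with respect to the parameters. All names (`W`, `v`, `x`, `penalty_grads`, `numeric_dW`) and the example values are illustrative assumptions. Because ReLU is locally linear, the activation mask is treated as constant, which holds only away from the kinks where a pre-activation equals zero.

```python
# Sketch: gradient penalty on a tiny ReLU network, differentiated by hand.
# Assumed model (not from the paper): f(x) = v . relu(W x), scalar output.

def relu_mask(W, x):
    """1.0 where the pre-activation (W x)_i is positive, else 0.0."""
    return [1.0 if sum(w * xj for w, xj in zip(row, x)) > 0 else 0.0 for row in W]

def input_gradient(W, v, x):
    """First backward pass: df/dx_j = sum_i W[i][j] * v[i] * mask[i]."""
    m = relu_mask(W, x)
    return [sum(W[i][j] * v[i] * m[i] for i in range(len(W))) for j in range(len(x))]

def penalty(W, v, x):
    """Squared Euclidean norm of the input gradient."""
    return sum(g * g for g in input_gradient(W, v, x))

def penalty_grads(W, v, x):
    """Second pass: derivatives of the penalty w.r.t. W and v.

    The mask is held constant, which is valid away from ReLU kinks.
    """
    m = relu_mask(W, x)
    g = input_gradient(W, v, x)
    dW = [[2.0 * g[j] * v[i] * m[i] for j in range(len(x))] for i in range(len(W))]
    dv = [2.0 * m[i] * sum(g[j] * W[i][j] for j in range(len(x))) for i in range(len(W))]
    return dW, dv

def numeric_dW(W, v, x, i, j, eps=1e-6):
    """Central finite difference of the penalty w.r.t. W[i][j], for checking."""
    up = [row[:] for row in W]
    down = [row[:] for row in W]
    up[i][j] += eps
    down[i][j] -= eps
    return (penalty(up, v, x) - penalty(down, v, x)) / (2.0 * eps)

# Illustrative values, chosen so no pre-activation sits at a kink.
W = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.3]]
v = [0.7, -1.2, 0.4]
x = [1.0, 0.25]

if __name__ == "__main__":
    dW, dv = penalty_grads(W, v, x)
    print("penalty:", penalty(W, v, x))
    print("analytic dW[0][0]:", dW[0][0], "numeric:", numeric_dW(W, v, x, 0, 0))
```

In this sketch the hidden unit whose pre-activation is negative contributes nothing to the penalty or to its parameter gradients, and the finite-difference helper offers a quick sanity check of the hand-derived formulas. Near a kink the mask can flip under a small perturbation, so the check is only meaningful away from those points.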

representative citing papers

Layer-wise Derivative Controlled Networks

cs.LG · 2026-05-14 · unverdicted · novelty 4.0

ChainzRule with DREG regularization claims 15.5x fewer parameters than standard models, 23.1% lower peak gradient volatility on MNIST, and 70.17% accuracy on Yelp Full ordinal regression.

citing papers explorer

Layer-wise Derivative Controlled Networks cs.LG · 2026-05-14 · unverdicted · none · ref 7 · internal anchor
