LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers

Frank Allg\"ower; Ian Manchester; Patricia Pauli; Ruigang Wang

arxiv: 2410.22258 · v2 · submitted 2024-10-29 · 💻 cs.LG · cs.SY· eess.IV· eess.SY· stat.ML

LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers

Patricia Pauli , Ruigang Wang , Ian Manchester , Frank Allg\"ower This is my paper

Pith reviewed 2026-05-23 18:34 UTC · model grok-4.3

classification 💻 cs.LG cs.SYeess.IVeess.SYstat.ML

keywords Lipschitz boundednessdissipative systemsconvolutional neural networkslinear matrix inequalitiesRoesser modelrobustnessparameterization

0 comments

The pith

Each convolutional layer satisfies a linear matrix inequality implying dissipativity, so the full network obeys a prescribed Lipschitz bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a parameterization for CNN layers that builds a Lipschitz bound into the architecture by construction. Each layer is required to satisfy an LMI that makes it dissipative with respect to a chosen supply rate. These per-layer conditions compose across the network to enforce the global bound. The parameterization uses a two-dimensional Roesser state-space model so that the kernels remain ordinary convolutions after training and support standard operations such as pooling, striding, and dilation.

Core claim

The central claim is that a layer-wise parameterization of convolutional kernels via a 2-D Roesser-type state-space model allows each layer to satisfy an LMI enforcing dissipativity with respect to a specific supply rate. The composition of such layers then guarantees that the input-output mapping of the entire network has a Lipschitz constant no larger than a user-specified value, while the trained layers can be evaluated in standard convolutional form without added cost.

What carries the argument

The 2-D Roesser-type state-space model that directly parameterizes dissipative convolution kernels so each layer satisfies its LMI.

Load-bearing premise

That a layer satisfying its individual LMI is dissipative with respect to the chosen supply rate, and that these local dissipativity properties compose to bound the global Lipschitz constant.

What would settle it

A concrete network in which every layer meets its LMI yet the measured end-to-end Lipschitz constant exceeds the prescribed value.

Figures

Figures reproduced from arXiv: 2410.22258 by Frank Allg\"ower, Ian Manchester, Patricia Pauli, Ruigang Wang.

**Figure 1.** Figure 1: For F2 ◦ σ ◦ F1 with c0 = c1 = c2 = 2, we compare overapproximations for reachability sets shown in blue, we obtain ellipsoidal sets using incrementally dissipative layers (top) and circles using Lipschitz bounds (bottom). In [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: Fit of a cosine function using NN from LMI-based parameterization [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Both parameterizations Sandwich and LipKernel use the Cayley transform and require the computation of inverses [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 3.** Figure 3: Differences between convolutional layers using LipKernel (ours) and [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Inference times for LipKernel, Sandwich, and Orthogon layers with [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

**Figure 5.** Figure 5: Robustness accuracy trade-off for 2C2F (left) 2CP2F (right) for NNs [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

read the original abstract

We propose a novel layer-wise parameterization for convolutional neural networks (CNNs) that includes built-in robustness guarantees by enforcing a prescribed Lipschitz bound. Each layer in our parameterization is designed to satisfy a linear matrix inequality (LMI), which in turn implies dissipativity with respect to a specific supply rate. Collectively, these layer-wise LMIs ensure Lipschitz boundedness for the input-output mapping of the neural network, yielding a more expressive parameterization than through spectral bounds or orthogonal layers. Our new method LipKernel directly parameterizes dissipative convolution kernels using a 2-D Roesser-type state space model. This means that the convolutional layers are given in standard form after training and can be evaluated without computational overhead. In numerical experiments, we show that the run-time using our method is orders of magnitude faster than state-of-the-art Lipschitz-bounded networks that parameterize convolutions in the Fourier domain, making our approach particularly attractive for improving the robustness of learning-based real-time perception or control in robotics, autonomous vehicles, or automation systems. We focus on CNNs, and in contrast to previous works, our approach accommodates a wide variety of layers typically used in CNNs, including 1-D and 2-D convolutional layers, maximum and average pooling layers, as well as strided and dilated convolutions and zero padding. However, our approach naturally extends beyond CNNs as we can incorporate any layer that is incrementally dissipative.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LipKernel gives a Roesser-model parameterization for dissipative conv layers that stays in standard form at inference, but the abstract does not show how per-layer supply rates chain to a guaranteed global Lipschitz bound.

read the letter

The new piece is the direct use of 2-D Roesser state-space models to parameterize the convolution kernels so each layer satisfies an LMI for dissipativity. Once trained, the layers revert to ordinary convolution, which removes the test-time cost of Fourier methods. The approach also claims to handle pooling, striding, dilation, and padding without special treatment. That combination is not in the cited prior work and could matter for real-time robustness applications. The experiments reportedly show large speed gains over existing Lipschitz CNNs, which is the practical payoff. The soft spot is the composition step. The abstract states that the layer-wise LMIs “collectively ensure” the network Lipschitz bound, yet gives no supply-rate choice, no explicit LMI, and no chaining argument for heterogeneous layers. If the output supply term of one layer does not cancel with the input term of the next, the telescoping fails and the global bound is not automatic. Without those details it is hard to judge whether the parameterization is actually more expressive or just shifts the tuning burden. This is aimed at people building certified-robust CNNs for control or perception. It deserves a serious referee to check the missing equations, the supply-rate propagation, and whether the numerical results actually confirm the bound holds without post-training adjustment.

Referee Report

3 major / 1 minor

Summary. The paper proposes LipKernel, a parameterization of CNN layers (including 1D/2D convolutions, pooling, strided/dilated variants, and padding) via 2-D Roesser-type state-space models. Each layer is constrained to satisfy an LMI that implies dissipativity w.r.t. a chosen supply rate; the authors claim that the collection of such layer-wise conditions guarantees a global Lipschitz bound on the network input-output map, while remaining more expressive than spectral-norm or orthogonal constraints and incurring no inference overhead.

Significance. If the composition argument holds and the parameterization is shown to be strictly more expressive, the method would supply a practical route to training provably Lipschitz-bounded CNNs that natively support the layer types used in modern vision pipelines, with runtime advantages over Fourier-domain approaches.

major comments (3)

[Abstract] Abstract (and the paragraph beginning 'Collectively, these layer-wise LMIs ensure...'): the claim that layer-wise LMIs 'collectively ensure Lipschitz boundedness' requires an explicit statement of the supply rate, the precise LMI, and the telescoping argument that shows the output supply term of layer k cancels with the input supply term of layer k+1 for heterogeneous layers (convolution, max-pool, strided, dilated). No such chaining rule or compatibility condition is supplied in the abstract or indicated in the provided text.
[Abstract] The abstract asserts that the parameterization 'accommodates' max/average pooling, strided and dilated convolutions, and zero padding, yet supplies no indication of how the supply-rate matrices or Roesser parameters are chosen or propagated for these non-standard layers so that the global supply rate remains of the form γ²‖u‖² − ‖y‖².
[Abstract] The abstract states that the method yields 'a more expressive parameterization than through spectral bounds or orthogonal layers,' but the manuscript excerpt contains neither a formal comparison of the feasible sets nor any numerical verification that the prescribed Lipschitz bound is attained without post-hoc scaling or tuning.

minor comments (1)

[Abstract] The abstract refers to 'a specific supply rate' without defining it or its relation to the Lipschitz constant γ; this notation should be introduced at first use.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract to improve clarity while preserving the technical content already present in the full manuscript.

read point-by-point responses

Referee: [Abstract] Abstract (and the paragraph beginning 'Collectively, these layer-wise LMIs ensure...'): the claim that layer-wise LMIs 'collectively ensure Lipschitz boundedness' requires an explicit statement of the supply rate, the precise LMI, and the telescoping argument that shows the output supply term of layer k cancels with the input supply term of layer k+1 for heterogeneous layers (convolution, max-pool, strided, dilated). No such chaining rule or compatibility condition is supplied in the abstract or indicated in the provided text.

Authors: The full manuscript states the supply rate explicitly as γ²‖u‖² − ‖y‖² and gives the corresponding LMI for dissipativity in Section 3. The telescoping composition argument for heterogeneous layers appears in the proof of Theorem 1 (Section 3.3), which relies on the uniform supply rate across layer types to ensure cancellation. We will revise the abstract to name the supply rate and cite the theorem for the chaining argument. revision: yes
Referee: [Abstract] The abstract asserts that the parameterization 'accommodates' max/average pooling, strided and dilated convolutions, and zero padding, yet supplies no indication of how the supply-rate matrices or Roesser parameters are chosen or propagated for these non-standard layers so that the global supply rate remains of the form γ²‖u‖² − ‖y‖².

Authors: Section 4.2 derives the Roesser parameters and supply-rate matrices for max/average pooling, strided/dilated convolutions, and zero padding so that each remains dissipative w.r.t. the same supply rate γ²‖u‖² − ‖y‖². This preserves the global form. We will update the abstract to indicate that these layers are parameterized to maintain the required supply-rate compatibility. revision: yes
Referee: [Abstract] The abstract states that the method yields 'a more expressive parameterization than through spectral bounds or orthogonal layers,' but the manuscript excerpt contains neither a formal comparison of the feasible sets nor any numerical verification that the prescribed Lipschitz bound is attained without post-hoc scaling or tuning.

Authors: Section 5 reports numerical experiments in which the prescribed Lipschitz bounds are attained directly by the trained networks without post-hoc scaling. We agree, however, that an explicit comparison of feasible sets versus spectral-norm and orthogonal parameterizations is not provided and would strengthen the expressiveness claim. We will add this comparison in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation rests on external dissipativity theory

full rationale

The paper's central claim is that layer-wise LMIs imply per-layer dissipativity w.r.t. chosen supply rates, and that the composition of such layers yields a global Lipschitz bound. This composition step follows from standard results in dissipativity theory for interconnected systems (supply-rate telescoping under compatible quadratic forms), which are not derived or fitted inside the paper. The new parameterization (2-D Roesser model for kernels) is independent of the bound; the LMIs are feasibility constraints solved during training rather than tautological redefinitions of the target Lipschitz constant. No load-bearing step reduces by construction to a fitted quantity or self-citation chain defined within the manuscript. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Ledger is provisional because only the abstract is available; full derivations may introduce additional fitted quantities or domain assumptions.

free parameters (2)

Lipschitz bound
User-specified global bound that the network must satisfy; appears as a design parameter in the abstract.
Supply rate
Specific supply rate chosen so that LMI satisfaction implies the desired dissipativity; not numerically fixed in the abstract.

axioms (2)

domain assumption Satisfaction of the layer LMI implies dissipativity w.r.t. the chosen supply rate
Stated directly in the abstract as the link between LMI and dissipativity.
domain assumption Composition of layer-wise dissipative maps yields a global Lipschitz bound
Invoked when the abstract claims collective LMIs ensure network Lipschitz boundedness.

pith-pipeline@v0.9.0 · 5802 in / 1423 out tokens · 35385 ms · 2026-05-23T18:34:02.950952+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

[1]

Deep learning,

Y . LeCun, Y . Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015

work page 2015
[2]

Neural networks and their applications,

C. M. Bishop, “Neural networks and their applications,” Review of scientific instruments, vol. 65, no. 6, pp. 1803–1832, 1994

work page 1994
[3]

A survey of convolutional neural networks: analysis, applications, and prospects,

Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks: analysis, applications, and prospects,” IEEE Transac- tions on Neural Networks and Learning Systems , vol. 33, no. 12, pp. 6999–7019, 2021

work page 2021
[4]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Good- fellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv:1312.6199, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[5]

Lipschitz regularity of deep neural networks: analysis and efficient estimation,

A. Virmaux and K. Scaman, “Lipschitz regularity of deep neural networks: analysis and efficient estimation,” Advances in Neural Infor- mation Processing Systems , vol. 31, 2018

work page 2018
[6]

Lipschitz certificates for layered network structures driven by averaged activation operators,

P. L. Combettes and J.-C. Pesquet, “Lipschitz certificates for layered network structures driven by averaged activation operators,” SIAM Journal on Mathematics of Data Science , vol. 2, no. 2, pp. 529–557, 2020

work page 2020
[7]

Efficient and accurate estimation of Lipschitz constants for deep neural networks,

M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficient and accurate estimation of Lipschitz constants for deep neural networks,” Advances in Neural Information Processing Systems , vol. 32, 2019

work page 2019
[8]

Lipschitz constant estimation of neural networks via sparse polynomial optimization,

F. Latorre, P. Rolland, and V . Cevher, “Lipschitz constant estimation of neural networks via sparse polynomial optimization,” in International Conference on Learning Representations , 2020

work page 2020
[9]

Exactly computing the local Lipschitz constant of ReLU networks,

M. Jordan and A. G. Dimakis, “Exactly computing the local Lipschitz constant of ReLU networks,” in Advances in Neural Information Pro- cessing Systems, 2020, pp. 7344–7353

work page 2020
[10]

A convex parameterization of robust recurrent neural networks,

M. Revay, R. Wang, and I. R. Manchester, “A convex parameterization of robust recurrent neural networks,” IEEE Control Systems Letters, vol. 5, no. 4, pp. 1363–1368, 2020

work page 2020
[11]

Lipschitz constant estimation for 1d convolutional neural networks,

P. Pauli, D. Gramlich, and F. Allg ¨ower, “Lipschitz constant estimation for 1d convolutional neural networks,” in Learning for Dynamics and Control Conference. PMLR, 2023, pp. 1321–1332

work page 2023
[12]

Lipschitz constant estimation for general neural network archi- tectures using control tools,

——, “Lipschitz constant estimation for general neural network archi- tectures using control tools,” arXiv:2405.01125, 2024

work page arXiv 2024
[13]

Sorting out Lipschitz function approximation,

C. Anil, J. Lucas, and R. Grosse, “Sorting out Lipschitz function approximation,” in International Conference on Machine Learning . PMLR, 2019, pp. 291–301

work page 2019
[14]

Almost-orthogonal layers for efficient general-purpose Lipschitz networks,

B. Prach and C. H. Lampert, “Almost-orthogonal layers for efficient general-purpose Lipschitz networks,” in Computer Vision–ECCV 2022: 17th European Conference, 2022

work page 2022
[15]

Training robust neural networks using Lipschitz bounds,

P. Pauli, A. Koch, J. Berberich, P. Kohler, and F. Allg ¨ower, “Training robust neural networks using Lipschitz bounds,” IEEE Control Systems Letters, vol. 6, pp. 121–126, 2021

work page 2021
[16]

Neu- ral network training under semidefinite constraints,

P. Pauli, N. Funcke, D. Gramlich, M. A. Msalmi, and F. Allg ¨ower, “Neu- ral network training under semidefinite constraints,” in 61st Conference on Decision and Control . IEEE, 2022, pp. 2731–2736

work page 2022
[17]

Regularisation of neural networks by enforcing Lipschitz continuity,

H. Gouk, E. Frank, B. Pfahringer, and M. J. Cree, “Regularisation of neural networks by enforcing Lipschitz continuity,” Machine Learning, vol. 110, pp. 393–416, 2021

work page 2021
[18]

Lipschitz bounded equilib- rium networks,

M. Revay, R. Wang, and I. R. Manchester, “Lipschitz bounded equilib- rium networks,” arXiv:2010.01732, 2020

work page arXiv 2010
[19]

Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness,

——, “Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness,” IEEE Transactions on Automatic Control, 2023

work page 2023
[20]

Direct parameterization of Lipschitz- bounded deep networks,

R. Wang and I. Manchester, “Direct parameterization of Lipschitz- bounded deep networks,” in International Conference on Machine Learning. PMLR, 2023, pp. 36 093–36 110

work page 2023
[21]

Lipschitz- bounded 1D convolutional neural networks using the Cayley transform and the controllability Gramian,

P. Pauli, R. Wang, I. R. Manchester, and F. Allg ¨ower, “Lipschitz- bounded 1D convolutional neural networks using the Cayley transform and the controllability Gramian,” in 62nd Conference on Decision and Control. IEEE, 2023, pp. 5345–5350

work page 2023
[22]

Orthogonalizing convolutional layers with the cayley transform,

A. Trockman and J. Z. Kolter, “Orthogonalizing convolutional layers with the cayley transform,” in International Conference on Learning Representations, 2021. 13 TABLE I EMPIRICAL LOWER LIPSCHITZ BOUNDS , CLEAN ACCURACY , CERTIFIED ROBUST ACCURACY AND ADVERSARIAL ROBUSTNESS UNDER ℓ2 PGD ATTACK FOR VANILLA , AOL, O RTHOGON , SANDWICH , AND LIPKERNEL NNS US...

work page 2021
[23]

A discrete state-space model for linear image processing,

R. Roesser, “A discrete state-space model for linear image processing,” IEEE Transactions on Automatic Control , vol. 20, no. 1, 1975

work page 1975
[24]

Convolutional neural networks as 2-d systems,

D. Gramlich, P. Pauli, C. W. Scherer, F. Allg ¨ower, and C. Ebenbauer, “Convolutional neural networks as 2-d systems,” arXiv:2303.03042, 2023

work page arXiv 2023
[25]

State space representations of the Roesser type for convolutional layers,

P. Pauli, D. Gramlich, and F. Allg ¨ower, “State space representations of the Roesser type for convolutional layers,” arXiv:2403.11938, 2024

work page arXiv 2024
[26]

Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100,

S. Singla, S. Singla, and S. Feizi, “Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100,” inInternational Conference on Learning Representations, 2022

work page 2022
[27]

Residual flows for invertible generative modeling,

R. T. Chen, J. Behrmann, D. K. Duvenaud, and J.-H. Jacobsen, “Residual flows for invertible generative modeling,” Advances in Neural Informa- tion Processing Systems , vol. 32, 2019

work page 2019
[28]

Invertible residual networks,

J. Behrmann, W. Grathwohl, R. T. Chen, D. Duvenaud, and J.-H. Jacobsen, “Invertible residual networks,” in International conference on machine learning. PMLR, 2019, pp. 573–582

work page 2019
[29]

Invertible densenets with concatenated lipswish,

Y . Perugachi-Diaz, J. Tomczak, and S. Bhulai, “Invertible densenets with concatenated lipswish,”Advances in Neural Information Processing Systems, vol. 34, pp. 17 246–17 257, 2021

work page 2021
[30]

Monotone, bi-Lipschitz, and Polyak- Łojasiewicz networks,

R. Wang, K. Dvijotham, and I. R. Manchester, “Monotone, bi-Lipschitz, and Polyak- Łojasiewicz networks,” in International Conference on Machine Learning. PMLR, 2024

work page 2024
[31]

Losslessness, feedback equivalence, and the global stabilization of discrete-time nonlinear systems,

C. I. Byrnes and W. Lin, “Losslessness, feedback equivalence, and the global stabilization of discrete-time nonlinear systems,” IEEE Transac- tions on Automatic Control , vol. 39, no. 1, pp. 83–98, 1994

work page 1994
[32]

Goodfellow, Y

I. Goodfellow, Y . Bengio, and A. Courville, Deep Learning. MIT Press, 2016

work page 2016
[33]

Robustness against adversarial attacks in neural networks using incremental dissi- pativity,

B. Aquino, A. Rahnama, P. Seiler, L. Lin, and V . Gupta, “Robustness against adversarial attacks in neural networks using incremental dissi- pativity,” IEEE Control Systems Letters , vol. 6, pp. 2341–2346, 2022

work page 2022
[34]

Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming,

M. Fazlyab, M. Morari, and G. J. Pappas, “Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming,” IEEE Transactions on Automatic Control , 2020

work page 2020
[35]

Incremental positivity nonpreserva- tion by stability multipliers,

V . V . Kulkarni and M. G. Safonov, “Incremental positivity nonpreserva- tion by stability multipliers,” IEEE Transactions on Automatic Control , vol. 47, no. 1, pp. 173–177, 2002

work page 2002
[36]

On the relation between stability of continuous-and discrete-time evolution equations via the cayley trans- form,

B.-Z. Guo and H. Zwart, “On the relation between stability of continuous-and discrete-time evolution equations via the cayley trans- form,” Integral Equations and Operator Theory , vol. 54, pp. 349–383, 2006

work page 2006
[37]

Orthogonal recurrent neural networks with scaled Cayley transform,

K. Helfrich, D. Willmott, and Q. Ye, “Orthogonal recurrent neural networks with scaled Cayley transform,” in International Conference on Machine Learning . PMLR, 2018, pp. 1969–1978

work page 2018
[38]

The representation and parametrization of orthogonal matrices,

R. Shepard, S. R. Brozell, and G. Gidofalvi, “The representation and parametrization of orthogonal matrices,” The Journal of Physical Chemistry A, vol. 119, no. 28, pp. 7924–7939, 2015

work page 2015
[39]

Stability and the matrix lyapunov equation for discrete 2-dimensional systems,

B. Anderson, P. Agathoklis, E. Jury, and M. Mansour, “Stability and the matrix lyapunov equation for discrete 2-dimensional systems,” IEEE Transactions on Circuits and Systems, vol. 33, no. 3, pp. 261–267, 1986

work page 1986
[40]

A unified algebraic perspective on Lipschitz neural networks,

A. Araujo, A. J. Havens, B. Delattre, A. Allauzen, and B. Hu, “A unified algebraic perspective on Lipschitz neural networks,” in International Conference on Learning Representations , 2023

work page 2023
[41]

Deep learning on mobile and embedded devices: State-of-the-art, challenges, and future directions,

Y . Chen, B. Zheng, Z. Zhang, Q. Wang, C. Shen, and Q. Zhang, “Deep learning on mobile and embedded devices: State-of-the-art, challenges, and future directions,” ACM Computing Surveys (CSUR), vol. 53, no. 4, pp. 1–37, 2020

work page 2020
[42]

MNIST handwritten digit database,

Y . LeCun and C. Cortes, “MNIST handwritten digit database,” 2010

work page 2010
[43]

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,”arXiv:1706.06083, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[1] [1]

Deep learning,

Y . LeCun, Y . Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015

work page 2015

[2] [2]

Neural networks and their applications,

C. M. Bishop, “Neural networks and their applications,” Review of scientific instruments, vol. 65, no. 6, pp. 1803–1832, 1994

work page 1994

[3] [3]

A survey of convolutional neural networks: analysis, applications, and prospects,

Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks: analysis, applications, and prospects,” IEEE Transac- tions on Neural Networks and Learning Systems , vol. 33, no. 12, pp. 6999–7019, 2021

work page 2021

[4] [4]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Good- fellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv:1312.6199, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[5] [5]

Lipschitz regularity of deep neural networks: analysis and efficient estimation,

A. Virmaux and K. Scaman, “Lipschitz regularity of deep neural networks: analysis and efficient estimation,” Advances in Neural Infor- mation Processing Systems , vol. 31, 2018

work page 2018

[6] [6]

Lipschitz certificates for layered network structures driven by averaged activation operators,

P. L. Combettes and J.-C. Pesquet, “Lipschitz certificates for layered network structures driven by averaged activation operators,” SIAM Journal on Mathematics of Data Science , vol. 2, no. 2, pp. 529–557, 2020

work page 2020

[7] [7]

Efficient and accurate estimation of Lipschitz constants for deep neural networks,

M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficient and accurate estimation of Lipschitz constants for deep neural networks,” Advances in Neural Information Processing Systems , vol. 32, 2019

work page 2019

[8] [8]

Lipschitz constant estimation of neural networks via sparse polynomial optimization,

F. Latorre, P. Rolland, and V . Cevher, “Lipschitz constant estimation of neural networks via sparse polynomial optimization,” in International Conference on Learning Representations , 2020

work page 2020

[9] [9]

Exactly computing the local Lipschitz constant of ReLU networks,

M. Jordan and A. G. Dimakis, “Exactly computing the local Lipschitz constant of ReLU networks,” in Advances in Neural Information Pro- cessing Systems, 2020, pp. 7344–7353

work page 2020

[10] [10]

A convex parameterization of robust recurrent neural networks,

M. Revay, R. Wang, and I. R. Manchester, “A convex parameterization of robust recurrent neural networks,” IEEE Control Systems Letters, vol. 5, no. 4, pp. 1363–1368, 2020

work page 2020

[11] [11]

Lipschitz constant estimation for 1d convolutional neural networks,

P. Pauli, D. Gramlich, and F. Allg ¨ower, “Lipschitz constant estimation for 1d convolutional neural networks,” in Learning for Dynamics and Control Conference. PMLR, 2023, pp. 1321–1332

work page 2023

[12] [12]

Lipschitz constant estimation for general neural network archi- tectures using control tools,

——, “Lipschitz constant estimation for general neural network archi- tectures using control tools,” arXiv:2405.01125, 2024

work page arXiv 2024

[13] [13]

Sorting out Lipschitz function approximation,

C. Anil, J. Lucas, and R. Grosse, “Sorting out Lipschitz function approximation,” in International Conference on Machine Learning . PMLR, 2019, pp. 291–301

work page 2019

[14] [14]

Almost-orthogonal layers for efficient general-purpose Lipschitz networks,

B. Prach and C. H. Lampert, “Almost-orthogonal layers for efficient general-purpose Lipschitz networks,” in Computer Vision–ECCV 2022: 17th European Conference, 2022

work page 2022

[15] [15]

Training robust neural networks using Lipschitz bounds,

P. Pauli, A. Koch, J. Berberich, P. Kohler, and F. Allg ¨ower, “Training robust neural networks using Lipschitz bounds,” IEEE Control Systems Letters, vol. 6, pp. 121–126, 2021

work page 2021

[16] [16]

Neu- ral network training under semidefinite constraints,

P. Pauli, N. Funcke, D. Gramlich, M. A. Msalmi, and F. Allg ¨ower, “Neu- ral network training under semidefinite constraints,” in 61st Conference on Decision and Control . IEEE, 2022, pp. 2731–2736

work page 2022

[17] [17]

Regularisation of neural networks by enforcing Lipschitz continuity,

H. Gouk, E. Frank, B. Pfahringer, and M. J. Cree, “Regularisation of neural networks by enforcing Lipschitz continuity,” Machine Learning, vol. 110, pp. 393–416, 2021

work page 2021

[18] [18]

Lipschitz bounded equilib- rium networks,

M. Revay, R. Wang, and I. R. Manchester, “Lipschitz bounded equilib- rium networks,” arXiv:2010.01732, 2020

work page arXiv 2010

[19] [19]

Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness,

——, “Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness,” IEEE Transactions on Automatic Control, 2023

work page 2023

[20] [20]

Direct parameterization of Lipschitz- bounded deep networks,

R. Wang and I. Manchester, “Direct parameterization of Lipschitz- bounded deep networks,” in International Conference on Machine Learning. PMLR, 2023, pp. 36 093–36 110

work page 2023

[21] [21]

Lipschitz- bounded 1D convolutional neural networks using the Cayley transform and the controllability Gramian,

P. Pauli, R. Wang, I. R. Manchester, and F. Allg ¨ower, “Lipschitz- bounded 1D convolutional neural networks using the Cayley transform and the controllability Gramian,” in 62nd Conference on Decision and Control. IEEE, 2023, pp. 5345–5350

work page 2023

[22] [22]

Orthogonalizing convolutional layers with the cayley transform,

A. Trockman and J. Z. Kolter, “Orthogonalizing convolutional layers with the cayley transform,” in International Conference on Learning Representations, 2021. 13 TABLE I EMPIRICAL LOWER LIPSCHITZ BOUNDS , CLEAN ACCURACY , CERTIFIED ROBUST ACCURACY AND ADVERSARIAL ROBUSTNESS UNDER ℓ2 PGD ATTACK FOR VANILLA , AOL, O RTHOGON , SANDWICH , AND LIPKERNEL NNS US...

work page 2021

[23] [23]

A discrete state-space model for linear image processing,

R. Roesser, “A discrete state-space model for linear image processing,” IEEE Transactions on Automatic Control , vol. 20, no. 1, 1975

work page 1975

[24] [24]

Convolutional neural networks as 2-d systems,

D. Gramlich, P. Pauli, C. W. Scherer, F. Allg ¨ower, and C. Ebenbauer, “Convolutional neural networks as 2-d systems,” arXiv:2303.03042, 2023

work page arXiv 2023

[25] [25]

State space representations of the Roesser type for convolutional layers,

P. Pauli, D. Gramlich, and F. Allg ¨ower, “State space representations of the Roesser type for convolutional layers,” arXiv:2403.11938, 2024

work page arXiv 2024

[26] [26]

Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100,

S. Singla, S. Singla, and S. Feizi, “Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100,” inInternational Conference on Learning Representations, 2022

work page 2022

[27] [27]

Residual flows for invertible generative modeling,

R. T. Chen, J. Behrmann, D. K. Duvenaud, and J.-H. Jacobsen, “Residual flows for invertible generative modeling,” Advances in Neural Informa- tion Processing Systems , vol. 32, 2019

work page 2019

[28] [28]

Invertible residual networks,

J. Behrmann, W. Grathwohl, R. T. Chen, D. Duvenaud, and J.-H. Jacobsen, “Invertible residual networks,” in International conference on machine learning. PMLR, 2019, pp. 573–582

work page 2019

[29] [29]

Invertible densenets with concatenated lipswish,

Y . Perugachi-Diaz, J. Tomczak, and S. Bhulai, “Invertible densenets with concatenated lipswish,”Advances in Neural Information Processing Systems, vol. 34, pp. 17 246–17 257, 2021

work page 2021

[30] [30]

Monotone, bi-Lipschitz, and Polyak- Łojasiewicz networks,

R. Wang, K. Dvijotham, and I. R. Manchester, “Monotone, bi-Lipschitz, and Polyak- Łojasiewicz networks,” in International Conference on Machine Learning. PMLR, 2024

work page 2024

[31] [31]

Losslessness, feedback equivalence, and the global stabilization of discrete-time nonlinear systems,

C. I. Byrnes and W. Lin, “Losslessness, feedback equivalence, and the global stabilization of discrete-time nonlinear systems,” IEEE Transac- tions on Automatic Control , vol. 39, no. 1, pp. 83–98, 1994

work page 1994

[32] [32]

Goodfellow, Y

I. Goodfellow, Y . Bengio, and A. Courville, Deep Learning. MIT Press, 2016

work page 2016

[33] [33]

Robustness against adversarial attacks in neural networks using incremental dissi- pativity,

B. Aquino, A. Rahnama, P. Seiler, L. Lin, and V . Gupta, “Robustness against adversarial attacks in neural networks using incremental dissi- pativity,” IEEE Control Systems Letters , vol. 6, pp. 2341–2346, 2022

work page 2022

[34] [34]

Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming,

M. Fazlyab, M. Morari, and G. J. Pappas, “Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming,” IEEE Transactions on Automatic Control , 2020

work page 2020

[35] [35]

Incremental positivity nonpreserva- tion by stability multipliers,

V . V . Kulkarni and M. G. Safonov, “Incremental positivity nonpreserva- tion by stability multipliers,” IEEE Transactions on Automatic Control , vol. 47, no. 1, pp. 173–177, 2002

work page 2002

[36] [36]

On the relation between stability of continuous-and discrete-time evolution equations via the cayley trans- form,

B.-Z. Guo and H. Zwart, “On the relation between stability of continuous-and discrete-time evolution equations via the cayley trans- form,” Integral Equations and Operator Theory , vol. 54, pp. 349–383, 2006

work page 2006

[37] [37]

Orthogonal recurrent neural networks with scaled Cayley transform,

K. Helfrich, D. Willmott, and Q. Ye, “Orthogonal recurrent neural networks with scaled Cayley transform,” in International Conference on Machine Learning . PMLR, 2018, pp. 1969–1978

work page 2018

[38] [38]

The representation and parametrization of orthogonal matrices,

R. Shepard, S. R. Brozell, and G. Gidofalvi, “The representation and parametrization of orthogonal matrices,” The Journal of Physical Chemistry A, vol. 119, no. 28, pp. 7924–7939, 2015

work page 2015

[39] [39]

Stability and the matrix lyapunov equation for discrete 2-dimensional systems,

B. Anderson, P. Agathoklis, E. Jury, and M. Mansour, “Stability and the matrix lyapunov equation for discrete 2-dimensional systems,” IEEE Transactions on Circuits and Systems, vol. 33, no. 3, pp. 261–267, 1986

work page 1986

[40] [40]

A unified algebraic perspective on Lipschitz neural networks,

A. Araujo, A. J. Havens, B. Delattre, A. Allauzen, and B. Hu, “A unified algebraic perspective on Lipschitz neural networks,” in International Conference on Learning Representations , 2023

work page 2023

[41] [41]

Deep learning on mobile and embedded devices: State-of-the-art, challenges, and future directions,

Y . Chen, B. Zheng, Z. Zhang, Q. Wang, C. Shen, and Q. Zhang, “Deep learning on mobile and embedded devices: State-of-the-art, challenges, and future directions,” ACM Computing Surveys (CSUR), vol. 53, no. 4, pp. 1–37, 2020

work page 2020

[42] [42]

MNIST handwritten digit database,

Y . LeCun and C. Cortes, “MNIST handwritten digit database,” 2010

work page 2010

[43] [43]

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,”arXiv:1706.06083, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017