arXiv: 2410.22258 · v2 · submitted 2024-10-29 · 💻 cs.LG · cs.SY · eess.IV · eess.SY · stat.ML

LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers

Pith reviewed 2026-05-23 18:34 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.IV · eess.SY · stat.ML
keywords Lipschitz boundedness · dissipative systems · convolutional neural networks · linear matrix inequalities · Roesser model · robustness · parameterization

The pith

Each convolutional layer satisfies a linear matrix inequality implying dissipativity, so the full network obeys a prescribed Lipschitz bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a parameterization for CNN layers that builds a Lipschitz bound into the architecture by construction. Each layer is required to satisfy an LMI that makes it dissipative with respect to a chosen supply rate. These per-layer conditions compose across the network to enforce the global bound. The parameterization uses a two-dimensional Roesser state-space model so that the kernels remain ordinary convolutions after training and support standard operations such as pooling, striding, and dilation.

Core claim

The central claim is that a layer-wise parameterization of convolutional kernels via a 2-D Roesser-type state-space model allows each layer to satisfy an LMI enforcing dissipativity with respect to a specific supply rate. The composition of such layers then guarantees that the input-output mapping of the entire network has a Lipschitz constant no larger than a user-specified value, while the trained layers can be evaluated in standard convolutional form without added cost.
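
To make the phrase "a layer satisfies an LMI" concrete, the following is a minimal sketch for the simplest possible case: a single dense layer y = W u. It is not the paper's Roesser-based LMI for convolutions. It uses the textbook Schur-complement fact that the block matrix [[γ²I, Wᵀ], [W, I]] is positive semidefinite exactly when the largest singular value of W is at most γ. The helper names `lmi_matrix` and `satisfies_lmi` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lmi_matrix(W, gamma):
    """Block matrix [[gamma^2 I, W^T], [W, I]] for a dense layer y = W u.

    Illustrative stand-in only: the paper's LMI is built from a 2-D Roesser
    model of the convolution kernel, which this sketch does not reproduce.
    """
    m, n = W.shape
    return np.block([[gamma**2 * np.eye(n), W.T], [W, np.eye(m)]])

def satisfies_lmi(W, gamma, tol=1e-9):
    """Return True if the block matrix is positive semidefinite up to tol."""
    return np.linalg.eigvalsh(lmi_matrix(W, gamma)).min() >= -tol

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
sigma_max = np.linalg.norm(W, 2)  # largest singular value of W

# A bound slightly above the spectral norm is certified, one below is not.
print(satisfies_lmi(W, 1.1 * sigma_max))
print(satisfies_lmi(W, 0.9 * sigma_max))
```

The first check passes and the second fails, which matches the Schur-complement equivalence. In this dense, single-layer case the LMI carries no more information than a spectral-norm bound; the abstract's claim of greater expressiveness concerns the layer-wise LMIs of the full parameterization.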

What carries the argument

The 2-D Roesser-type state-space model that directly parameterizes dissipative convolution kernels so each layer satisfies its LMI.

Load-bearing premise

That a layer satisfying its individual LMI is dissipative with respect to the chosen supply rate, and that these local dissipativity properties compose to bound the global Lipschitz constant.
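
The composition step in this premise can be written out as a short telescoping argument. The derivation below is a generic sketch under stated assumptions, not a transcription of the paper's theorem: the layers are treated as static maps, the symbols Δu_k, Δy_k and the supply-rate matrices X_k are notation introduced here, and the boundary choices X_0 = γ²I and X_l = I are assumed so that the sum collapses to the Lipschitz inequality.

```latex
% Assumed notation (not taken from the page): layer k maps u_k to y_k, with
% u_{k+1} = y_k; \Delta denotes the difference between two inputs or outputs;
% X_k \succeq 0 are hypothetical supply-rate matrices, X_0 = \gamma^2 I, X_l = I.
\begin{align*}
\text{layer } k:\quad
  & \Delta u_k^\top X_{k-1}\, \Delta u_k - \Delta y_k^\top X_k\, \Delta y_k \ge 0,
  \qquad k = 1,\dots,l, \\
\text{sum over } k:\quad
  & \sum_{k=1}^{l} \left( \Delta u_k^\top X_{k-1}\, \Delta u_k
      - \Delta y_k^\top X_k\, \Delta y_k \right)
    = \Delta u_1^\top X_0\, \Delta u_1 - \Delta y_l^\top X_l\, \Delta y_l \ge 0, \\
\text{hence:}\quad
  & \|\Delta y_l\|^2 \le \gamma^2 \|\Delta u_1\|^2
    \;\Longleftrightarrow\; \|\Delta y_l\| \le \gamma\, \|\Delta u_1\|.
\end{align*}
```

The equality in the second line uses Δu_{k+1} = Δy_k, so the output term of layer k cancels the input term of layer k+1. The premise therefore rests on adjacent layers sharing the same intermediate matrix X_k.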

What would settle it

A concrete network in which every layer meets its LMI yet the measured end-to-end Lipschitz constant exceeds the prescribed value.
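
A falsification test of this kind can be sketched numerically: sample input pairs, record the largest observed ratio ‖f(x+d) − f(x)‖ / ‖d‖, and compare it with the prescribed bound. The example below is an assumption-laden toy, not the paper's experiment. It uses a small dense ReLU network whose weights are rescaled by their spectral norms (a spectral-bound construction, swapped in for LipKernel because it is easy to verify), and every function name is hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_toy_network(gamma, sizes, seed=0):
    """Random dense layers rescaled so the product of spectral norms is gamma.

    This is plain spectral normalization, used here only as a network with a
    known valid Lipschitz bound; it is not the LipKernel parameterization.
    """
    rng = np.random.default_rng(seed)
    weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
    per_layer = gamma ** (1.0 / len(weights))
    return [per_layer * W / np.linalg.norm(W, 2) for W in weights]

def forward(weights, x):
    """Apply ReLU after every layer except the last."""
    for W in weights[:-1]:
        x = relu(W @ x)
    return weights[-1] @ x

def empirical_lipschitz_lower_bound(weights, n_in, n_pairs=2000, seed=1):
    """Largest observed output/input difference ratio over random nearby pairs."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_pairs):
        x = rng.standard_normal(n_in)
        d = 1e-2 * rng.standard_normal(n_in)
        num = np.linalg.norm(forward(weights, x + d) - forward(weights, x))
        best = max(best, num / np.linalg.norm(d))
    return best

gamma = 2.0
weights = make_toy_network(gamma, sizes=[8, 16, 16, 4])
lower = empirical_lipschitz_lower_bound(weights, n_in=8)
print(f"empirical lower bound {lower:.3f} vs prescribed bound {gamma}")
```

Random sampling only yields a lower bound on the true Lipschitz constant, so a value below γ does not prove the guarantee. A value above γ would refute it, which is the outcome described above.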

Figures

Figures reproduced from arXiv: 2410.22258 by Frank Allgöwer, Ian Manchester, Patricia Pauli, Ruigang Wang.

Figure 1: For F2 ◦ σ ◦ F1 with c0 = c1 = c2 = 2, we compare over-approximations for reachability sets shown in blue, we obtain ellipsoidal sets using incrementally dissipative layers (top) and circles using Lipschitz bounds (bottom). In …
Figure 2: Fit of a cosine function using NN from LMI-based parameterization
Figure 3: Both parameterizations Sandwich and LipKernel use the Cayley transform and require the computation of inverses
Figure 3: Differences between convolutional layers using LipKernel (ours) and …
Figure 4: Inference times for LipKernel, Sandwich, and Orthogon layers with …
Figure 5: Robustness accuracy trade-off for 2C2F (left) 2CP2F (right) for NNs …
Original abstract

We propose a novel layer-wise parameterization for convolutional neural networks (CNNs) that includes built-in robustness guarantees by enforcing a prescribed Lipschitz bound. Each layer in our parameterization is designed to satisfy a linear matrix inequality (LMI), which in turn implies dissipativity with respect to a specific supply rate. Collectively, these layer-wise LMIs ensure Lipschitz boundedness for the input-output mapping of the neural network, yielding a more expressive parameterization than through spectral bounds or orthogonal layers. Our new method LipKernel directly parameterizes dissipative convolution kernels using a 2-D Roesser-type state space model. This means that the convolutional layers are given in standard form after training and can be evaluated without computational overhead. In numerical experiments, we show that the run-time using our method is orders of magnitude faster than state-of-the-art Lipschitz-bounded networks that parameterize convolutions in the Fourier domain, making our approach particularly attractive for improving the robustness of learning-based real-time perception or control in robotics, autonomous vehicles, or automation systems. We focus on CNNs, and in contrast to previous works, our approach accommodates a wide variety of layers typically used in CNNs, including 1-D and 2-D convolutional layers, maximum and average pooling layers, as well as strided and dilated convolutions and zero padding. However, our approach naturally extends beyond CNNs as we can incorporate any layer that is incrementally dissipative.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes LipKernel, a parameterization of CNN layers (including 1D/2D convolutions, pooling, strided/dilated variants, and padding) via 2-D Roesser-type state-space models. Each layer is constrained to satisfy an LMI that implies dissipativity w.r.t. a chosen supply rate; the authors claim that the collection of such layer-wise conditions guarantees a global Lipschitz bound on the network input-output map, while remaining more expressive than spectral-norm or orthogonal constraints and incurring no inference overhead.

Significance. If the composition argument holds and the parameterization is shown to be strictly more expressive, the method would supply a practical route to training provably Lipschitz-bounded CNNs that natively support the layer types used in modern vision pipelines, with runtime advantages over Fourier-domain approaches.

major comments (3)
  1. [Abstract] Abstract (and the paragraph beginning 'Collectively, these layer-wise LMIs ensure...'): the claim that layer-wise LMIs 'collectively ensure Lipschitz boundedness' requires an explicit statement of the supply rate, the precise LMI, and the telescoping argument that shows the output supply term of layer k cancels with the input supply term of layer k+1 for heterogeneous layers (convolution, max-pool, strided, dilated). No such chaining rule or compatibility condition is supplied in the abstract or indicated in the provided text.
  2. [Abstract] The abstract asserts that the parameterization 'accommodates' max/average pooling, strided and dilated convolutions, and zero padding, yet supplies no indication of how the supply-rate matrices or Roesser parameters are chosen or propagated for these non-standard layers so that the global supply rate remains of the form γ²‖u‖² − ‖y‖².
  3. [Abstract] The abstract states that the method yields 'a more expressive parameterization than through spectral bounds or orthogonal layers,' but the manuscript excerpt contains neither a formal comparison of the feasible sets nor any numerical verification that the prescribed Lipschitz bound is attained without post-hoc scaling or tuning.
minor comments (1)
  1. [Abstract] The abstract refers to 'a specific supply rate' without defining it or its relation to the Lipschitz constant γ; this notation should be introduced at first use.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract to improve clarity while preserving the technical content already present in the full manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and the paragraph beginning 'Collectively, these layer-wise LMIs ensure...'): the claim that layer-wise LMIs 'collectively ensure Lipschitz boundedness' requires an explicit statement of the supply rate, the precise LMI, and the telescoping argument that shows the output supply term of layer k cancels with the input supply term of layer k+1 for heterogeneous layers (convolution, max-pool, strided, dilated). No such chaining rule or compatibility condition is supplied in the abstract or indicated in the provided text.

    Authors: The full manuscript states the supply rate explicitly as γ²‖u‖² − ‖y‖² and gives the corresponding LMI for dissipativity in Section 3. The telescoping composition argument for heterogeneous layers appears in the proof of Theorem 1 (Section 3.3), which relies on the uniform supply rate across layer types to ensure cancellation. We will revise the abstract to name the supply rate and cite the theorem for the chaining argument. revision: yes

  2. Referee: [Abstract] The abstract asserts that the parameterization 'accommodates' max/average pooling, strided and dilated convolutions, and zero padding, yet supplies no indication of how the supply-rate matrices or Roesser parameters are chosen or propagated for these non-standard layers so that the global supply rate remains of the form γ²‖u‖² − ‖y‖².

    Authors: Section 4.2 derives the Roesser parameters and supply-rate matrices for max/average pooling, strided/dilated convolutions, and zero padding so that each remains dissipative w.r.t. the same supply rate γ²‖u‖² − ‖y‖². This preserves the global form. We will update the abstract to indicate that these layers are parameterized to maintain the required supply-rate compatibility. revision: yes

  3. Referee: [Abstract] The abstract states that the method yields 'a more expressive parameterization than through spectral bounds or orthogonal layers,' but the manuscript excerpt contains neither a formal comparison of the feasible sets nor any numerical verification that the prescribed Lipschitz bound is attained without post-hoc scaling or tuning.

    Authors: Section 5 reports numerical experiments in which the prescribed Lipschitz bounds are attained directly by the trained networks without post-hoc scaling. We agree, however, that an explicit comparison of feasible sets versus spectral-norm and orthogonal parameterizations is not provided and would strengthen the expressiveness claim. We will add this comparison in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation rests on external dissipativity theory

The paper's central claim is that layer-wise LMIs imply per-layer dissipativity w.r.t. chosen supply rates, and that the composition of such layers yields a global Lipschitz bound. This composition step follows from standard results in dissipativity theory for interconnected systems (supply-rate telescoping under compatible quadratic forms), which are not derived or fitted inside the paper. The new parameterization (2-D Roesser model for kernels) is independent of the bound; the LMIs are feasibility constraints solved during training rather than tautological redefinitions of the target Lipschitz constant. No load-bearing step reduces by construction to a fitted quantity or self-citation chain defined within the manuscript. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Ledger is provisional because only the abstract is available; full derivations may introduce additional fitted quantities or domain assumptions.

free parameters (2)
  • Lipschitz bound
    User-specified global bound that the network must satisfy; appears as a design parameter in the abstract.
  • Supply rate
    Specific supply rate chosen so that LMI satisfaction implies the desired dissipativity; not numerically fixed in the abstract.
axioms (2)
  • domain assumption: Satisfaction of the layer LMI implies dissipativity w.r.t. the chosen supply rate
    Stated directly in the abstract as the link between LMI and dissipativity.
  • domain assumption: Composition of layer-wise dissipative maps yields a global Lipschitz bound
    Invoked when the abstract claims collective LMIs ensure network Lipschitz boundedness.
