LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers
Pith reviewed 2026-05-23 18:34 UTC · model grok-4.3
The pith
Each convolutional layer satisfies a linear matrix inequality implying dissipativity, so the full network obeys a prescribed Lipschitz bound.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a layer-wise parameterization of convolutional kernels via a 2-D Roesser-type state-space model allows each layer to satisfy an LMI enforcing dissipativity with respect to a specific supply rate. The composition of such layers then guarantees that the input-output mapping of the entire network has a Lipschitz constant no larger than a user-specified value, while the trained layers can be evaluated in standard convolutional form without added cost.
What carries the argument
The 2-D Roesser-type state-space model that directly parameterizes dissipative convolution kernels so each layer satisfies its LMI.
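The Roesser model propagates a horizontal state x^h along the first image axis and a vertical state x^v along the second:

- x^h(i+1, j) = A11 x^h(i, j) + A12 x^v(i, j) + B1 u(i, j)
- x^v(i, j+1) = A21 x^h(i, j) + A22 x^v(i, j) + B2 u(i, j)
- y(i, j) = C1 x^h(i, j) + C2 x^v(i, j) + D u(i, j)

The following is a minimal sketch of one such realization for an arbitrary kernel, checked against a direct convolution. It assumes a single input and output channel, causal kernel indexing, and zero boundary states. The zero boundary states correspond to zero padding at the top and left; a centered CNN kernel is a shifted version of this. This generic construction was chosen for readability and is not claimed to be the realization LipKernel uses for multi-channel layers. The function names `roesser_matrices`, `roesser_filter` and `direct_filter` are hypothetical.

```python
import numpy as np


def roesser_matrices(K):
    """Roesser matrices realizing y[i, j] = sum_{a, b} K[a, b] * u[i - a, j - b]."""
    r1, r2 = K.shape[0] - 1, K.shape[1] - 1
    A11 = np.eye(r1, k=1)          # horizontal partial sums shift: z_a <- z_{a+1}
    A12 = K[1:, 1:].copy()         # kernel rows a >= 1 applied to delayed inputs
    A21 = np.zeros((r2, r1))       # vertical state does not see the horizontal state
    A22 = np.eye(r2, k=-1)         # shift register of the last r2 inputs in the row
    B1 = K[1:, 0].copy()           # current input enters rows a >= 1
    B2 = np.zeros(r2)
    if r2 > 0:
        B2[0] = 1.0                # newest input enters the shift register
    C1 = np.zeros(r1)
    if r1 > 0:
        C1[0] = 1.0                # z_1 holds the contribution of all rows a >= 1
    C2 = K[0, 1:].copy()           # row 0 applied to delayed inputs
    D = K[0, 0]                    # row 0 applied to the current input
    return A11, A12, A21, A22, B1, B2, C1, C2, D


def roesser_filter(K, u):
    """Run the 2-D Roesser recursion over the image u with zero boundary states."""
    A11, A12, A21, A22, B1, B2, C1, C2, D = roesser_matrices(K)
    n1, n2 = u.shape
    r1, r2 = A11.shape[0], A22.shape[0]
    y = np.zeros((n1, n2))
    xh = np.zeros((n2, r1))        # x_h(i, j) of the current row i, for every column j
    for i in range(n1):
        xv = np.zeros(r2)          # zero vertical state at the start of each row
        xh_next = np.zeros_like(xh)
        for j in range(n2):
            uij = u[i, j]
            y[i, j] = C1 @ xh[j] + C2 @ xv + D * uij
            xh_next[j] = A11 @ xh[j] + A12 @ xv + B1 * uij
            xv = A21 @ xh[j] + A22 @ xv + B2 * uij
        xh = xh_next               # one step forward in the horizontal direction
    return y


def direct_filter(K, u):
    """Reference: causal 2-D convolution with zero padding on the top and left."""
    n1, n2 = u.shape
    y = np.zeros((n1, n2))
    for i in range(n1):
        for j in range(n2):
            for a in range(K.shape[0]):
                for b in range(K.shape[1]):
                    if i - a >= 0 and j - b >= 0:
                        y[i, j] += K[a, b] * u[i - a, j - b]
    return y


rng = np.random.default_rng(0)
K = rng.standard_normal((3, 3))
u = rng.standard_normal((6, 7))
print(np.allclose(roesser_filter(K, u), direct_filter(K, u)))  # True
```

In this realization the kernel entries appear directly in A12, B1, C2 and D. That is what allows a parameterization of the state-space matrices to be read back as an ordinary kernel after training.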
Load-bearing premise
That a layer satisfying its individual LMI is dissipative with respect to the chosen supply rate, and that these local dissipativity properties compose to bound the global Lipschitz constant.
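The composition step can be written as a telescoping chain. This sketch assumes that each layer k = 1, …, l is incrementally dissipative with a quadratic supply rate weighted by symmetric positive definite matrices X_{k-1} (input side) and X_k (output side). The matrices X_k are illustrative notation, not taken from the excerpt. Here Δu_k and Δy_k denote the differences between two input/output trajectories of layer k, and the output of layer k is the input of layer k+1.

```latex
\begin{aligned}
&\Delta y_k^\top X_k\,\Delta y_k \le \Delta u_k^\top X_{k-1}\,\Delta u_k,
\quad \Delta u_{k+1} = \Delta y_k,\quad k = 1,\dots,l,
\quad X_0 = \gamma^2 I,\; X_l = I,\\
&\|\Delta y_l\|^2 = \Delta y_l^\top X_l\,\Delta y_l
\le \Delta u_l^\top X_{l-1}\,\Delta u_l
= \Delta y_{l-1}^\top X_{l-1}\,\Delta y_{l-1}
\le \cdots \le \Delta u_1^\top X_0\,\Delta u_1
= \gamma^2 \|\Delta u_1\|^2 .
\end{aligned}
```

Hence ‖f(u) − f(u′)‖ ≤ γ‖u − u′‖. If every layer instead used the identical rate γ²‖Δu‖² − ‖Δy‖², the same chain would only give γ^l. For Roesser-type layers, a nonnegative storage term also enters each inequality; under zero boundary conditions it sums out over the image, and this static sketch omits it.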
What would settle it
A concrete network in which every layer meets its LMI yet the measured end-to-end Lipschitz constant exceeds the prescribed value.
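This falsification test can be run empirically: sample nearby input pairs and record the largest ratio ‖f(x) − f(y)‖ / ‖x − y‖. That ratio is a lower bound on the true Lipschitz constant, so any value above the prescribed γ would refute the certificate. The network below is a hypothetical dense toy, not a LipKernel CNN. Each of its L linear layers is rescaled to spectral norm γ^(1/L), and ReLU is 1-Lipschitz, so the product of layer bounds is γ. The helper names are assumptions for illustration.

```python
import numpy as np


def scaled_layer(rng, n_out, n_in, bound):
    """Random weight matrix rescaled so that its spectral norm equals `bound`."""
    W = rng.standard_normal((n_out, n_in))
    return W * (bound / np.linalg.norm(W, 2))


def make_network(rng, sizes, gamma):
    """Hypothetical toy network: linear layers with ReLU in between, product bound gamma."""
    L = len(sizes) - 1
    Ws = [scaled_layer(rng, sizes[k + 1], sizes[k], gamma ** (1.0 / L)) for k in range(L)]

    def f(x):
        for k, W in enumerate(Ws):
            x = W @ x
            if k < L - 1:
                x = np.maximum(x, 0.0)   # ReLU, 1-Lipschitz
        return x

    return f


def empirical_lipschitz(f, dim, rng, n_pairs=2000, scale=1e-2):
    """Empirical lower bound: largest output/input difference ratio over random pairs."""
    best = 0.0
    for _ in range(n_pairs):
        x = rng.standard_normal(dim)
        d = scale * rng.standard_normal(dim)
        ratio = np.linalg.norm(f(x + d) - f(x)) / np.linalg.norm(d)
        best = max(best, ratio)
    return best


rng = np.random.default_rng(0)
gamma = 2.0
f = make_network(rng, [8, 16, 16, 4], gamma)
est = empirical_lipschitz(f, 8, rng)
print(est <= gamma)  # True; a ratio above gamma would falsify the bound
```

Random sampling usually underestimates the true constant. The gradient-based attacks used for the paper's empirical lower bounds give tighter, but still lower, estimates.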
Original abstract
We propose a novel layer-wise parameterization for convolutional neural networks (CNNs) that includes built-in robustness guarantees by enforcing a prescribed Lipschitz bound. Each layer in our parameterization is designed to satisfy a linear matrix inequality (LMI), which in turn implies dissipativity with respect to a specific supply rate. Collectively, these layer-wise LMIs ensure Lipschitz boundedness for the input-output mapping of the neural network, yielding a more expressive parameterization than through spectral bounds or orthogonal layers. Our new method LipKernel directly parameterizes dissipative convolution kernels using a 2-D Roesser-type state space model. This means that the convolutional layers are given in standard form after training and can be evaluated without computational overhead. In numerical experiments, we show that the run-time using our method is orders of magnitude faster than state-of-the-art Lipschitz-bounded networks that parameterize convolutions in the Fourier domain, making our approach particularly attractive for improving the robustness of learning-based real-time perception or control in robotics, autonomous vehicles, or automation systems. We focus on CNNs, and in contrast to previous works, our approach accommodates a wide variety of layers typically used in CNNs, including 1-D and 2-D convolutional layers, maximum and average pooling layers, as well as strided and dilated convolutions and zero padding. However, our approach naturally extends beyond CNNs as we can incorporate any layer that is incrementally dissipative.
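The abstract contrasts LipKernel with baselines that parameterize convolutions in the Fourier domain. The sketch below shows the property those baselines exploit. It assumes a single-channel circular 2-D convolution on an n × n grid, which the 2-D DFT diagonalizes. Its singular values are therefore the magnitudes of the DFT of the zero-padded kernel, and the spectral norm (the layer's Lipschitz constant) is their maximum. For multi-channel layers, each frequency instead carries a small c_out × c_in matrix, and the norm is the largest spectral norm among them. The function names are hypothetical.

```python
import numpy as np


def circular_conv_matrix(K, n):
    """Dense matrix of the single-channel circular convolution x -> K (*) x on n x n images."""
    M = np.zeros((n * n, n * n))
    for idx in range(n * n):
        X = np.zeros(n * n)
        X[idx] = 1.0
        X = X.reshape(n, n)
        Y = np.zeros((n, n))
        for a in range(K.shape[0]):
            for b in range(K.shape[1]):
                # Y[i, j] += K[a, b] * X[(i - a) % n, (j - b) % n]
                Y += K[a, b] * np.roll(X, shift=(a, b), axis=(0, 1))
        M[:, idx] = Y.ravel()
    return M


def spectral_norm_fft(K, n):
    """Spectral norm from the 2-D DFT of the kernel zero-padded to n x n."""
    return np.abs(np.fft.fft2(K, s=(n, n))).max()


rng = np.random.default_rng(1)
K = rng.standard_normal((3, 3))
n = 8
dense_norm = np.linalg.norm(circular_conv_matrix(K, n), 2)
print(np.isclose(dense_norm, spectral_norm_fft(K, n)))  # True
```

Baselines of this kind evaluate or constrain layers through such transforms. A kernel obtained from the Roesser parameterization can instead be applied as an ordinary convolution, which is the source of the run-time advantage claimed in the abstract.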
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LipKernel, a parameterization of CNN layers (including 1D/2D convolutions, pooling, strided/dilated variants, and padding) via 2-D Roesser-type state-space models. Each layer is constrained to satisfy an LMI that implies dissipativity w.r.t. a chosen supply rate; the authors claim that the collection of such layer-wise conditions guarantees a global Lipschitz bound on the network input-output map, while remaining more expressive than spectral-norm or orthogonal constraints and incurring no inference overhead.
Significance. If the composition argument holds and the parameterization is shown to be strictly more expressive, the method would supply a practical route to training provably Lipschitz-bounded CNNs that natively support the layer types used in modern vision pipelines, with runtime advantages over Fourier-domain approaches.
major comments (3)
- [Abstract] Abstract (and the paragraph beginning 'Collectively, these layer-wise LMIs ensure...'): the claim that layer-wise LMIs 'collectively ensure Lipschitz boundedness' requires an explicit statement of the supply rate, the precise LMI, and the telescoping argument that shows the output supply term of layer k cancels with the input supply term of layer k+1 for heterogeneous layers (convolution, max-pool, strided, dilated). No such chaining rule or compatibility condition is supplied in the abstract or indicated in the provided text.
- [Abstract] The abstract asserts that the parameterization 'accommodates' max/average pooling, strided and dilated convolutions, and zero padding, yet supplies no indication of how the supply-rate matrices or Roesser parameters are chosen or propagated for these non-standard layers so that the global supply rate remains of the form γ²‖u‖² − ‖y‖².
- [Abstract] The abstract states that the method yields 'a more expressive parameterization than through spectral bounds or orthogonal layers,' but the manuscript excerpt contains neither a formal comparison of the feasible sets nor any numerical verification that the prescribed Lipschitz bound is attained without post-hoc scaling or tuning.
minor comments (1)
- [Abstract] The abstract refers to 'a specific supply rate' without defining it or its relation to the Lipschitz constant γ; this notation should be introduced at first use.
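To illustrate the relation the minor comment asks about, consider the simplest case: a static linear layer x ↦ Wx + b with no 2-D state. Incremental dissipativity with respect to γ²‖Δu‖² − ‖Δy‖² then reduces to the LMI γ²I − WᵀW ⪰ 0, which is equivalent to ‖W‖₂ ≤ γ. The Schur-complement form [[γ²I, Wᵀ], [W, I]] ⪰ 0 is linear in W. The sketch checks both forms numerically. The layer LMIs in LipKernel additionally involve Roesser state and storage matrices, which are omitted here. The function names are hypothetical.

```python
import numpy as np


def satisfies_supply_rate_lmi(W, gamma, tol=1e-10):
    """True if gamma^2 I - W^T W is positive semidefinite (up to tol)."""
    Q = gamma ** 2 * np.eye(W.shape[1]) - W.T @ W
    return np.linalg.eigvalsh(Q).min() >= -tol


def satisfies_schur_form(W, gamma, tol=1e-10):
    """Equivalent Schur-complement LMI [[gamma^2 I, W^T], [W, I]] >= 0."""
    m, n = W.shape
    M = np.block([[gamma ** 2 * np.eye(n), W.T], [W, np.eye(m)]])
    return np.linalg.eigvalsh(M).min() >= -tol


W = np.array([[3.0, 0.0], [0.0, 1.0]])   # spectral norm 3
print(satisfies_supply_rate_lmi(W, 3.0), satisfies_schur_form(W, 3.0))  # True True
print(satisfies_supply_rate_lmi(W, 2.9), satisfies_schur_form(W, 2.9))  # False False
```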
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract to improve clarity while preserving the technical content already present in the full manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract (and the paragraph beginning 'Collectively, these layer-wise LMIs ensure...'): the claim that layer-wise LMIs 'collectively ensure Lipschitz boundedness' requires an explicit statement of the supply rate, the precise LMI, and the telescoping argument that shows the output supply term of layer k cancels with the input supply term of layer k+1 for heterogeneous layers (convolution, max-pool, strided, dilated). No such chaining rule or compatibility condition is supplied in the abstract or indicated in the provided text.
  Authors: The full manuscript states the supply rate explicitly as γ²‖u‖² − ‖y‖² and gives the corresponding LMI for dissipativity in Section 3. The telescoping composition argument for heterogeneous layers appears in the proof of Theorem 1 (Section 3.3), which relies on the uniform supply rate across layer types to ensure cancellation. We will revise the abstract to name the supply rate and cite the theorem for the chaining argument. Revision: yes.
- Referee: [Abstract] The abstract asserts that the parameterization 'accommodates' max/average pooling, strided and dilated convolutions, and zero padding, yet supplies no indication of how the supply-rate matrices or Roesser parameters are chosen or propagated for these non-standard layers so that the global supply rate remains of the form γ²‖u‖² − ‖y‖².
  Authors: Section 4.2 derives the Roesser parameters and supply-rate matrices for max/average pooling, strided/dilated convolutions, and zero padding so that each remains dissipative w.r.t. the same supply rate γ²‖u‖² − ‖y‖². This preserves the global form. We will update the abstract to indicate that these layers are parameterized to maintain the required supply-rate compatibility. Revision: yes.
- Referee: [Abstract] The abstract states that the method yields 'a more expressive parameterization than through spectral bounds or orthogonal layers,' but the manuscript excerpt contains neither a formal comparison of the feasible sets nor any numerical verification that the prescribed Lipschitz bound is attained without post-hoc scaling or tuning.
  Authors: Section 5 reports numerical experiments in which the prescribed Lipschitz bounds are attained directly by the trained networks without post-hoc scaling. We agree, however, that an explicit comparison of feasible sets versus spectral-norm and orthogonal parameterizations is not provided and would strengthen the expressiveness claim. We will add this comparison in the revised manuscript. Revision: yes.
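The response on pooling, strided and dilated layers does not state the constants involved. As a sanity check, here are standard Lipschitz constants (ℓ2 norm) for these layers, assuming non-overlapping k × k windows with k dividing the image side:

- Average pooling has constant exactly 1/k. Each output averages k² inputs, and P Pᵀ = I/k².
- Max pooling is 1-Lipschitz.
- The subsampling step of a strided convolution only selects coordinates, so it is also 1-Lipschitz.

A dilated convolution is an ordinary convolution whose kernel has zeros inserted. Overlapping windows change these constants. The helper names below are hypothetical.

```python
import numpy as np


def avg_pool_matrix(n, k):
    """Dense matrix of non-overlapping k x k average pooling on a flattened n x n image."""
    m = n // k
    P = np.zeros((m * m, n * n))
    for p in range(m):
        for q in range(m):
            for a in range(k):
                for b in range(k):
                    P[p * m + q, (p * k + a) * n + (q * k + b)] = 1.0 / k ** 2
    return P


def max_pool(x, k):
    """Non-overlapping k x k max pooling on a square image."""
    n = x.shape[0]
    return x.reshape(n // k, k, n // k, k).max(axis=(1, 3))


def subsample(x, s):
    """Stride-s subsampling, i.e. the selection step of a strided convolution."""
    return x[::s, ::s]


n, k = 8, 2
print(np.isclose(np.linalg.norm(avg_pool_matrix(n, k), 2), 1.0 / k))  # True

rng = np.random.default_rng(2)
worst = 0.0
for _ in range(1000):
    x, y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    worst = max(worst, np.linalg.norm(max_pool(x, k) - max_pool(y, k)) / np.linalg.norm(x - y))
print(worst <= 1.0)  # True
```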
Circularity Check
No circularity: derivation rests on external dissipativity theory
The paper's central claim is that layer-wise LMIs imply per-layer dissipativity w.r.t. chosen supply rates, and that the composition of such layers yields a global Lipschitz bound. This composition step follows from standard results in dissipativity theory for interconnected systems (supply-rate telescoping under compatible quadratic forms), which are not derived or fitted inside the paper. The new parameterization (2-D Roesser model for kernels) is independent of the bound; the LMIs are constraints that each layer is designed to satisfy, not tautological redefinitions of the target Lipschitz constant. No load-bearing step reduces by construction to a fitted quantity or to a chain of self-citations defined within the manuscript. The derivation is therefore self-contained, relying only on these external dissipativity results.
Axiom & Free-Parameter Ledger
free parameters (2)
- Lipschitz bound
- Supply rate
axioms (2)
- Domain assumption: satisfaction of the layer LMI implies dissipativity w.r.t. the chosen supply rate.
- Domain assumption: composition of layer-wise dissipative maps yields a global Lipschitz bound.