L2RU: a Structured State Space Model with prescribed L2-bound
Pith reviewed 2026-05-22 22:34 UTC · model grok-4.3
The pith
L2RU provides SSMs with a prescribed L2-gain bound that holds for every parameter choice, ensuring input-output stability and robustness by design.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce L2RU, a class of SSMs endowed with a prescribed L2-gain bound, guaranteeing input-output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an L2 constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given L2-bound, and a conservative formulation that extends the approach to general systems while improving computational efficiency through a structured representation of the system matrices.
What carries the argument
Free parametrizations of LTI systems that satisfy an L2 constraint, with a non-conservative complete characterization for square systems and a conservative structured-matrix version for general systems.
If this is right
- Unconstrained gradient descent can be used for training while stability and robustness guarantees remain intact.
- Initialization schemes become available that support effective training of long-memory models.
- The models become suitable building blocks for system identification and optimal control applications that require certified robustness.
- Performance and training stability improve on nonlinear identification benchmarks relative to prior SSMs.
Where Pith is reading between the lines
- The same parametrization idea might be adapted to enforce other stability notions such as incremental or contraction properties.
- Deployment in safety-critical control loops could become simpler because stability need not be verified after each parameter update.
- The conservative parametrization may trade some expressiveness for speed, suggesting a tunable spectrum between the two formulations.
Load-bearing premise
The free parametrizations of the LTI blocks satisfy the L2 constraint for every choice of parameters and this property survives the addition of pointwise nonlinearities in the full SSM.
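The second half of this premise can be made concrete with a short worked inequality. It is a sketch under an assumption that the page does not state: the pointwise nonlinearity $\sigma$ is taken to be 1-Lipschitz with $\sigma(0)=0$ (as holds for ReLU or tanh), and $\gamma_i$ is a hypothetical symbol for the gain bound of the $i$-th LTI block.

```latex
% Assumption (not from the source): |\sigma(a)| \le |a| elementwise,
% which follows from \sigma being 1-Lipschitz with \sigma(0) = 0.
\[
\|\sigma(y)\|_{2}^{2}
  = \sum_{k} \|\sigma(y_k)\|^{2}
  \le \sum_{k} \|y_k\|^{2}
  = \|y\|_{2}^{2}
  \le \gamma^{2}\,\|u\|_{2}^{2}.
\]
% Cascading N such layers with LTI gains \gamma_1, \dots, \gamma_N
% then bounds the end-to-end gain by the product of the layer gains:
\[
\|y_{\mathrm{out}}\|_{2} \;\le\; \Bigl(\prod_{i=1}^{N} \gamma_i\Bigr)\,\|u\|_{2}.
\]
```

If the nonlinearity used in the model is not 1-Lipschitz, or does not map zero to zero, this argument does not apply as written and the bound would need a different justification.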
What would settle it
Train an L2RU model to convergence and then measure whether any input sequence produces an output whose L2 norm exceeds the prescribed gain bound times the input L2 norm.
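A falsification test of this kind can be sketched for a single LTI block as follows. The system matrices, the bound `gamma`, and the function names are hypothetical and chosen for illustration only; a full check would run the trained L2RU model, nonlinearities included, in place of `simulate_lti`. Random probing can only refute the bound, never certify it.

```python
import numpy as np

def simulate_lti(A, B, C, D, u):
    """Simulate x[k+1] = A x[k] + B u[k], y[k] = C x[k] + D u[k] from x[0] = 0."""
    x = np.zeros(A.shape[0])
    y = np.empty((u.shape[0], C.shape[0]))
    for k in range(u.shape[0]):
        y[k] = C @ x + D @ u[k]
        x = A @ x + B @ u[k]
    return y

def empirical_gain(A, B, C, D, u):
    """Ratio of output to input l2 norms over the simulated horizon."""
    y = simulate_lti(A, B, C, D, u)
    return np.linalg.norm(y) / np.linalg.norm(u)

# Hypothetical scalar example: G(z) = 1 / (z - 0.5), whose L2 gain is 2.
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[0.0]])
gamma = 2.0  # prescribed bound assumed for this illustration

# Probe with random input sequences and keep the worst observed ratio.
rng = np.random.default_rng(0)
worst = max(
    empirical_gain(A, B, C, D, rng.standard_normal((200, 1)))
    for _ in range(50)
)
violated = worst > gamma  # True would falsify the claimed bound
```

Random inputs rarely excite the worst-case frequency, so a stronger test would also try inputs concentrated near the frequency where the gain peaks, or inputs found by gradient ascent on the ratio.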
Original abstract
Structured state-space models (SSMs) have recently emerged as a powerful architecture at the intersection of machine learning and control, featuring layers composed of discrete-time linear time-invariant (LTI) systems followed by pointwise nonlinearities. These models combine the expressiveness of deep neural networks with the interpretability and inductive bias of dynamical systems, offering strong performance on long-sequence tasks with favorable computational complexity. However, their adoption in applications such as system identification and optimal control remains limited by the difficulty of enforcing stability and robustness in a principled and tractable manner. We introduce L2RU, a class of SSMs endowed with a prescribed $\mathcal{L}_2$-gain bound, guaranteeing input--output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an $\mathcal{L}_2$ constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given $\mathcal{L}_2$-bound, and a conservative formulation that extends the approach to general (possibly non-square) systems while improving computational efficiency through a structured representation of the system matrices. Both parametrizations admit efficient initialization schemes that facilitate training long-memory models. We demonstrate the effectiveness of the proposed framework on a nonlinear system identification benchmark, where L2RU achieves improved performance and training stability compared to existing SSM architectures, highlighting its potential as a principled and robust building block for learning and control.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces L2RU, a class of structured state-space models (SSMs) that incorporate a prescribed L2-gain bound on the underlying LTI layers to guarantee input-output stability and robustness for every choice of parameters. It derives two free parametrizations of LTI systems (a non-conservative complete characterization for square systems and a conservative structured representation for general systems) that are claimed to enforce the L2 constraint by construction, enabling unconstrained gradient-based training while preserving the bound; efficient initialization schemes are also provided, and the approach is evaluated on a nonlinear system identification benchmark showing improved performance and stability over existing SSMs.
Significance. If the parametrizations are shown to enforce the L2 bound without gaps, the work would offer a principled mechanism for embedding hard stability guarantees into SSM architectures, addressing a key barrier to their use in control and identification tasks. The separation of the L2 constraint into free parameters is potentially valuable for training long-memory models via standard optimizers.
major comments (2)
- [Abstract / Introduction] Abstract and introduction: the central claim that both the non-conservative and conservative parametrizations map every admissible choice of free parameters to an LTI system whose induced L2-gain is at most the prescribed value (and that this carries through pointwise nonlinearities) is load-bearing for the 'for all parameter values' guarantee. The manuscript must supply the explicit matrix constructions (A, B, C, D) and the algebraic verification that the bound holds identically, as any slip in the structured representation would invalidate the unconstrained-optimization selling point.
- [Parametrization sections (non-conservative formulation)] The non-conservative formulation is asserted to provide a 'complete characterization' of square LTI systems with given L2 bound. The paper should demonstrate that the parametrization is surjective onto the set of all such systems (i.e., every qualifying LTI system can be realized by some choice of the free parameters) rather than only a subset; otherwise the 'complete' qualifier and the associated training flexibility are overstated.
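For context on what the requested algebraic verification would need to establish, a standard certificate for a discrete-time L2-gain bound is the bounded real lemma, sketched below. Whether the paper's verification follows exactly this route is an assumption here; $P$ is the usual storage-function matrix and $\gamma$ the prescribed bound.

```latex
% Discrete-time bounded real lemma (strict form): if such a P exists,
% then A is Schur stable and the L2 gain of (A, B, C, D) is below \gamma.
\[
\exists\, P \succ 0:\quad
\begin{bmatrix}
A^\top P A - P + C^\top C & A^\top P B + C^\top D \\
B^\top P A + D^\top C & B^\top P B + D^\top D - \gamma^{2} I
\end{bmatrix} \prec 0 .
\]
% Multiplying on both sides by (x_k, u_k) gives the dissipation inequality
\[
x_{k+1}^\top P x_{k+1} - x_k^\top P x_k \;\le\; \gamma^{2}\|u_k\|^{2} - \|y_k\|^{2},
\]
% and summing over k from x_0 = 0 yields \|y\|_2 \le \gamma \|u\|_2.
```

A free parametrization in this sense would be a map from unconstrained parameters to matrices $(A, B, C, D)$ for which a suitable $P$ exists by construction.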
minor comments (2)
- [Initialization schemes] Clarify the precise definition of the prescribed L2-gain bound (e.g., whether it is the induced norm from l2 to l2 or a finite-horizon variant) and how it is initialized in the efficient schemes for long-memory models.
- [Experiments] The experimental section should report the exact L2-gain values attained by the trained models (or an upper bound) to confirm that the theoretical guarantee is not violated in practice.
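One way to report the attained gain of a trained LTI block is a frequency sweep of the largest singular value of its transfer matrix. The sketch below is illustrative only: the function name, the grid size, and the example system are hypothetical, and a grid sweep gives a lower estimate of the true peak, so it can expose a violation but cannot by itself certify the bound.

```python
import numpy as np

def hinf_lower_bound(A, B, C, D, n_grid=2048):
    """Peak of sigma_max(G(e^{jw})) over a grid of w in [0, pi],
    where G(z) = C (z I - A)^{-1} B + D. Assumes A is Schur stable."""
    n = A.shape[0]
    peak = 0.0
    for w in np.linspace(0.0, np.pi, n_grid):
        # Solve (e^{jw} I - A) X = B instead of forming the inverse.
        G = C @ np.linalg.solve(np.exp(1j * w) * np.eye(n) - A, B) + D
        peak = max(peak, np.linalg.svd(G, compute_uv=False)[0])
    return peak

# Hypothetical scalar example: G(z) = 1 / (z - 0.5), peak gain 2 at w = 0.
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[0.0]])
estimate = hinf_lower_bound(A, B, C, D)
```

For real system matrices the frequency response is symmetric, so sweeping only the interval from 0 to pi loses nothing. A sharp peak between grid points can still be missed, which is why the result is labeled a lower bound.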
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight important points for strengthening the presentation of the parametrizations and their guarantees. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and proofs.
Point-by-point responses
- Referee: [Abstract / Introduction] Abstract and introduction: the central claim that both the non-conservative and conservative parametrizations map every admissible choice of free parameters to an LTI system whose induced L2-gain is at most the prescribed value (and that this carries through pointwise nonlinearities) is load-bearing for the 'for all parameter values' guarantee. The manuscript must supply the explicit matrix constructions (A, B, C, D) and the algebraic verification that the bound holds identically, as any slip in the structured representation would invalidate the unconstrained-optimization selling point.
  Authors: We agree that the explicit matrix constructions and algebraic verification should be presented more prominently to make the 'for all parameter values' guarantee fully transparent. Although the parametrizations are derived in Sections 3 and 4, the revised manuscript will include a dedicated appendix containing the full explicit expressions for A, B, C, D in terms of the free parameters, together with the step-by-step algebraic verification that the induced L2-gain is bounded by the prescribed value identically (including the extension through pointwise nonlinearities). This will directly address the concern.
  Revision: yes
- Referee: [Parametrization sections (non-conservative formulation)] The non-conservative formulation is asserted to provide a 'complete characterization' of square LTI systems with given L2 bound. The paper should demonstrate that the parametrization is surjective onto the set of all such systems (i.e., every qualifying LTI system can be realized by some choice of the free parameters) rather than only a subset; otherwise the 'complete' qualifier and the associated training flexibility are overstated.
  Authors: We acknowledge that an explicit demonstration of surjectivity is needed to fully substantiate the 'complete characterization' claim. In the revised manuscript we will add a theorem and its proof establishing that the non-conservative parametrization is surjective: for any square LTI system whose induced L2-gain is at most the prescribed bound, there exist values of the free parameters that recover the original system matrices exactly. This will confirm that the parametrization covers the entire admissible set.
  Revision: yes
Circularity Check
No circularity: explicit algebraic parametrizations enforce L2 bound by construction
The paper constructs two families of free parametrizations (non-conservative complete characterization for square LTI systems and conservative structured form for general systems) such that every admissible parameter choice yields an LTI system whose induced L2-gain is at most the prescribed value. These parametrizations are derived from first-principles matrix constructions in the LTI case and then lifted pointwise through nonlinearities; the resulting guarantees are algebraic identities rather than statistical fits, data-dependent quantities, or self-citations. No load-bearing step reduces to a fitted input renamed as prediction or to an unverified self-citation chain. The architecture therefore remains self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
We develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given L2-bound, and a conservative formulation that extends the approach to general (possibly non-square) systems
- IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear.
A= L−⊤−(R−H11) Q L⊤−R … Q=(I−S+S⊤)(I+S−S⊤)−1 … β=γ²σ(α)/∥Z∥2
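The middle expression in the quoted passage has the form of a Cayley transform, which turns an arbitrary square matrix S into an orthogonal matrix Q. The sketch below assumes the trailing "−1" in the passage denotes a matrix inverse; the function name is hypothetical, and the code illustrates only this one ingredient, not the paper's full construction of A.

```python
import numpy as np

def cayley_orthogonal(S):
    """Map any square S to Q = (I - S + S^T)(I + S - S^T)^{-1}."""
    K = S - S.T  # skew-symmetric, so I + K is always invertible
    I = np.eye(S.shape[0])
    return (I - K) @ np.linalg.inv(I + K)

# Any unconstrained S yields an orthogonal Q (up to rounding error).
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
Q = cayley_orthogonal(S)
orthogonality_error = np.linalg.norm(Q.T @ Q - np.eye(4))
```

This is the sense in which such a parameter is "free": a gradient step on S never leaves the set of matrices that produce an orthogonal Q, so no projection or constraint handling is needed during training.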
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.