pith. sign in

arxiv: 2503.23818 · v3 · submitted 2025-03-31 · 📡 eess.SY · cs.LG· cs.SY

L2RU: a Structured State Space Model with prescribed L2-bound

Pith reviewed 2026-05-22 22:34 UTC · model grok-4.3

classification 📡 eess.SY cs.LGcs.SY
keywords structured state space modelsL2-gain boundinput-output stabilityrobustnesssystem identificationLTI parametrizationneural dynamical systems
0
0 comments X

The pith

L2RU provides SSMs with a prescribed L2-gain bound that holds for every parameter choice, ensuring input-output stability and robustness by design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces L2RU as a class of structured state space models that carry a guaranteed L2-gain bound from the outset. This bound delivers input-output stability and robustness no matter which parameters are selected during training. The construction rests on free parametrizations of linear time-invariant systems that already obey the L2 constraint, so standard gradient-based optimization can proceed without added stability penalties. Two complementary versions appear: a non-conservative one that fully characterizes square systems and a conservative one that handles general systems more efficiently through structured matrices. On a nonlinear system identification benchmark the resulting models show improved performance and more reliable training than earlier SSM architectures.

Core claim

We introduce L2RU, a class of SSMs endowed with a prescribed L2-gain bound, guaranteeing input-output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an L2 constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given L2-bound, and a conservative formulation that extends the approach to general systems while improving computational efficiency through a structured

What carries the argument

Free parametrizations of LTI systems that satisfy an L2 constraint, with a non-conservative complete characterization for square systems and a conservative structured-matrix version for general systems.

If this is right

  • Unconstrained gradient descent can be used for training while stability and robustness guarantees remain intact.
  • Initialization schemes become available that support effective training of long-memory models.
  • The models become suitable building blocks for system identification and optimal control applications that require certified robustness.
  • Performance and training stability improve on nonlinear identification benchmarks relative to prior SSMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same parametrization idea might be adapted to enforce other stability notions such as incremental or contraction properties.
  • Deployment in safety-critical control loops could become simpler because stability need not be verified after each parameter update.
  • The conservative parametrization may trade some expressiveness for speed, suggesting a tunable spectrum between the two formulations.

Load-bearing premise

The free parametrizations of the LTI blocks satisfy the L2 constraint for every choice of parameters and this property survives the addition of pointwise nonlinearities in the full SSM.

What would settle it

Train an L2RU model to convergence and then measure whether any input sequence produces an output whose L2 norm exceeds the prescribed gain bound times the input L2 norm.

Figures

Figures reproduced from arXiv: 2503.23818 by Giancarlo Ferrari-Trecate, Leonardo Massai, Muhammad Zakwan.

Figure 1
Figure 1. Figure 1: L2RU architecture presented in this paper. The model consists of a [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Triple-tank system with recirculation pump. A1 38 cm2 A2 32 cm2 A3 21 cm2 a1 0.05 cm2 a2 0.03 cm2 a3 0.06 cm2 k1 0.32 k2 0.23 k3 0.52 kc 50 Tab. 2: List of parameters em￾ployed in the simulation. lustrative examples. The first employs the L2RU architecture to construct a distributed model for learning a networked dynamical system, where we explicitly exploit the ability to tune the L2-bound of the model. T… view at source ↗
Figure 4
Figure 4. Figure 4: Validation loss versus number of parameters for three models. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
read the original abstract

Structured state-space models (SSMs) have recently emerged as a powerful architecture at the intersection of machine learning and control, featuring layers composed of discrete-time linear time-invariant (LTI) systems followed by pointwise nonlinearities. These models combine the expressiveness of deep neural networks with the interpretability and inductive bias of dynamical systems, offering strong performance on long-sequence tasks with favorable computational complexity. However, their adoption in applications such as system identification and optimal control remains limited by the difficulty of enforcing stability and robustness in a principled and tractable manner. We introduce L2RU, a class of SSMs endowed with a prescribed $\mathcal{L}_2$-gain bound, guaranteeing input--output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an $\mathcal{L}_2$ constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given $\mathcal{L}_2$-bound, and a conservative formulation that extends the approach to general (possibly non-square) systems while improving computational efficiency through a structured representation of the system matrices. Both parametrizations admit efficient initialization schemes that facilitate training long-memory models. We demonstrate the effectiveness of the proposed framework on a nonlinear system identification benchmark, where L2RU achieves improved performance and training stability compared to existing SSM architectures, highlighting its potential as a principled and robust building block for learning and control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces L2RU, a class of structured state-space models (SSMs) that incorporate a prescribed L2-gain bound on the underlying LTI layers to guarantee input-output stability and robustness for every choice of parameters. It derives two free parametrizations of LTI systems (a non-conservative complete characterization for square systems and a conservative structured representation for general systems) that are claimed to enforce the L2 constraint by construction, enabling unconstrained gradient-based training while preserving the bound; efficient initialization schemes are also provided, and the approach is evaluated on a nonlinear system identification benchmark showing improved performance and stability over existing SSMs.

Significance. If the parametrizations are shown to enforce the L2 bound without gaps, the work would offer a principled mechanism for embedding hard stability guarantees into SSM architectures, addressing a key barrier to their use in control and identification tasks. The separation of the L2 constraint into free parameters is potentially valuable for training long-memory models via standard optimizers.

major comments (2)
  1. [Abstract / Introduction] Abstract and introduction: the central claim that both the non-conservative and conservative parametrizations map every admissible choice of free parameters to an LTI system whose induced L2-gain is at most the prescribed value (and that this carries through pointwise nonlinearities) is load-bearing for the 'for all parameter values' guarantee. The manuscript must supply the explicit matrix constructions (A, B, C, D) and the algebraic verification that the bound holds identically, as any slip in the structured representation would invalidate the unconstrained-optimization selling point.
  2. [Parametrization sections (non-conservative formulation)] The non-conservative formulation is asserted to provide a 'complete characterization' of square LTI systems with given L2 bound. The paper should demonstrate that the parametrization is surjective onto the set of all such systems (i.e., every qualifying LTI system can be realized by some choice of the free parameters) rather than only a subset; otherwise the 'complete' qualifier and the associated training flexibility are overstated.
minor comments (2)
  1. [Initialization schemes] Clarify the precise definition of the prescribed L2-gain bound (e.g., whether it is the induced norm from l2 to l2 or a finite-horizon variant) and how it is initialized in the efficient schemes for long-memory models.
  2. [Experiments] The experimental section should report the exact L2-gain values attained by the trained models (or an upper bound) to confirm that the theoretical guarantee is not violated in practice.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important points for strengthening the presentation of the parametrizations and their guarantees. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and proofs.

read point-by-point responses
  1. Referee: [Abstract / Introduction] Abstract and introduction: the central claim that both the non-conservative and conservative parametrizations map every admissible choice of free parameters to an LTI system whose induced L2-gain is at most the prescribed value (and that this carries through pointwise nonlinearities) is load-bearing for the 'for all parameter values' guarantee. The manuscript must supply the explicit matrix constructions (A, B, C, D) and the algebraic verification that the bound holds identically, as any slip in the structured representation would invalidate the unconstrained-optimization selling point.

    Authors: We agree that the explicit matrix constructions and algebraic verification should be presented more prominently to make the 'for all parameter values' guarantee fully transparent. Although the parametrizations are derived in Sections 3 and 4, the revised manuscript will include a dedicated appendix containing the full explicit expressions for A, B, C, D in terms of the free parameters, together with the step-by-step algebraic verification that the induced L2-gain is bounded by the prescribed value identically (including the extension through pointwise nonlinearities). This will directly address the concern. revision: yes

  2. Referee: [Parametrization sections (non-conservative formulation)] The non-conservative formulation is asserted to provide a 'complete characterization' of square LTI systems with given L2 bound. The paper should demonstrate that the parametrization is surjective onto the set of all such systems (i.e., every qualifying LTI system can be realized by some choice of the free parameters) rather than only a subset; otherwise the 'complete' qualifier and the associated training flexibility are overstated.

    Authors: We acknowledge that an explicit demonstration of surjectivity is needed to fully substantiate the 'complete characterization' claim. In the revised manuscript we will add a theorem and its proof establishing that the non-conservative parametrization is surjective: for any square LTI system whose induced L2-gain is at most the prescribed bound, there exist values of the free parameters that recover the original system matrices exactly. This will confirm that the parametrization covers the entire admissible set. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit algebraic parametrizations enforce L2 bound by construction

full rationale

The paper constructs two families of free parametrizations (non-conservative complete characterization for square LTI systems and conservative structured form for general systems) such that every admissible parameter choice yields an LTI system whose induced L2-gain is at most the prescribed value. These parametrizations are derived from first-principles matrix constructions in the LTI case and then lifted pointwise through nonlinearities; the resulting guarantees are algebraic identities rather than statistical fits, data-dependent quantities, or self-citations. No load-bearing step reduces to a fitted input renamed as prediction or to an unverified self-citation chain. The architecture therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities; full text required to audit the derivation of the L2 parametrizations.

pith-pipeline@v0.9.0 · 5827 in / 1048 out tokens · 56870 ms · 2026-05-22T22:34:55.072650+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 5 internal anchors

  1. [1]

    On Recurrent Neural Networks for learning-based control: Recent results and ideas for future developments,

    F. Bonassi, M. Farina, J. Xie, and R. Scattolini, “On Recurrent Neural Networks for learning-based control: Recent results and ideas for future developments,”Journal of Process Control, vol. 114, pp. 92– 104, June 2022

  2. [2]

    Recurrent Neural Network based MPC for Process Industries,

    N. Lanzetti, Y . Z. Lian, A. Cortinovis, L. Dominguez, M. Mercangöz, and C. Jones, “Recurrent Neural Network based MPC for Process Industries,” in2019 18th European Control Conference (ECC), June 2019, pp. 1005–1010

  3. [3]

    Deep Convolutional Networks in System Identification,

    C. Andersson, A. H. Ribeiro, K. Tiels, N. Wahlström, and T. B. Schön, “Deep Convolutional Networks in System Identification,” in 2019 IEEE 58th Conference on Decision and Control (CDC), Dec. 2019, pp. 3670–3676, iSSN: 2576-2370

  4. [4]

    Attention Is All You Need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” Aug. 2023, arXiv:1706.03762

  5. [5]

    Efficient Mask Attention-Based NARMAX (MAB-NARMAX) Model Identification,

    Y . Sun and H.-L. Wei, “Efficient Mask Attention-Based NARMAX (MAB-NARMAX) Model Identification,” in2022 27th International Conference on Automation and Computing (ICAC). Bristol, United Kingdom: IEEE, Sept. 2022, pp. 1–6

  6. [6]

    Training Robust Neural Networks Using Lipschitz Bounds,

    P. Pauli, A. Koch, J. Berberich, P. Kohler, and F. Allgöwer, “Training Robust Neural Networks Using Lipschitz Bounds,”IEEE Control Systems Letters, vol. 6, pp. 121–126, 2022

  7. [7]

    Direct Parameterization of Lipschitz- Bounded Deep Networks,

    R. Wang and I. R. Manchester, “Direct Parameterization of Lipschitz- Bounded Deep Networks,” June 2023, arXiv:2301.11526

  8. [8]

    Recurrent Equilibrium Networks: Flexible Dynamic Models With Guaranteed Stability and Robustness,

    M. Revay, R. Wang, and I. R. Manchester, “Recurrent Equilibrium Networks: Flexible Dynamic Models With Guaranteed Stability and Robustness,”IEEE Transactions on Automatic Control, vol. 69, no. 5, pp. 2855–2870, May 2024

  9. [9]

    Efficiently Modeling Long Sequences with Structured State Spaces

    A. Gu, K. Goel, and C. Ré, “Efficiently Modeling Long Sequences with Structured State Spaces,” Aug. 2022, arXiv:2111.00396

  10. [10]

    Prefix sums and their applications

    G. E. Blelloch, “Prefix sums and their applications.” Carnegie Mellon University, 2004, p. 1294199 Bytes

  11. [11]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    A. Gu and T. Dao, “Mamba: Linear-Time Sequence Modeling with Selective State Spaces,” May 2024, arXiv:2312.00752

  12. [12]

    Simplified State Space Layers for Sequence Modeling

    J. T. H. Smith, A. Warrington, and S. W. Linderman, “Simplified State Space Layers for Sequence Modeling,” Mar. 2023, arXiv:2208.04933

  13. [13]

    State Space Models as Foundation Models: A Control Theoretic Overview,

    C. A. Alonso, J. Sieber, and M. N. Zeilinger, “State Space Models as Foundation Models: A Control Theoretic Overview,” Mar. 2024, arXiv:2403.16899

  14. [14]

    Resurrecting Recurrent Neural Networks for Long Sequences,

    A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pas- canu, and S. De, “Resurrecting Recurrent Neural Networks for Long Sequences,” Mar. 2023, arXiv:2303.06349

  15. [15]

    Structured state-space models are deep Wiener models,

    F. Bonassi, C. Andersson, P. Mattsson, and T. B. Schön, “Structured state-space models are deep Wiener models,”IFAC-PapersOnLine, vol. 58, no. 15, pp. 247–252, Jan. 2024

  16. [16]

    Learning to Boost the Performance of Stable Nonlinear Systems,

    L. Furieri, C. L. Galimberti, and G. Ferrari-Trecate, “Learning to Boost the Performance of Stable Nonlinear Systems,”IEEE Open Journal of Control Systems, vol. 3, pp. 342–357, 2024

  17. [17]

    Distributed Neural Network Control with Dependability Guarantees: a Compositional Port-Hamiltonian Approach,

    L. Furieri, C. L. Galimberti, M. Zakwan, and G. Ferrari-Trecate, “Distributed Neural Network Control with Dependability Guarantees: a Compositional Port-Hamiltonian Approach,” inProceedings of The 4th Annual Learning for Dynamics and Control Conference. PMLR, May 2022, pp. 571–583

  18. [18]

    Un- constrained Learning of Networked Nonlinear Systems via Free Parametrization of Stable Interconnected Operators,

    L. Massai, D. Saccani, L. Furieri, and G. Ferrari-Trecate, “Un- constrained Learning of Networked Nonlinear Systems via Free Parametrization of Stable Interconnected Operators,” in2024 Euro- pean Control Conference (ECC), June 2024, pp. 651–656

  19. [19]

    Optimal distributed control with stability guarantees by training a network of neural closed-loop maps,

    D. Saccani, L. Massai, L. Furieri, and G. Ferrari-Trecate, “Optimal distributed control with stability guarantees by training a network of neural closed-loop maps,” July 2024, arXiv:2404.02820

  20. [20]

    Robust Classification Using Contractive Hamiltonian Neural ODEs,

    M. Zakwan, L. Xu, and G. Ferrari-Trecate, “Robust Classification Using Contractive Hamiltonian Neural ODEs,”IEEE Control Systems Letters, vol. 7, pp. 145–150, 2023

  21. [21]

    Skip Connections Eliminate Singularities

    A. E. Orhan and X. Pitkow, “Skip Connections Eliminate Singulari- ties,” Mar. 2018, arXiv:1701.09175

  22. [22]

    LMI Properties and Applications in Systems, Stability, and Control Theory,

    R. J. Caverly and J. R. Forbes, “LMI Properties and Applications in Systems, Stability, and Control Theory,” May 2024, arXiv:1903.08599

  23. [23]

    HiPPO: Re- current Memory with Optimal Polynomial Projections,

    A. Gu, T. Dao, S. Ermon, A. Rudra, and C. Re, “HiPPO: Re- current Memory with Optimal Polynomial Projections,” Oct. 2020, arXiv:2008.07669

  24. [24]

    On the Parameterization and Initialization of Diagonal State Space Models,

    A. Gu, A. Gupta, K. Goel, and C. Ré, “On the Parameterization and Initialization of Diagonal State Space Models,” Aug. 2022, arXiv:2206.11893

  25. [25]

    Three Benchmarks Addressing Open Challenges in Nonlinear System Identification*,

    M. Schoukens and J. P. Noël, “Three Benchmarks Addressing Open Challenges in Nonlinear System Identification*,”IFAC-PapersOnLine, vol. 50, no. 1, pp. 446–451, July 2017. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S2405896317300915

  26. [26]

    A⊤P A−P+C ⊤C A ⊤P B+C ⊤D B⊤P A+D ⊤C B ⊤P B+D ⊤D−γ 2I # ≺0, or, equivalently

    S. Lang,Undergraduate Algebra. Berlin, Heidelberg: Springer Berlin Heidelberg, 1990. APPENDIX A. Proof of Theorem 1 Let us start by proving thatψis a free parametrization. Notice thatθ∈R 6n2+2 and the mapψis defined and continuous onR 6n2+2 apart from those values for which H12 = √β X11X21 ⊤ + ˜C ⊤ ˜D is singular. We see that H12 is a genericn×nmatrix and...