pith. machine review for the scientific record.

arxiv: 2605.14489 · v1 · submitted 2026-05-14 · 💻 cs.LG · cs.SY · eess.SY

Recognition: no theorem link

A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 01:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY
keywords Schur decomposition · state-space neural networks · stable dynamics · system identification · weight projection · asymptotic stability · discrete-time systems · linear layers

The pith

Projecting the quasi-triangular factor from the real Schur decomposition of the state matrix onto its nearest stable peer keeps discrete-time state-space neural-network layers asymptotically stable during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a projection scheme for the state matrix in linear state-space layers inside neural networks that uses the real Schur decomposition to enforce stability. The method dynamically maps the quasi-triangular factor to the closest stable matrix, remaining compatible with backpropagation and adding only a small number of extra parameters. On synthetic linear systems the approach reaches accuracy and convergence speeds similar to existing stable identification techniques. In deeper networks with static nonlinearities applied to real-world data, the reduced parameter count improves training convergence while still meeting strict asymptotic-stability requirements.

Core claim

The central claim is that a backpropagation-compatible projection based on the real Schur decomposition of the state matrix, which maps the quasi-triangular factor onto the nearest stable matrix, guarantees asymptotic stability in discrete-time state-space neural-network architectures while adding minimal extra weights and preserving accuracy comparable to state-of-the-art methods.

What carries the argument

Real Schur decomposition of the state matrix, with dynamic projection of its quasi-triangular factor onto the nearest stable peer.
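The mechanism can be sketched in a few lines of NumPy/SciPy. This is an illustrative stand-in, not the paper's operator: uniform scaling of the quasi-triangular factor replaces the nearest-stable projection the paper derives, and `project_stable` and `margin` are hypothetical names.

```python
import numpy as np
from scipy.linalg import schur

def project_stable(A, margin=1e-3):
    """Sketch of the projection idea: factor A = Z T Z^T (real Schur form),
    then shrink the quasi-triangular factor T until its spectral radius is
    below 1. The paper instead maps T to its *nearest* stable quasi-triangular
    matrix; uniform scaling is a simpler stand-in for illustration."""
    T, Z = schur(A, output="real")                # Z orthogonal, T quasi-triangular
    rho = np.max(np.abs(np.linalg.eigvals(T)))    # spectral radius (same as A's)
    if rho >= 1.0:
        T = T * ((1.0 - margin) / rho)            # pull all eigenvalues inside the unit disk
    return Z @ T @ Z.T

A = np.array([[1.2, 0.5],
              [0.0, 0.8]])                        # unstable: eigenvalue 1.2
A_proj = project_stable(A)
assert np.max(np.abs(np.linalg.eigvals(A_proj))) < 1.0
```

Already-stable matrices pass through essentially unchanged (up to factorization round-off), which is the property that lets a projection of this kind run on every training step without perturbing stable iterates.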

If this is right

  • Accuracy and convergence rates on synthetic linear systems remain comparable to state-of-the-art stable-system identification techniques.
  • Fewer total weights allow faster convergence when the stable layers are stacked with static nonlinearities for real-world data.
  • The projection supplies a numerically robust way to identify complex dynamics while satisfying strict asymptotic-stability constraints.
  • Only a marginal increase in per-iteration computational cost is required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could support reliable long-horizon forecasting in control or time-series tasks by removing the need for post-training stability fixes.
  • Extending the same projection idea to continuous-time state matrices would widen its applicability to physical-system modeling.
  • Embedding the projection inside larger recurrent architectures might reduce dependence on auxiliary regularization for stability.

Load-bearing premise

Repeatedly projecting the state matrix during training does not distort the optimization landscape or introduce bias that stops the model from reaching accurate solutions on real data.

What would settle it

Training runs on real-world datasets in which the projected models reach measurably lower accuracy or fail to converge compared with otherwise identical non-projected models, or in which long-horizon simulations of the learned dynamics still exhibit growing trajectories.
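The long-horizon criterion in particular is cheap to operationalize. A hedged sketch follows; the rollout length, tolerance, and function name are invented for illustration, not taken from the paper:

```python
import numpy as np

def trajectory_grows(A, horizon=10_000, tol=1e6):
    """Roll out the autonomous system x_{k+1} = A x_k from a random start
    and flag unbounded growth. For a Schur-stable A every trajectory must
    decay, so a True result falsifies the claimed stability guarantee."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(A.shape[0])
    for _ in range(horizon):
        x = A @ x
        if np.linalg.norm(x) > tol:
            return True                 # growth detected: not asymptotically stable
    return False

stable = np.array([[0.9, 0.5], [0.0, 0.8]])      # eigenvalues 0.9, 0.8
unstable = np.array([[1.05, 0.0], [0.0, 0.5]])   # eigenvalue 1.05
assert not trajectory_grows(stable)
assert trajectory_grows(unstable)
```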

Figures

Figures reproduced from arXiv: 2605.14489 by Fredy Ruiz, Lasse Lensu, Sergio Vanegas.

Figure 1. Single-branch structures considered in Schoukens and Tiels.
Figure 2. Eigenvalues of the reference matrix (blue), full projection (orange), and truncated projection.
Figure 3. GPU benchmark figures. The p steps, representing the block-diagonal pattern, are stored and returned as an n-dimensional vector b ∈ {0, 1, 2}^n, where a null value indicates that the corresponding element of T's diagonal is the second diagonal element of a 2 × 2 block. The performance of the custom Schur-decomposition implementation, running both on CPU and GPU, compared to the built-in schur method as …
Figure 11. This dataset turned out to be more challenging than initially expected, as the …
Figure 4. Orthogonal benchmark figures.
Figure 5. Effective (cumulative-minimum) validation-loss history (normalized w.r.t. the initial value).
Figure 6. Best and worst cases amongst the random systems (testing-partition outputs).
Figure 7. Real-world dataset visualization.
Figure 8. Validation NMSE history for the best-performing models. The shaded lines correspond to …
Figure 9. Silverbox test-partition output: (a) y1, (b) y2.
Figure 10. CED test-partition output.
Figure 11. EMPS test-partition output.
Figure 12. Industrial robot test-partition output.
Figure 13. Fine-Steering Mirror test-partition output (first sequence).
Original abstract

Building black-box models for dynamical systems from data is a challenging problem in machine learning, especially when asymptotic stability guarantees are required. In this paper, we introduce a novel stability-ensuring and backpropagation-compatible projection scheme based on the Schur decomposition for the state matrix of linear discrete-time state-space layers, as well as an alternative pre-factorized formulation of the methodology. The proposed methods dynamically project the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer, ensuring stable dynamics with minimal overparameterization. Experiments on synthetic linear systems demonstrate that the method achieves accuracy and convergence rates comparable to those of state-of-the-art stable-system identification techniques, despite a marginal increase in computational complexity. Furthermore, the lower weight count facilitates convergence during training without sacrificing accuracy in stacked neural-network architectures with static nonlinearities targeting real-world datasets. These results suggest that the Schur-based projection provides a numerically robust framework for identifying complex dynamics on par with the State of the Art while satisfying strict asymptotic-stability requirements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a Schur-decomposition-based weight projection method for ensuring asymptotic stability in state-space neural-network architectures. The approach projects the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer during training, with an alternative pre-factorized formulation. Experiments on synthetic linear systems show comparable accuracy and convergence to state-of-the-art techniques, and the method is shown to facilitate convergence in stacked architectures with static nonlinearities on real-world datasets with minimal overparameterization.

Significance. If the results hold, this method could offer a valuable tool for learning stable dynamical models in machine learning, particularly where stability is required without excessive parameterization. The Schur-based projection appears numerically robust and backpropagation-compatible, potentially advancing the field of stable system identification and neural state-space models.

major comments (2)
  1. [Experiments] The reported experiments on synthetic linear systems demonstrate comparable accuracy but do not include error bars, detailed ablations, or a comparison to post-hoc projection after unconstrained optimization, leaving the impact of repeated projection on the optimization landscape unaddressed.
  2. [Method] The description of the projection operation lacks analysis regarding its Lipschitz continuity, sub-differentiability at the stability boundary, or effects on the loss landscape's smoothness, which are necessary to substantiate the backpropagation compatibility claim.
minor comments (2)
  1. [Abstract] The claim of 'marginal increase in computational complexity' should be supported by a brief quantitative comparison or complexity analysis.
  2. [Introduction] Ensure all acronyms and notations (e.g., for Schur factors) are defined upon first use for improved readability.
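On the second minor point, the asymptotic picture is easy to state: one real Schur decomposition per projection is O(n³) in the state dimension, the same order as the matrix products already present in a linear state-space layer, so the overhead is a constant factor per iteration. A rough, machine-dependent illustration (absolute numbers are not meaningful, only the ratio; the sizes and repetition counts are arbitrary choices, not the paper's benchmark setup):

```python
import numpy as np
from scipy.linalg import schur
from timeit import timeit

# Both operations are O(n^3); the decomposition's larger constant is what
# "marginal increase in computational complexity" amounts to per iteration.
rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128))
t_schur = timeit(lambda: schur(A, output="real"), number=20)   # projection's dominant cost
t_matmul = timeit(lambda: A @ A, number=20)                    # cost already paid by the layer
assert t_schur > t_matmul   # the decomposition dominates a single GEMM of the same size
```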

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and positive evaluation of the significance of our work. We address each of the major comments below, proposing revisions to strengthen the manuscript where appropriate.

Point-by-point responses
  1. Referee: [Experiments] The reported experiments on synthetic linear systems demonstrate comparable accuracy but do not include error bars, detailed ablations, or a comparison to post-hoc projection after unconstrained optimization, leaving the impact of repeated projection on the optimization landscape unaddressed.

    Authors: We agree that error bars and additional ablations would enhance the robustness of our experimental results. In the revised manuscript, we will include error bars computed over multiple independent runs with different random seeds and provide ablations on key hyperparameters such as the projection frequency. Regarding the comparison to post-hoc projection, we note that our method is designed to enforce stability during the entire training process, which prevents the optimizer from exploring unstable regions that could lead to poor local minima. A post-hoc approach might achieve similar final accuracy but could suffer from instability during training, as highlighted by the convergence issues in unconstrained baselines. We will add a discussion clarifying that the repeated projections smooth the optimization by keeping the state matrix within the stable set, supported by the observed comparable convergence rates. revision: partial

  2. Referee: [Method] The description of the projection operation lacks analysis regarding its Lipschitz continuity, sub-differentiability at the stability boundary, or effects on the loss landscape's smoothness, which are necessary to substantiate the backpropagation compatibility claim.

    Authors: We appreciate this suggestion for a more formal analysis. The projection operation involves modifying the eigenvalues (via the diagonal of the Schur factor) to lie within the unit disk, which is a Lipschitz continuous map with constant 1 in the appropriate matrix norm, as it is a nearest-point projection onto a convex set. At the stability boundary, the operation is sub-differentiable, with the subgradient corresponding to the projection onto the tangent cone. We will add a new subsection in the method section providing this analysis, including a proof sketch of Lipschitz continuity and a note on how the chain rule applies through the projection for backpropagation. This will also address the smoothness of the effective loss landscape, which remains differentiable almost everywhere. revision: yes
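The nonexpansiveness argument can be checked numerically in the scalar analogue: projecting a complex number onto the closed unit disk, the eigenvalue-wise picture behind the rebuttal's Lipschitz claim, is 1-Lipschitz. The operator below is an illustrative analogue, not the paper's matrix-valued projection:

```python
import numpy as np

def clip_to_disk(z, r=1.0):
    """Nearest point in the closed disk of radius r: the scalar analogue of
    clipping an eigenvalue to the unit disk. Projection onto a convex set
    is nonexpansive, i.e. 1-Lipschitz."""
    m = abs(z)
    return z if m <= r else z * (r / m)

rng = np.random.default_rng(1)
for _ in range(1000):
    a, b = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    # nonexpansive: |P(a) - P(b)| <= |a - b|
    assert abs(clip_to_disk(a) - clip_to_disk(b)) <= abs(a - b) + 1e-12
```

The matrix case is subtler (the set of Schur-stable matrices is not convex), which is exactly why the promised formal subsection matters.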

Circularity Check

0 steps flagged

Schur-based projection derives from standard linear algebra; no circular reduction to inputs

full rationale

The paper's core method projects the quasi-triangular factor of the real Schur form onto its nearest stable matrix using established properties of Schur decomposition and spectral-radius stability. This construction is independent of any fitted parameters, self-referential definitions, or load-bearing self-citations; it relies on external linear-algebra facts rather than re-deriving or renaming prior results from the same authors. Experiments compare final accuracy to existing techniques without presenting any projection step as a 'prediction' forced by data fitting. The derivation chain remains self-contained against standard matrix theory benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on the existence and computability of the real Schur decomposition together with the notion of a nearest stable matrix; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • standard math: every real matrix possesses a real Schur decomposition that can be computed numerically.
    Invoked to obtain the quasi-triangular factor that is then projected.
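The axiom is directly checkable with a standard library routine. The sketch below verifies, for a random real matrix, the three defining properties of the real Schur form (a hedged illustration using SciPy's schur, not the paper's custom implementation):

```python
import numpy as np
from scipy.linalg import schur

# A = Z T Z^T with Z orthogonal and T quasi-upper-triangular
# (at most one nonzero subdiagonal, from 2x2 blocks for complex pairs).
rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6))
T, Z = schur(A, output="real")

assert np.allclose(Z @ T @ Z.T, A)        # valid factorization
assert np.allclose(Z.T @ Z, np.eye(6))    # Z is orthogonal
assert np.allclose(np.tril(T, -2), 0.0)   # T is quasi-upper-triangular
```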

pith-pipeline@v0.9.0 · 5491 in / 1174 out tokens · 36761 ms · 2026-05-15T01:33:53.095596+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · 3 internal anchors

  1. [1] Ubaid M. Al-Saggaf and Gene F. Franklin. Model reduction via balanced realizations: an extension and frequency weighting techniques. IEEE Transactions on Automatic Control, 33(7):687–692, 2002.
  2. [2] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021.
  3. [3] Jimmy T. H. Smith, Andrew Warrington, and Scott W. Linderman. Simplified state space layers for sequence modeling. arXiv preprint arXiv:2208.04933, 2022.
  4. [4] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
  5. [5] Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. HiPPO: Recurrent memory with optimal polynomial projections. Advances in Neural Information Processing Systems, 33:1474–1487, 2020.
  6. [6] Loris Di Natale, Muhammad Zakwan, Philipp Heer, Giancarlo Ferrari-Trecate, and Colin Neil Jones. SIMBa: System identification methods leveraging backpropagation. IEEE Transactions on Control Systems Technology, 2024.
  7. [7] Alberto Bemporad. An L-BFGS-B approach for linear and nonlinear system identification under ℓ1 and group-Lasso regularization. IEEE Transactions on Automatic Control, 70(7):4857–4864, 2025. doi: 10.1109/TAC.2025.3541018.
  8. [8] Vanni Noferini and Federico Poloni. Nearest Ω-stable matrix via Riemannian optimization. Numerische Mathematik, 148(4):817–851, 2021.
  9. [9] Karl Johan Åström and Peter Eykhoff. System identification—a survey. Automatica, 7(2):123–162, 1971.
  10. [10] Peter Van Overschee and Bart De Moor. A unifying theorem for three subspace system identification algorithms. Automatica, 31(12):1853–1864, 1995.
  11. [11] Bi-Qiang Mu and Han-Fu Chen. Hankel matrices for system identification. Journal of Mathematical Analysis and Applications, 409(1):494–508, 2014.
  12. [12] S. Joe Qin. An overview of subspace identification. Computers & Chemical Engineering, 30(10–12):1502–1513, 2006.
  13. [13] Maarten Schoukens and Koen Tiels. Identification of block-oriented nonlinear systems starting from linear approximations: A survey. Automatica, 85:272–292, 2017.
  14. [14] Steven L. Brunton, Marko Budišić, Eurika Kaiser, and J. Nathan Kutz. Modern Koopman theory for dynamical systems. arXiv preprint arXiv:2102.12086, 2021.
  15. [15] Adrian Wills, Thomas B. Schön, Lennart Ljung, and Brett Ninness. Identification of Hammerstein–Wiener models. Automatica, 49(1):70–81, 2013.
  16. [16] Jan M. Maciejowski. Guaranteed stability with subspace methods. Systems & Control Letters, 26(2):153–156, 1995.
  17. [17] Gianluigi Pillonetto, Tianshi Chen, Alessandro Chiuso, Giuseppe De Nicolao, and Lennart Ljung. Regularized System Identification: Learning Dynamic Models from Data. Springer Nature, 2022.
  18. [18] J. Sjöberg, T. McKelvey, and L. Ljung. On the use of regularization in system identification. IFAC Proceedings Volumes, 26(2):75–80, 1993.
  19. [19] Tony Van Gestel, Johan A. K. Suykens, Paul Van Dooren, and Bart De Moor. Identification of stable models in subspace identification by using regularization. IEEE Transactions on Automatic Control, 46(9):1416–1420, 2002.
  20. [20] Abdus Salam, Rasel Mahmud, Tohedul Islam, Saddam Mukta, and Swakkhar Shatabda. A comprehensive survey on Mamba: Architectures, challenges, and opportunities. Computer, 58(8):64–76, 2025.
  21. [21] Badri N. Patro and Vijay S. Agneeswaran. SiMBA: Simplified Mamba-based architecture for vision and multivariate time series, 2024. URL http://arxiv.org/abs/2403.15360.
  22. [22] Biswa Datta. Numerical Methods for Linear Control Systems. Elsevier, 2004.
  23. [23] Roger A. Horn and Charles R. Johnson. Matrix Analysis, chapter 2, page 82. Cambridge University Press, 2012.
  24. [24] Gene H. Golub and Charles F. Van Loan. Computing Subspaces with the SVD, pages 601–606. JHU Press, 3rd edition, 1996. ISBN 978-0-8018-5414-9.
  25. [25] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  26. [26] Maarten Schoukens. Nonlinear benchmarks. https://github.com/MaartenSchoukens/nonlinear_benchmarks, 2026.
  27. [27] Federico Najson. The Kalman-Yakubovich-Popov lemma for discrete-time positive linear systems. In 2012 American Control Conference (ACC), pages 5188–5193. IEEE, 2012.
  28. [28] Leslie Hogben, editor. Handbook of Linear Algebra. Discrete Mathematics and Its Applications. Chapman & Hall/CRC, Philadelphia, PA, 2nd edition, November 2013.
  29. [29] Diederich Hinrichsen and Anthony J. Pritchard. Mathematical Systems Theory I. Springer Berlin Heidelberg, 2005. ISBN 9783540264101. doi: 10.1007/b137541. URL http://dx.doi.org/10.1007/b137541.
  30. [30] Nicolas Gillis and Punit Sharma. On computing the distance to stability for matrices using linear dissipative Hamiltonian systems. Automatica, 85:113–121, 2017.
  31. [31] Nicolas Gillis, Michael Karow, and Punit Sharma. Approximating the nearest stable discrete-time system. Linear Algebra and its Applications, 573:37–53, 2019.
  32. [32] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15(42):1455–1459, 2014. URL https://www.manopt.org.
  33. [33] James Townsend, Niklas Koep, and Sebastian Weichwald. Pymanopt: A Python toolbox for optimization on manifolds using automatic differentiation. Journal of Machine Learning Research, 17(137):1–5, 2016. URL http://jmlr.org/papers/v17/16-177.html.
  34. [34] Peter Arbenz. Lecture notes in numerical methods for solving large scale eigenvalue problems, 2016.
  35. [35] Mario Ahues and Françoise Tisseur. A new deflation criterion for the QR algorithm. LAPACK Working Note, 122, 1997.
  36. [36] John C. Gower and Garmt B. Dijksterhuis. Procrustes Problems, volume 30. Oxford University Press, 2004.
  37. [37] Berthold K. P. Horn, Hugh M. Hilden, and Shahriar Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. Journal of the Optical Society of America A, 5(7):1127–1135, 1988.
  38. [38] Nagwa Sherif. On the computation of a matrix inverse square root. Computing, 46(4):295–305, December. ISSN 1436-5057. doi: 10.1007/bf02257775. URL http://dx.doi.org/10.1007/BF02257775.
  40. [40] Torbjörn Wigren and Johan Schoukens. Three free data sets for development and benchmarking in nonlinear system identification. In 2013 European Control Conference (ECC), pages 2933–2938. IEEE, July 2013. doi: 10.23919/ecc.2013.6669201. URL http://dx.doi.org/10.23919/ECC.2013.6669201.
  41. [41] Torbjörn Wigren and Maarten Schoukens. Coupled electric drives data set and reference models. Technical report, Uppsala University, Sweden, 2017.
  42. [42] Alexandre Janot, Maxime Gautier, and Mathieu Brunot. Data set and reference models of EMPS. In Nonlinear System Identification Benchmarks, 2019.
  43. [43] Jonas Weigand, Julian Götz, Jonas Ulmen, and Martin Ruskowski. Dataset and baseline for an industrial robot identification benchmark, 2022. URL https://fdm-fallback.uni-kl.de/TUK/FB/MV/WSKL/0001/.
  44. [44] Merijn Floren, Leonardo Peri, Jeroen De Maeyer, Wim De Munter, Dirk Vandepitte, and Jean-Philippe Noël. Data-driven state-space identification and nonlinearity assessment of the CubeSpec Fine Steering Mirror. ISMA-USD 2024, pages 2042–2052, 2024.
  45. [45] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
  46. [46] Max Champneys, Gerben I. Beintema, Roland Tóth, Maarten Schoukens, and Timothy J. Rogers. Baseline results for selected nonlinear system identification benchmarks. IFAC-PapersOnLine, 58(15):474–479, 2024.
  47. [47] Merijn Floren. Baseline models for the Fine Steering Mirror benchmark dataset. https://doi.org/10.5281/zenodo.19591136, 2026. Version v2026.04.15.

  48. [48]

    IfA 2,1 ̸=−A 1,2, then ˆA1,1 = ˆA2,2

  49. [49]

    A non-empty closed set S ⊆R 2×2 is doubly rotation-invariant if X∈ S=⇒U(α)XU(β) ⊺ ∈ S ∀{α, β} ∈R 2

    If A2,1 =−A 1,2, then ∃α∈[0, π/2) : ˆA=U(α) ˇAU(α) ⊺, where ˇA is another local minimizer with the same objective value and ˇA1,1 = ˇA2,2. A non-empty closed set S ⊆R 2×2 is doubly rotation-invariant if X∈ S=⇒U(α)XU(β) ⊺ ∈ S ∀{α, β} ∈R 2. Lemma A.4.LetS ∈R 2×2 be doubly rotation-invariant, and let ˆAbe a local minimizer of min X∈S ∥A−X∥ F for some diagona...

  50. [50]

    Ifσ 1 ̸=σ 2, then ˆAis diagonal with ˆA1,1 − ˆA2,2 ≥0and ˆA1,1 + ˆA2,2 ≥0

  51. [51]

    The result in Lemma A.5 is a variant of the Routh-Hurwitz stability criterion (see Hogben [28, Sec

    If σ1 =σ 2, then ∃α∈[0, π/2) : ˆA=U(α) ˇAU(α) ⊺, where ˇA is another local minimizer with the same objective value and ˇA1,1 = ˇA2,2 = 0. The result in Lemma A.5 is a variant of the Routh-Hurwitz stability criterion (see Hogben [28, Sec. 26.2]). Lemma A.5.Both roots of the polynomial p(λ) =aλ 2 +bλ+c , with a̸= 0 , lie in the closed left half-plane{λ∈C:Re...

  52. [52]

    ˆA∈∂S 0, ˆA/∈∂S + ∪∂S −

  53. [53]

    ˆA∈∂S +, ˆA/∈∂S 0 ∪∂S −

  54. [54]

    ˆA∈∂S −, ˆA/∈∂S 0 ∪∂S +

  55. [55]

    ˆA∈∂S 0 ∩∂S +, ˆA/∈∂S −

  56. [56]

    ˆA∈∂S 0 ∩∂S −, ˆA/∈∂S +

  57. [57]

    The six cases are treated separately:

    ˆA∈∂S + ∩∂S −, ˆA/∈∂S 0. The six cases are treated separately:

  58. [58]

    Then, ˆA is a local minimizer of ∥A−X∥ F in the doubly rotation-invariant set ∂S0, and Ξ=U ⊺ 0 ˆAV0 is a local minimizer of σ0,1 σ0,2 −X F in∂S 0

    ˆA∈∂S 0, ˆA/∈∂S + ∪∂S −. Then, ˆA is a local minimizer of ∥A−X∥ F in the doubly rotation-invariant set ∂S0, and Ξ=U ⊺ 0 ˆAV0 is a local minimizer of σ0,1 σ0,2 −X F in∂S 0. If σ0,1 ̸=σ 0,2, then Ξ is diagonal by Lemma A.4; and for Ξ= [ τ1 τ2 ] (with τ1τ2 = det(Ξ) = 1), it is necessary to have(τ 1, τ2)∈ M(σ 0,1, σ0,2). Hence, ˆA∈ ˆA0. Otherwise, ifσ 0,1 =σ ...

  59. [59]

    Then, min X∈∂S + ∥A−X∥ F = min X∈∂S + ∥U ⊺ +(A−I)V + −U ⊺ +(X−I)V +∥F

    ˆA∈∂S +, ˆA/∈∂S 0 ∪∂S −. Then, min X∈∂S + ∥A−X∥ F = min X∈∂S + ∥U ⊺ +(A−I)V + −U ⊺ +(X−I)V +∥F . Hence, ˆA is a local minimizer of ∥A−X∥ F if and only if Ξ=U ⊺ +(X−I)V + in ∂Sd ≜ X∈R 2×2 : det(X) = 0 : indeed, X has an eigenvalue 1 if and only if Ξ has determinant 0. Thus, either Ξ= [ σ+,1 0 ] and hence ˆA= ˆA+, or a minimizer with the same objective valu...

  60. [60]

    Proof as in case 2, swapping all plus and minus signs

    ˆA∈∂S −, ˆA/∈∂S 0 ∪∂S +. Proof as in case 2, swapping all plus and minus signs

  61. [61]

    Note that ∂S0 ∩∂S + is the rotation-invariant set of matrices with double eigenvalue +1 (since they must have eigenvalue +1 and determinant 1)

    ˆA∈∂S 0 ∩∂S +, ˆA/∈∂S −. Note that ∂S0 ∩∂S + is the rotation-invariant set of matrices with double eigenvalue +1 (since they must have eigenvalue +1 and determinant 1). ˆA is a minimizer if and only if Ξ=G ⊺ ˆAG is a minimizer of ∥ ˇA−X∥ in ∂S0 ∩∂S +; so, by Lemma A.3, 1 2 tr(Ξ) = Ξ1,1 = Ξ2,2 = 1 2 tr( ˆA) = +1. Since det(Ξ) = 1, at least one among Ξ1,2 a...

  62. [62]

    Proof as in case 4, swapping all plus and minus signs

    ˆA∈∂S 0 ∩∂S −, ˆA/∈∂S +. Proof as in case 4, swapping all plus and minus signs

  63. [63]

    Note that∂S + ∩∂S − is the rotation-invariant set of matrices with eigenvalues1and−1

    ˆA∈∂S + ∩∂S −, ˆA/∈∂S 0. Note that∂S + ∩∂S − is the rotation-invariant set of matrices with eigenvalues1and−1. Arguing as in case 4, G⊺ ˆAG=Ξ= 0τ 1 τ2 0 , τ 1τ2 = 1, as Ξ must have Ξ1,1 = Ξ 2,2 = 1 2 tr(Ξ) = 0 and det(Ξ) = 1 . Moreover, to minimize ∥ ˇA−Ξ∥ F ,(τ 1, τ2)must belong toM( ˇA1,2, ˇA2,1). Thus, ˆA∈ A ∗. B Projection Scheme - Full Benchmark The ...

  64. [64]

    Random with independent entries inN(0,1)

  65. [65]

    The resulting eigenvalues are shown in the complex plane in Figure 2

    Random with independent entries inU([0,1]). The resulting eigenvalues are shown in the complex plane in Figure 2. Following the setup from Noferini and Poloni [8], the maximum number of iterations and execution time were set to1000and1000seconds, respectively. However, the iteration limit only controls the maximum number of outer iterations, meaning that ...

  66. [66]

    The execution time of each method (Figure 4a)

  67. [67]

    This metric indicates whether or not each implementation yields a valid factorization, regardless of the factors’ orthogonality and sparsity pattern

    The reconstruction error (Figure 4b), defined as the NSFE between the original matrix A and the product of its Schur factors ZT Z ⊺. This metric indicates whether or not each implementation yields a valid factorization, regardless of the factors’ orthogonality and sparsity pattern

  68. [68]

    This metric indicates whether Z fulfils the orthogonality requirement

    The orthogonality error (Figure 4c), defined as the NSFE between the identity matrix and the symmetric matrix ZZ ⊺. This metric indicates whether Z fulfils the orthogonality requirement

  69. [69]

    By Theorem 3.2, under a correct factorization, this metric should only be bound by numerical error

    The spectral error (Figure 4c), defined as the NSSR between the original matrix A and the quasi-triangular factor T . By Theorem 3.2, under a correct factorization, this metric should only be bound by numerical error. This experiment is executed on an isolated HPC instance with 4 Xeon Gold 6230 logical cores, 32 GiB of RAM, and a single Nvidia V olta V100...

- The method from Golub and Van Loan [24], which finds the nearest orthogonal matrix by multiplying together the unitary factors from the singular-value decomposition $Z=U\Sigma V^{\top}$; i.e.,
$$\hat{Z}=UV^{\top}.\tag{18}$$
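A minimal NumPy sketch of this projection (variable names are illustrative):

```python
import numpy as np

def nearest_orthogonal_svd(Z):
    # Z = U diag(s) V^T; discarding the singular values and keeping
    # only the unitary factors gives Eq. (18): Z_hat = U V^T.
    U, _, Vt = np.linalg.svd(Z)
    return U @ Vt

rng = np.random.default_rng(2)
Z = rng.standard_normal((5, 5))
Z_hat = nearest_orthogonal_svd(Z)
```

In the Frobenius norm, no orthogonal matrix lies closer to $Z$ than this $\hat{Z}$.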

- The method from Horn et al. [37], which defines the closed form $\hat{Z}=Z(Z^{\top}Z)^{-1/2}$, calculating the inverse square root of the product in parentheses through its eigenvalue decomposition $Z^{\top}Z=Q\Lambda Q^{-1}$ as
$$(Z^{\top}Z)^{-1/2}=Q\Lambda^{-1/2}Q^{-1}=Q\,\operatorname{diag}\!\left(\frac{1}{\sqrt{\lambda_{1}}},\dots,\frac{1}{\sqrt{\lambda_{n_{x}}}}\right)Q^{-1}.\tag{19}$$
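A sketch of this closed form with NumPy's symmetric eigensolver; since $Z^{\top}Z$ is symmetric positive definite, $Q^{-1}=Q^{\top}$ here:

```python
import numpy as np

def nearest_orthogonal_eig(Z):
    # Gram matrix Z^T Z = Q diag(lam) Q^T (symmetric, so Q^{-1} = Q^T).
    lam, Q = np.linalg.eigh(Z.T @ Z)
    inv_sqrt = Q @ np.diag(1.0 / np.sqrt(lam)) @ Q.T  # Eq. (19)
    return Z @ inv_sqrt                               # Z_hat = Z (Z^T Z)^{-1/2}

rng = np.random.default_rng(3)
Z = rng.standard_normal((5, 5))
Z_hat = nearest_orthogonal_eig(Z)
```

For square invertible $Z$ this agrees with the SVD-based projection $\hat{Z}=UV^{\top}$, since $Z(Z^{\top}Z)^{-1/2}=U\Sigma V^{\top}V\Sigma^{-1}V^{\top}=UV^{\top}$.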

- The same method from Horn et al. [37], but instead approximating the inverse square root using the stable iterative scheme from Sherif [38, Section 3]; i.e., via the recursion $X^{-1/2}=\lim_{r\to\infty}\Xi_{r}$ s.t.
$$\Xi_{r+1}=\Xi_{r}(I+E_{r}),\quad \Xi_{0}=I,\qquad E_{r+1}=E_{r}^{2}\left(2I-E_{r}^{2}\right)^{-1},\quad E_{0}=(I-X)(I+X)^{-1}.\tag{20}$$
Since the methods are jit-compiled during training, the stop condition for Equat...
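Reading the middle update as $E_{r+1}=E_{r}^{2}(2I-E_{r}^{2})^{-1}$, as reconstructed above, the coupled recursion can be sketched as follows; the fixed iteration count stands in for the jit-compatible stop condition mentioned in the text:

```python
import numpy as np

def inv_sqrt_iterative(X, iters=30):
    # Coupled recursion of Eq. (20); converges for symmetric
    # positive-definite X (all iterates are rational functions of X
    # and therefore commute with it).
    I = np.eye(X.shape[0])
    Xi = I.copy()                           # Xi_0 = I
    E = (I - X) @ np.linalg.inv(I + X)      # E_0 = (I - X)(I + X)^{-1}
    for _ in range(iters):
        Xi = Xi @ (I + E)                   # Xi_{r+1} = Xi_r (I + E_r)
        E2 = E @ E
        E = E2 @ np.linalg.inv(2 * I - E2)  # E_{r+1} = E_r^2 (2I - E_r^2)^{-1}
    return Xi

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)  # well-conditioned SPD test matrix
Xi = inv_sqrt_iterative(X)   # Xi @ Xi approximates X^{-1}
```

Each step only needs matrix products and one inverse, and the error contracts quadratically, so a modest iteration cap suffices in practice.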

- The execution time of each method (Figure 4a). Execution times for the Schur-stable projection, the Schur decomposition, and the spectral norm are also included for scale and relative-cost analysis.

- The projection distance (Figure 4b), defined as the NSFE between the original matrix $Z$ and its projection $\hat{Z}$. This metric should only be evaluated relative to other methods, as a null value should only be achievable if $Z$ is already orthogonal.

- The orthogonality numerical error (Figure 4c), defined as the NSFE between the identity matrix and $\hat{Z}^{\top}\hat{Z}$. Unlike the projection distance, if $\hat{Z}\in O(n_{x})$, then this metric should be approximately 0 (bounded by numerical error).

This experiment is executed on an interactive HPC instance with 4 Xeon Gold 6230 logical cores and 8 GiB of RAM. Each metric is sa...
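Under the same assumed NSFE reading as before (squared Frobenius error normalized by the reference norm), the two metrics can be sketched as:

```python
import numpy as np

def nsfe(reference, candidate):
    # Assumed NSFE: squared Frobenius error over the reference's squared norm.
    return (np.linalg.norm(reference - candidate, 'fro') ** 2
            / np.linalg.norm(reference, 'fro') ** 2)

rng = np.random.default_rng(5)
Z = rng.standard_normal((6, 6))
U, _, Vt = np.linalg.svd(Z)
Z_hat = U @ Vt  # SVD projection of Eq. (18), used here as the candidate

projection_distance = nsfe(Z, Z_hat)        # meaningful only relative to other methods
orthogonality_num_error = nsfe(np.eye(6), Z_hat.T @ Z_hat)  # ~0 when Z_hat is orthogonal
```

The split between the two mirrors the discussion above: the first measures how far the projection moved, the second whether it actually landed on $O(n_x)$.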

- Silverbox [39], an electronic implementation of the Duffing oscillator. It is built as a 2nd-order linear time-invariant system with a 3rd-degree polynomial static nonlinearity around it in feedback.

- CED [40], or Coupled Electric Drives, consists of two electric motors that drive a pulley using a flexible belt. The pulley is held by a spring, resulting in a lightly damped dynamic mode. The electric drives can be individually controlled, allowing the tension and the speed of the belt to be simultaneously controlled. The drive control is symmetric aroun...

- EMPS [41], or Electro-Mechanical Positioning System, is a standard configuration of a drive system for the prismatic joint of robots or machine tools. The main nonlinearity is caused by friction effects present in the setup. Due to the presence of a pure integrator in the system, the measurements are taken in a closed-loop setting. The ...

- Industrial Robot [42], an identification benchmark dataset for a full robot movement with a KUKA KR300 R2500 ultra SE industrial robot. It is a robot with a nominal payload capacity of 300 kg, a weight of 1120 kg, and a reach of 2500 mm, exhibiting 12 states accounting for position and velocity for each of the 6 joints. The robot encounters backlash in al...

- Fine-Steering Mirror [43], a multi-input multi-output high-precision control platform used in a small satellite. Voltages applied to three piezo-actuators serve as inputs, while the mirror displacements measured at three non-collocated reference points serve as outputs. The excitation signals are orthogonal random-phase multisines spanning a wide frequen...