pith. machine review for the scientific record.

arxiv: 2605.14489 · v1 · submitted 2026-05-14 · 💻 cs.LG · cs.SY · eess.SY

Recognition: no theorem link

A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 01:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY
keywords Schur decomposition · state-space neural networks · stable dynamics · system identification · weight projection · asymptotic stability · discrete-time systems · linear layers

The pith

Projecting the quasi-triangular factor from the real Schur decomposition of the state matrix onto its nearest stable peer keeps discrete-time state-space neural-network layers asymptotically stable during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a projection scheme for the state matrix in linear state-space layers inside neural networks that uses the real Schur decomposition to enforce stability. The method dynamically maps the quasi-triangular factor to the closest stable matrix, remaining compatible with backpropagation and adding only a small number of extra parameters. On synthetic linear systems the approach reaches accuracy and convergence speeds similar to existing stable identification techniques. In deeper networks with static nonlinearities applied to real-world data, the reduced parameter count improves training convergence while still meeting strict asymptotic-stability requirements.

Core claim

The central claim is that a backpropagation-compatible projection based on the real Schur decomposition of the state matrix, which maps the quasi-triangular factor onto the nearest stable matrix, guarantees asymptotic stability in discrete-time state-space neural-network architectures while adding minimal extra weights and preserving accuracy comparable to state-of-the-art methods.

What carries the argument

Real Schur decomposition of the state matrix, with dynamic projection of its quasi-triangular factor onto the nearest stable peer.
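The mechanism can be sketched in a few lines of NumPy/SciPy. This is an illustrative stand-in, not the paper's operator: uniform scaling of the quasi-triangular factor replaces the nearest-stable projection the paper derives, and `project_stable` and `margin` are hypothetical names.

```python
import numpy as np
from scipy.linalg import schur

def project_stable(A, margin=1e-3):
    """Sketch of the projection idea: factor A = Z T Z^T (real Schur form),
    then shrink the quasi-triangular factor T until its spectral radius is
    below 1. The paper instead maps T to its *nearest* stable quasi-triangular
    matrix; uniform scaling is a simpler stand-in for illustration."""
    T, Z = schur(A, output="real")                # Z orthogonal, T quasi-triangular
    rho = np.max(np.abs(np.linalg.eigvals(T)))    # spectral radius (same as A's)
    if rho >= 1.0:
        T = T * ((1.0 - margin) / rho)            # pull all eigenvalues inside the unit disk
    return Z @ T @ Z.T

A = np.array([[1.2, 0.5],
              [0.0, 0.8]])                        # unstable: eigenvalue 1.2
A_proj = project_stable(A)
assert np.max(np.abs(np.linalg.eigvals(A_proj))) < 1.0
```

Already-stable matrices pass through essentially unchanged (up to factorization round-off), which is the property that lets a projection of this kind run on every training step without perturbing stable iterates.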

If this is right

  • Accuracy and convergence rates on synthetic linear systems remain comparable to state-of-the-art stable-system identification techniques.
  • Fewer total weights allow faster convergence when the stable layers are stacked with static nonlinearities for real-world data.
  • The projection supplies a numerically robust way to identify complex dynamics while satisfying strict asymptotic-stability constraints.
  • Only a marginal increase in per-iteration computational cost is required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could support reliable long-horizon forecasting in control or time-series tasks by removing the need for post-training stability fixes.
  • Extending the same projection idea to continuous-time state matrices would widen its applicability to physical-system modeling.
  • Embedding the projection inside larger recurrent architectures might reduce dependence on auxiliary regularization for stability.

Load-bearing premise

Repeatedly projecting the state matrix during training does not distort the optimization landscape or introduce bias that stops the model from reaching accurate solutions on real data.

What would settle it

Training runs on real-world datasets in which the projected models reach measurably lower accuracy or fail to converge compared with otherwise identical non-projected models, or in which long-horizon simulations of the learned dynamics still exhibit growing trajectories.
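The long-horizon criterion in particular is cheap to operationalize. A hedged sketch follows; the rollout length, tolerance, and function name are invented for illustration, not taken from the paper:

```python
import numpy as np

def trajectory_grows(A, horizon=10_000, tol=1e6):
    """Roll out the autonomous system x_{k+1} = A x_k from a random start
    and flag unbounded growth. For a Schur-stable A every trajectory must
    decay, so a True result falsifies the claimed stability guarantee."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(A.shape[0])
    for _ in range(horizon):
        x = A @ x
        if np.linalg.norm(x) > tol:
            return True                 # growth detected: not asymptotically stable
    return False

stable = np.array([[0.9, 0.5], [0.0, 0.8]])      # eigenvalues 0.9, 0.8
unstable = np.array([[1.05, 0.0], [0.0, 0.5]])   # eigenvalue 1.05
assert not trajectory_grows(stable)
assert trajectory_grows(unstable)
```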

Figures

Figures reproduced from arXiv: 2605.14489 by Fredy Ruiz, Lasse Lensu, Sergio Vanegas.

Figure 1. Single-branch structures considered in Schoukens and Tiels.
Figure 2. Eigenvalues of the reference matrix (blue), full projection (orange), and truncated projection.
Figure 3. GPU benchmark figures. The p steps, representing the block-diagonal pattern, are stored and returned as an n-dimensional vector b ∈ {0, 1, 2}^n, where a null value indicates that the corresponding element of T's diagonal is the second diagonal element of a 2 × 2 block. The performance of the custom Schur-decomposition implementation, running both on CPU and GPU, compared to the built-in schur method as …
Figure 11. This dataset turned out to be more challenging than initially expected, as the …
Figure 4. Orthogonal benchmark figures.
Figure 5. Effective (cumulative-minimum) validation-loss history (normalized w.r.t. the initial value).
Figure 6. Best and worst cases amongst the random systems (testing-partition outputs).
Figure 7. Real-world dataset visualization.
Figure 8. Validation NMSE history for the best-performing models. The shaded lines correspond to …
Figure 9. Silverbox test-partition output: (a) y1, (b) y2.
Figure 10. CED test-partition output.
Figure 11. EMPS test-partition output.
Figure 12. Industrial robot test-partition output.
Figure 13. Fine-Steering Mirror test-partition output (first sequence).
Original abstract

Building black-box models for dynamical systems from data is a challenging problem in machine learning, especially when asymptotic stability guarantees are required. In this paper, we introduce a novel stability-ensuring and backpropagation-compatible projection scheme based on the Schur decomposition for the state matrix of linear discrete-time state-space layers, as well as an alternative pre-factorized formulation of the methodology. The proposed methods dynamically project the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer, ensuring stable dynamics with minimal overparameterization. Experiments on synthetic linear systems demonstrate that the method achieves accuracy and convergence rates comparable to those of state-of-the-art stable-system identification techniques, despite a marginal increase in computational complexity. Furthermore, the lower weight count facilitates convergence during training without sacrificing accuracy in stacked neural-network architectures with static nonlinearities targeting real-world datasets. These results suggest that the Schur-based projection provides a numerically robust framework for identifying complex dynamics on par with the State of the Art while satisfying strict asymptotic-stability requirements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a Schur-decomposition-based weight projection method for ensuring asymptotic stability in state-space neural-network architectures. The approach projects the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer during training, with an alternative pre-factorized formulation. Experiments on synthetic linear systems show comparable accuracy and convergence to state-of-the-art techniques, and the method is shown to facilitate convergence in stacked architectures with static nonlinearities on real-world datasets with minimal overparameterization.

Significance. If the results hold, this method could offer a valuable tool for learning stable dynamical models in machine learning, particularly where stability is required without excessive parameterization. The Schur-based projection appears numerically robust and backpropagation-compatible, potentially advancing the field of stable system identification and neural state-space models.

major comments (2)
  1. [Experiments] The reported experiments on synthetic linear systems demonstrate comparable accuracy but do not include error bars, detailed ablations, or a comparison to post-hoc projection after unconstrained optimization, leaving the impact of repeated projection on the optimization landscape unaddressed.
  2. [Method] The description of the projection operation lacks analysis regarding its Lipschitz continuity, sub-differentiability at the stability boundary, or effects on the loss landscape's smoothness, which are necessary to substantiate the backpropagation compatibility claim.
minor comments (2)
  1. [Abstract] The claim of 'marginal increase in computational complexity' should be supported by a brief quantitative comparison or complexity analysis.
  2. [Introduction] Ensure all acronyms and notations (e.g., for Schur factors) are defined upon first use for improved readability.
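On the second minor point, the asymptotic picture is easy to state: one real Schur decomposition per projection is O(n³) in the state dimension, the same order as the matrix products already present in a linear state-space layer, so the overhead is a constant factor per iteration. A rough, machine-dependent illustration (absolute numbers are not meaningful, only the ratio; the sizes and repetition counts are arbitrary choices, not the paper's benchmark setup):

```python
import numpy as np
from scipy.linalg import schur
from timeit import timeit

# Both operations are O(n^3); the decomposition's larger constant is what
# "marginal increase in computational complexity" amounts to per iteration.
rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128))
t_schur = timeit(lambda: schur(A, output="real"), number=20)   # projection's dominant cost
t_matmul = timeit(lambda: A @ A, number=20)                    # cost already paid by the layer
assert t_schur > t_matmul   # the decomposition dominates a single GEMM of the same size
```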

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and positive evaluation of the significance of our work. We address each of the major comments below, proposing revisions to strengthen the manuscript where appropriate.

Point-by-point responses
  1. Referee: [Experiments] The reported experiments on synthetic linear systems demonstrate comparable accuracy but do not include error bars, detailed ablations, or a comparison to post-hoc projection after unconstrained optimization, leaving the impact of repeated projection on the optimization landscape unaddressed.

    Authors: We agree that error bars and additional ablations would enhance the robustness of our experimental results. In the revised manuscript, we will include error bars computed over multiple independent runs with different random seeds and provide ablations on key hyperparameters such as the projection frequency. Regarding the comparison to post-hoc projection, we note that our method is designed to enforce stability during the entire training process, which prevents the optimizer from exploring unstable regions that could lead to poor local minima. A post-hoc approach might achieve similar final accuracy but could suffer from instability during training, as highlighted by the convergence issues in unconstrained baselines. We will add a discussion clarifying that the repeated projections smooth the optimization by keeping the state matrix within the stable set, supported by the observed comparable convergence rates. revision: partial

  2. Referee: [Method] The description of the projection operation lacks analysis regarding its Lipschitz continuity, sub-differentiability at the stability boundary, or effects on the loss landscape's smoothness, which are necessary to substantiate the backpropagation compatibility claim.

    Authors: We appreciate this suggestion for a more formal analysis. The projection operation involves modifying the eigenvalues (via the diagonal of the Schur factor) to lie within the unit disk, which is a Lipschitz continuous map with constant 1 in the appropriate matrix norm, as it is a nearest-point projection onto a convex set. At the stability boundary, the operation is sub-differentiable, with the subgradient corresponding to the projection onto the tangent cone. We will add a new subsection in the method section providing this analysis, including a proof sketch of Lipschitz continuity and a note on how the chain rule applies through the projection for backpropagation. This will also address the smoothness of the effective loss landscape, which remains differentiable almost everywhere. revision: yes
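The nonexpansiveness argument can be checked numerically in the scalar analogue: projecting a complex number onto the closed unit disk, the eigenvalue-wise picture behind the rebuttal's Lipschitz claim, is 1-Lipschitz. The operator below is an illustrative analogue, not the paper's matrix-valued projection:

```python
import numpy as np

def clip_to_disk(z, r=1.0):
    """Nearest point in the closed disk of radius r: the scalar analogue of
    clipping an eigenvalue to the unit disk. Projection onto a convex set
    is nonexpansive, i.e. 1-Lipschitz."""
    m = abs(z)
    return z if m <= r else z * (r / m)

rng = np.random.default_rng(1)
for _ in range(1000):
    a, b = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    # nonexpansive: |P(a) - P(b)| <= |a - b|
    assert abs(clip_to_disk(a) - clip_to_disk(b)) <= abs(a - b) + 1e-12
```

The matrix case is subtler (the set of Schur-stable matrices is not convex), which is exactly why the promised formal subsection matters.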

Circularity Check

0 steps flagged

Schur-based projection derives from standard linear algebra; no circular reduction to inputs

full rationale

The paper's core method projects the quasi-triangular factor of the real Schur form onto its nearest stable matrix using established properties of Schur decomposition and spectral-radius stability. This construction is independent of any fitted parameters, self-referential definitions, or load-bearing self-citations; it relies on external linear-algebra facts rather than re-deriving or renaming prior results from the same authors. Experiments compare final accuracy to existing techniques without presenting any projection step as a 'prediction' forced by data fitting. The derivation chain remains self-contained against standard matrix theory benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on the existence and computability of the real Schur decomposition together with the notion of a nearest stable matrix; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • standard math: every real matrix possesses a real Schur decomposition that can be computed numerically.
    Invoked to obtain the quasi-triangular factor that is then projected.
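The axiom is directly checkable with a standard library routine. The sketch below verifies, for a random real matrix, the three defining properties of the real Schur form (a hedged illustration using SciPy's schur, not the paper's custom implementation):

```python
import numpy as np
from scipy.linalg import schur

# A = Z T Z^T with Z orthogonal and T quasi-upper-triangular
# (at most one nonzero subdiagonal, from 2x2 blocks for complex pairs).
rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6))
T, Z = schur(A, output="real")

assert np.allclose(Z @ T @ Z.T, A)        # valid factorization
assert np.allclose(Z.T @ Z, np.eye(6))    # Z is orthogonal
assert np.allclose(np.tril(T, -2), 0.0)   # T is quasi-upper-triangular
```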

pith-pipeline@v0.9.0 · 5491 in / 1174 out tokens · 36761 ms · 2026-05-15T01:33:53.095596+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · 3 internal anchors

  1. [1] Ubaid M. Al-Saggaf and Gene F. Franklin. Model reduction via balanced realizations: an extension and frequency weighting techniques. IEEE Transactions on Automatic Control, 33(7):687–692, 2002.
  2. [2] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021.
  3. [3] Jimmy T. H. Smith, Andrew Warrington, and Scott W. Linderman. Simplified state space layers for sequence modeling. arXiv preprint arXiv:2208.04933, 2022.
  4. [4] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
  5. [5] Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. HiPPO: Recurrent memory with optimal polynomial projections. Advances in Neural Information Processing Systems, 33:1474–1487, 2020.
  6. [6] Loris Di Natale, Muhammad Zakwan, Philipp Heer, Giancarlo Ferrari-Trecate, and Colin Neil Jones. SIMBa: System identification methods leveraging backpropagation. IEEE Transactions on Control Systems Technology, 2024.
  7. [7] Alberto Bemporad. An L-BFGS-B approach for linear and nonlinear system identification under ℓ1 and group-Lasso regularization. IEEE Transactions on Automatic Control, 70(7):4857–4864, 2025. doi: 10.1109/TAC.2025.3541018.
  8. [8] Vanni Noferini and Federico Poloni. Nearest Ω-stable matrix via Riemannian optimization. Numerische Mathematik, 148(4):817–851, 2021.
  9. [9] Karl Johan Åström and Peter Eykhoff. System identification—a survey. Automatica, 7(2):123–162, 1971.
  10. [10] Peter Van Overschee and Bart De Moor. A unifying theorem for three subspace system identification algorithms. Automatica, 31(12):1853–1864, 1995.
  11. [11] Bi-Qiang Mu and Han-Fu Chen. Hankel matrices for system identification. Journal of Mathematical Analysis and Applications, 409(1):494–508, 2014.
  12. [12] S. Joe Qin. An overview of subspace identification. Computers & Chemical Engineering, 30(10–12):1502–1513, 2006.
  13. [13] Maarten Schoukens and Koen Tiels. Identification of block-oriented nonlinear systems starting from linear approximations: A survey. Automatica, 85:272–292, 2017.
  14. [14] Steven L. Brunton, Marko Budišić, Eurika Kaiser, and J. Nathan Kutz. Modern Koopman theory for dynamical systems. arXiv preprint arXiv:2102.12086, 2021.
  15. [15] Adrian Wills, Thomas B. Schön, Lennart Ljung, and Brett Ninness. Identification of Hammerstein–Wiener models. Automatica, 49(1):70–81, 2013.
  16. [16] Jan M. Maciejowski. Guaranteed stability with subspace methods. Systems & Control Letters, 26(2):153–156, 1995.
  17. [17] Gianluigi Pillonetto, Tianshi Chen, Alessandro Chiuso, Giuseppe De Nicolao, and Lennart Ljung. Regularized System Identification: Learning Dynamic Models from Data. Springer Nature, 2022.
  18. [18] J. Sjöberg, T. McKelvey, and L. Ljung. On the use of regularization in system identification. IFAC Proceedings Volumes, 26(2):75–80, 1993.
  19. [19] Tony Van Gestel, Johan A. K. Suykens, Paul Van Dooren, and Bart De Moor. Identification of stable models in subspace identification by using regularization. IEEE Transactions on Automatic Control, 46(9):1416–1420, 2002.
  20. [20] Abdus Salam, Rasel Mahmud, Tohedul Islam, Saddam Mukta, and Swakkhar Shatabda. A comprehensive survey on Mamba: Architectures, challenges, and opportunities. Computer, 58(8):64–76, 2025.
  21. [21] Badri N. Patro and Vijay S. Agneeswaran. SiMBA: Simplified Mamba-based architecture for vision and multivariate time series, 2024. URL http://arxiv.org/abs/2403.15360.
  22. [22] Biswa Datta. Numerical Methods for Linear Control Systems. Elsevier, 2004.
  23. [23] Roger A. Horn and Charles R. Johnson. Matrix Analysis, chapter 2, page 82. Cambridge University Press, 2012.
  24. [24] Gene H. Golub and Charles F. Van Loan. Computing Subspaces with the SVD, pages 601–606. JHU Press, 3rd edition, 1996. ISBN 978-0-8018-5414-9.
  25. [25] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  26. [26] Maarten Schoukens. Nonlinear benchmarks. https://github.com/MaartenSchoukens/nonlinear_benchmarks, 2026.
  27. [27] Federico Najson. The Kalman-Yakubovich-Popov lemma for discrete-time positive linear systems. In 2012 American Control Conference (ACC), pages 5188–5193. IEEE, 2012.
  28. [28] Leslie Hogben, editor. Handbook of Linear Algebra. Discrete Mathematics and Its Applications. Chapman & Hall/CRC, Philadelphia, PA, 2nd edition, November 2013.
  29. [29] Diederich Hinrichsen and Anthony J. Pritchard. Mathematical Systems Theory I. Springer Berlin Heidelberg, 2005. ISBN 9783540264101. doi: 10.1007/b137541. URL http://dx.doi.org/10.1007/b137541.
  30. [30] Nicolas Gillis and Punit Sharma. On computing the distance to stability for matrices using linear dissipative Hamiltonian systems. Automatica, 85:113–121, 2017.
  31. [31] Nicolas Gillis, Michael Karow, and Punit Sharma. Approximating the nearest stable discrete-time system. Linear Algebra and its Applications, 573:37–53, 2019.
  32. [32] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15(42):1455–1459, 2014. URL https://www.manopt.org.
  33. [33] James Townsend, Niklas Koep, and Sebastian Weichwald. Pymanopt: A Python toolbox for optimization on manifolds using automatic differentiation. Journal of Machine Learning Research, 17(137):1–5, 2016. URL http://jmlr.org/papers/v17/16-177.html.
  34. [34] Peter Arbenz. Lecture notes in numerical methods for solving large scale eigenvalue problems, 2016.
  35. [35] Mario Ahues and Françoise Tisseur. A new deflation criterion for the QR algorithm. LAPACK Working Note, 122, 1997.
  36. [36] John C. Gower and Garmt B. Dijksterhuis. Procrustes Problems, volume 30. Oxford University Press, 2004.
  37. [37] Berthold K. P. Horn, Hugh M. Hilden, and Shahriar Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. Journal of the Optical Society of America A, 5(7):1127–1135, 1988.
  38. [38] Nagwa Sherif. On the computation of a matrix inverse square root. Computing, 46(4):295–305, December. ISSN 1436-5057. doi: 10.1007/bf02257775. URL http://dx.doi.org/10.1007/BF02257775.
  40. [40] Torbjörn Wigren and Johan Schoukens. Three free data sets for development and benchmarking in nonlinear system identification. In 2013 European Control Conference (ECC), pages 2933–2938. IEEE, July 2013. doi: 10.23919/ecc.2013.6669201. URL http://dx.doi.org/10.23919/ECC.2013.6669201.
  41. [41] Torbjörn Wigren and Maarten Schoukens. Coupled electric drives data set and reference models. Technical report, Uppsala University, Sweden, 2017.
  42. [42] Alexandre Janot, Maxime Gautier, and Mathieu Brunot. Data set and reference models of EMPS. In Nonlinear System Identification Benchmarks, 2019.
  43. [43] Jonas Weigand, Julian Götz, Jonas Ulmen, and Martin Ruskowski. Dataset and baseline for an industrial robot identification benchmark, 2022. URL https://fdm-fallback.uni-kl.de/TUK/FB/MV/WSKL/0001/.
  44. [44] Merijn Floren, Leonardo Peri, Jeroen De Maeyer, Wim De Munter, Dirk Vandepitte, and Jean-Philippe Noël. Data-driven state-space identification and nonlinearity assessment of the CubeSpec Fine Steering Mirror. ISMA-USD 2024, pages 2042–2052, 2024.
  45. [45] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
  46. [46] Max Champneys, Gerben I. Beintema, Roland Tóth, Maarten Schoukens, and Timothy J. Rogers. Baseline results for selected nonlinear system identification benchmarks. IFAC-PapersOnLine, 58(15):474–479, 2024.
  47. [47] Merijn Floren. Baseline models for the Fine Steering Mirror benchmark dataset. https://doi.org/10.5281/zenodo.19591136, 2026. Version v2026.04.15.

  48. [48]

    IfA 2,1 ̸=−A 1,2, then ˆA1,1 = ˆA2,2

  49. [49]

    A non-empty closed set S ⊆R 2×2 is doubly rotation-invariant if X∈ S=⇒U(α)XU(β) ⊺ ∈ S ∀{α, β} ∈R 2

    If A2,1 =−A 1,2, then ∃α∈[0, π/2) : ˆA=U(α) ˇAU(α) ⊺, where ˇA is another local minimizer with the same objective value and ˇA1,1 = ˇA2,2. A non-empty closed set S ⊆R 2×2 is doubly rotation-invariant if X∈ S=⇒U(α)XU(β) ⊺ ∈ S ∀{α, β} ∈R 2. Lemma A.4.LetS ∈R 2×2 be doubly rotation-invariant, and let ˆAbe a local minimizer of min X∈S ∥A−X∥ F for some diagona...

  50. [50]

    Ifσ 1 ̸=σ 2, then ˆAis diagonal with ˆA1,1 − ˆA2,2 ≥0and ˆA1,1 + ˆA2,2 ≥0

  51. [51]

    The result in Lemma A.5 is a variant of the Routh-Hurwitz stability criterion (see Hogben [28, Sec

    If σ1 =σ 2, then ∃α∈[0, π/2) : ˆA=U(α) ˇAU(α) ⊺, where ˇA is another local minimizer with the same objective value and ˇA1,1 = ˇA2,2 = 0. The result in Lemma A.5 is a variant of the Routh-Hurwitz stability criterion (see Hogben [28, Sec. 26.2]). Lemma A.5.Both roots of the polynomial p(λ) =aλ 2 +bλ+c , with a̸= 0 , lie in the closed left half-plane{λ∈C:Re...

  52. [52]

    ˆA∈∂S 0, ˆA/∈∂S + ∪∂S −

  53. [53]

    ˆA∈∂S +, ˆA/∈∂S 0 ∪∂S −

  54. [54]

    ˆA∈∂S −, ˆA/∈∂S 0 ∪∂S +

  55. [55]

    ˆA∈∂S 0 ∩∂S +, ˆA/∈∂S −

  56. [56]

    ˆA∈∂S 0 ∩∂S −, ˆA/∈∂S +

  57. [57]

    The six cases are treated separately:

    ˆA∈∂S + ∩∂S −, ˆA/∈∂S 0. The six cases are treated separately:

  58. [58]

    Then, ˆA is a local minimizer of ∥A−X∥ F in the doubly rotation-invariant set ∂S0, and Ξ=U ⊺ 0 ˆAV0 is a local minimizer of σ0,1 σ0,2 −X F in∂S 0

    ˆA∈∂S 0, ˆA/∈∂S + ∪∂S −. Then, ˆA is a local minimizer of ∥A−X∥ F in the doubly rotation-invariant set ∂S0, and Ξ=U ⊺ 0 ˆAV0 is a local minimizer of σ0,1 σ0,2 −X F in∂S 0. If σ0,1 ̸=σ 0,2, then Ξ is diagonal by Lemma A.4; and for Ξ= [ τ1 τ2 ] (with τ1τ2 = det(Ξ) = 1), it is necessary to have(τ 1, τ2)∈ M(σ 0,1, σ0,2). Hence, ˆA∈ ˆA0. Otherwise, ifσ 0,1 =σ ...

  59. [59]

    Then, min X∈∂S + ∥A−X∥ F = min X∈∂S + ∥U ⊺ +(A−I)V + −U ⊺ +(X−I)V +∥F

    ˆA∈∂S +, ˆA/∈∂S 0 ∪∂S −. Then, min X∈∂S + ∥A−X∥ F = min X∈∂S + ∥U ⊺ +(A−I)V + −U ⊺ +(X−I)V +∥F . Hence, ˆA is a local minimizer of ∥A−X∥ F if and only if Ξ=U ⊺ +(X−I)V + in ∂Sd ≜ X∈R 2×2 : det(X) = 0 : indeed, X has an eigenvalue 1 if and only if Ξ has determinant 0. Thus, either Ξ= [ σ+,1 0 ] and hence ˆA= ˆA+, or a minimizer with the same objective valu...

  60. [60]

    Proof as in case 2, swapping all plus and minus signs

    ˆA∈∂S −, ˆA/∈∂S 0 ∪∂S +. Proof as in case 2, swapping all plus and minus signs

  61. [61]

    Note that ∂S0 ∩∂S + is the rotation-invariant set of matrices with double eigenvalue +1 (since they must have eigenvalue +1 and determinant 1)

    ˆA∈∂S 0 ∩∂S +, ˆA/∈∂S −. Note that ∂S0 ∩∂S + is the rotation-invariant set of matrices with double eigenvalue +1 (since they must have eigenvalue +1 and determinant 1). ˆA is a minimizer if and only if Ξ=G ⊺ ˆAG is a minimizer of ∥ ˇA−X∥ in ∂S0 ∩∂S +; so, by Lemma A.3, 1 2 tr(Ξ) = Ξ1,1 = Ξ2,2 = 1 2 tr( ˆA) = +1. Since det(Ξ) = 1, at least one among Ξ1,2 a...

  62. [62]

    Proof as in case 4, swapping all plus and minus signs

    ˆA∈∂S 0 ∩∂S −, ˆA/∈∂S +. Proof as in case 4, swapping all plus and minus signs

  63. [63]

    Note that∂S + ∩∂S − is the rotation-invariant set of matrices with eigenvalues1and−1

    ˆA∈∂S + ∩∂S −, ˆA/∈∂S 0. Note that∂S + ∩∂S − is the rotation-invariant set of matrices with eigenvalues1and−1. Arguing as in case 4, G⊺ ˆAG=Ξ= 0τ 1 τ2 0 , τ 1τ2 = 1, as Ξ must have Ξ1,1 = Ξ 2,2 = 1 2 tr(Ξ) = 0 and det(Ξ) = 1 . Moreover, to minimize ∥ ˇA−Ξ∥ F ,(τ 1, τ2)must belong toM( ˇA1,2, ˇA2,1). Thus, ˆA∈ A ∗. B Projection Scheme - Full Benchmark The ...

  64. [64]

    Random with independent entries inN(0,1)

  65. [65]

    The resulting eigenvalues are shown in the complex plane in Figure 2

    Random with independent entries inU([0,1]). The resulting eigenvalues are shown in the complex plane in Figure 2. Following the setup from Noferini and Poloni [8], the maximum number of iterations and execution time were set to1000and1000seconds, respectively. However, the iteration limit only controls the maximum number of outer iterations, meaning that ...

  66. [66]

    The execution time of each method (Figure 4a)

  67. [67]

    This metric indicates whether or not each implementation yields a valid factorization, regardless of the factors’ orthogonality and sparsity pattern

    The reconstruction error (Figure 4b), defined as the NSFE between the original matrix A and the product of its Schur factors ZT Z ⊺. This metric indicates whether or not each implementation yields a valid factorization, regardless of the factors’ orthogonality and sparsity pattern

  68. [68]

    This metric indicates whether Z fulfils the orthogonality requirement

    The orthogonality error (Figure 4c), defined as the NSFE between the identity matrix and the symmetric matrix ZZ ⊺. This metric indicates whether Z fulfils the orthogonality requirement

  69. [69]

    By Theorem 3.2, under a correct factorization, this metric should only be bound by numerical error

    The spectral error (Figure 4c), defined as the NSSR between the original matrix A and the quasi-triangular factor T . By Theorem 3.2, under a correct factorization, this metric should only be bound by numerical error. This experiment is executed on an isolated HPC instance with 4 Xeon Gold 6230 logical cores, 32 GiB of RAM, and a single Nvidia V olta V100...

- The method from Golub and Van Loan [24], which finds the nearest orthogonal matrix by multiplying together the unitary factors from the singular-value decomposition $Z=U\Sigma V^{\top}$; i.e.,
$$\hat{Z}=UV^{\top}.\tag{18}$$
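A minimal NumPy sketch of this projection (variable names are illustrative):

```python
import numpy as np

def nearest_orthogonal_svd(Z):
    # Z = U diag(s) V^T; discarding the singular values and keeping
    # only the unitary factors gives Eq. (18): Z_hat = U V^T.
    U, _, Vt = np.linalg.svd(Z)
    return U @ Vt

rng = np.random.default_rng(2)
Z = rng.standard_normal((5, 5))
Z_hat = nearest_orthogonal_svd(Z)
```

In the Frobenius norm, no orthogonal matrix lies closer to $Z$ than this $\hat{Z}$.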

- The method from Horn et al. [37], which defines the closed form $\hat{Z}=Z(Z^{\top}Z)^{-1/2}$, calculating the inverse square root of the product in parentheses through its eigenvalue decomposition $Z^{\top}Z=Q\Lambda Q^{-1}$ as
$$(Z^{\top}Z)^{-1/2}=Q\Lambda^{-1/2}Q^{-1}=Q\,\operatorname{diag}\!\left(\frac{1}{\sqrt{\lambda_{1}}},\dots,\frac{1}{\sqrt{\lambda_{n_{x}}}}\right)Q^{-1}.\tag{19}$$
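A sketch of this closed form with NumPy's symmetric eigensolver; since $Z^{\top}Z$ is symmetric positive definite, $Q^{-1}=Q^{\top}$ here:

```python
import numpy as np

def nearest_orthogonal_eig(Z):
    # Gram matrix Z^T Z = Q diag(lam) Q^T (symmetric, so Q^{-1} = Q^T).
    lam, Q = np.linalg.eigh(Z.T @ Z)
    inv_sqrt = Q @ np.diag(1.0 / np.sqrt(lam)) @ Q.T  # Eq. (19)
    return Z @ inv_sqrt                               # Z_hat = Z (Z^T Z)^{-1/2}

rng = np.random.default_rng(3)
Z = rng.standard_normal((5, 5))
Z_hat = nearest_orthogonal_eig(Z)
```

For square invertible $Z$ this agrees with the SVD-based projection $\hat{Z}=UV^{\top}$, since $Z(Z^{\top}Z)^{-1/2}=U\Sigma V^{\top}V\Sigma^{-1}V^{\top}=UV^{\top}$.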

- The same method from Horn et al. [37], but instead approximating the inverse square root using the stable iterative scheme from Sherif [38, Section 3]; i.e., via the recursion $X^{-1/2}=\lim_{r\to\infty}\Xi_{r}$ s.t.
$$\Xi_{r+1}=\Xi_{r}(I+E_{r}),\quad \Xi_{0}=I,\qquad E_{r+1}=E_{r}^{2}\left(2I-E_{r}^{2}\right)^{-1},\quad E_{0}=(I-X)(I+X)^{-1}.\tag{20}$$
Since the methods are jit-compiled during training, the stop condition for Equat...
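Reading the middle update as $E_{r+1}=E_{r}^{2}(2I-E_{r}^{2})^{-1}$, as reconstructed above, the coupled recursion can be sketched as follows; the fixed iteration count stands in for the jit-compatible stop condition mentioned in the text:

```python
import numpy as np

def inv_sqrt_iterative(X, iters=30):
    # Coupled recursion of Eq. (20); converges for symmetric
    # positive-definite X (all iterates are rational functions of X
    # and therefore commute with it).
    I = np.eye(X.shape[0])
    Xi = I.copy()                           # Xi_0 = I
    E = (I - X) @ np.linalg.inv(I + X)      # E_0 = (I - X)(I + X)^{-1}
    for _ in range(iters):
        Xi = Xi @ (I + E)                   # Xi_{r+1} = Xi_r (I + E_r)
        E2 = E @ E
        E = E2 @ np.linalg.inv(2 * I - E2)  # E_{r+1} = E_r^2 (2I - E_r^2)^{-1}
    return Xi

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)  # well-conditioned SPD test matrix
Xi = inv_sqrt_iterative(X)   # Xi @ Xi approximates X^{-1}
```

Each step only needs matrix products and one inverse, and the error contracts quadratically, so a modest iteration cap suffices in practice.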

- The execution time of each method (Figure 4a). Execution times for the Schur-stable projection, the Schur decomposition, and the spectral norm are also included for scale and relative-cost analysis.

- The projection distance (Figure 4b), defined as the NSFE between the original matrix $Z$ and its projection $\hat{Z}$. This metric should only be evaluated relative to other methods, as a null value should only be achievable if $Z$ is already orthogonal.

- The orthogonality numerical error (Figure 4c), defined as the NSFE between the identity matrix and $\hat{Z}^{\top}\hat{Z}$. Unlike the projection distance, if $\hat{Z}\in O(n_{x})$, then this metric should be approximately 0 (bounded by numerical error).

This experiment is executed on an interactive HPC instance with 4 Xeon Gold 6230 logical cores and 8 GiB of RAM. Each metric is sa...
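Under the same assumed NSFE reading as before (squared Frobenius error normalized by the reference norm), the two metrics can be sketched as:

```python
import numpy as np

def nsfe(reference, candidate):
    # Assumed NSFE: squared Frobenius error over the reference's squared norm.
    return (np.linalg.norm(reference - candidate, 'fro') ** 2
            / np.linalg.norm(reference, 'fro') ** 2)

rng = np.random.default_rng(5)
Z = rng.standard_normal((6, 6))
U, _, Vt = np.linalg.svd(Z)
Z_hat = U @ Vt  # SVD projection of Eq. (18), used here as the candidate

projection_distance = nsfe(Z, Z_hat)        # meaningful only relative to other methods
orthogonality_num_error = nsfe(np.eye(6), Z_hat.T @ Z_hat)  # ~0 when Z_hat is orthogonal
```

The split between the two mirrors the discussion above: the first measures how far the projection moved, the second whether it actually landed on $O(n_x)$.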

- Silverbox [39], an electronic implementation of the Duffing oscillator. It is built as a 2nd-order linear time-invariant system with a 3rd-degree polynomial static nonlinearity around it in feedback.

- CED [40], or Coupled Electric Drives, consists of two electric motors that drive a pulley using a flexible belt. The pulley is held by a spring, resulting in a lightly damped dynamic mode. The electric drives can be individually controlled, allowing the tension and the speed of the belt to be simultaneously controlled. The drive control is symmetric aroun...

- EMPS [41], or Electro-Mechanical Positioning System, is a standard configuration of a drive system for the prismatic joint of robots or machine tools. The main nonlinearity is caused by friction effects present in the setup. Due to the presence of a pure integrator in the system, the measurements are taken in a closed-loop setting. The ...

- Industrial Robot [42], an identification benchmark dataset for a full robot movement with a KUKA KR300 R2500 ultra SE industrial robot. It is a robot with a nominal payload capacity of 300 kg, a weight of 1120 kg, and a reach of 2500 mm, exhibiting 12 states accounting for position and velocity for each of the 6 joints. The robot encounters backlash in al...

- Fine-Steering Mirror [43], a multi-input multi-output high-precision control platform used in a small satellite. Voltages applied to three piezo-actuators serve as inputs, while the mirror displacements measured at three non-collocated reference points serve as outputs. The excitation signals are orthogonal random-phase multisines spanning a wide frequen...