
arxiv: 2604.18507 · v2 · submitted 2026-04-20 · 🧮 math.OC · cs.AI · cs.LG

Learning the Riccati solution operator for time-varying LQR via Deep Operator Networks

Pith reviewed 2026-05-10 04:04 UTC · model grok-4.3

classification 🧮 math.OC · cs.AI · cs.LG
keywords Deep Operator Networks · Riccati equation · Linear quadratic regulator · Time-varying systems · Optimal control · Operator learning · Exponential stability · Feedback control

The pith

A DeepONet learns the Riccati solution operator for finite-horizon time-varying LQR, replacing repeated differential equation solves with fast evaluations while preserving exponential stability under bounded errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to approximate the solution operator of the differential Riccati equation that arises in finite-horizon linear quadratic regulator problems with time-varying parameters. A Deep Operator Network is trained offline on pairs of system-parameter trajectories and corresponding Riccati-matrix trajectories, after which the network produces an approximate Riccati trajectory for any new parameter input in a single forward pass. This learned surrogate supplies near-optimal time-varying feedback gains without numerical integration at runtime. Theoretical analysis supplies explicit bounds that relate the operator approximation error to the resulting closed-loop trajectory error, feedback error, and cost suboptimality, together with a proof that exponential stability of the closed loop is retained whenever the operator error remains below a computable threshold. The approach therefore converts an online integration burden into a one-time learning cost and supplies verifiable reliability guarantees for data-driven optimal control.
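For orientation, the finite-horizon problem and the associated differential Riccati equation can be written in standard textbook form as follows. The notation (terminal weight $P_T$, gain $K$, horizon $T$, operator symbol $\mathcal{G}$) is an assumption made here for illustration and may differ from the paper's.

```latex
\begin{aligned}
&\min_{u}\; J(u) = \int_0^T \big( x^\top Q(t)\, x + u^\top R(t)\, u \big)\, dt + x(T)^\top P_T\, x(T),
\qquad \dot x = A(t)\, x + B(t)\, u, \quad x(0) = x_0, \\
&-\dot P = A^\top P + P A - P B R^{-1} B^\top P + Q, \qquad P(T) = P_T, \\
&u^*(t) = -K(t)\, x(t), \qquad K(t) = R(t)^{-1} B(t)^\top P(t), \\
&\mathcal{G} : (A, B, Q, R) \longmapsto P(\cdot) \quad \text{(the solution operator that the DeepONet approximates).}
\end{aligned}
```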

Core claim

The central claim is that the Riccati solution operator, which maps a time-dependent system-parameter function to the corresponding time-dependent Riccati-matrix solution, admits a practical approximation by a matrix-valued DeepONet. Once trained, this network supplies approximate optimal feedbacks whose deviation from the true optimal feedback is controlled by the operator error; the paper proves that exponential stability of the closed-loop system is preserved as long as this error lies below an explicitly derived threshold, and it quantifies the resulting degradation in trajectory accuracy and cost.
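To make concrete what one evaluation of the exact operator costs, here is a minimal sketch of the classical route: integrating the differential Riccati equation backward in time with a fixed-step RK4 scheme. The function names (`riccati_rhs`, `solve_dre`), the step count, and the scalar test system are illustrative assumptions, not the paper's solver or benchmark.

```python
import numpy as np

def riccati_rhs(P, A, B, Q, R):
    """dP/dt for the DRE written as -dP/dt = A^T P + P A - P B R^{-1} B^T P + Q."""
    return -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q)

def solve_dre(A_fn, B_fn, Q_fn, R_fn, P_T, T, n_steps=2000):
    """Integrate the DRE backward from t = T to t = 0 with classical RK4.

    A_fn, B_fn, Q_fn, R_fn map a time t to the corresponding matrix.
    Returns ts (ascending, length n_steps + 1) and P of shape (n_steps + 1, n, n).
    """
    ts = np.linspace(0.0, T, n_steps + 1)
    h = T / n_steps
    P = np.empty((n_steps + 1,) + P_T.shape)
    P[-1] = P_T  # terminal condition P(T) = P_T

    def f(t, P_):
        return riccati_rhs(P_, A_fn(t), B_fn(t), Q_fn(t), R_fn(t))

    # March from the terminal time toward t = 0 (step of size -h).
    for k in range(n_steps, 0, -1):
        t = ts[k]
        k1 = f(t, P[k])
        k2 = f(t - h / 2, P[k] - h / 2 * k1)
        k3 = f(t - h / 2, P[k] - h / 2 * k2)
        k4 = f(t - h, P[k] - h * k3)
        P[k - 1] = P[k] - h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return ts, P

# Hypothetical scalar test system with a = b = q = r = 1 and P_T = 0.
one = np.array([[1.0]])
ts, P = solve_dre(lambda t: one, lambda t: one, lambda t: one, lambda t: one,
                  P_T=np.zeros((1, 1)), T=10.0)
p0 = P[0, 0, 0]  # over a long horizon, P(0) approaches the stationary value 1 + sqrt(2)
```

Every new parameter trajectory requires a fresh backward integration of this kind; the learned surrogate is meant to replace that loop with a single forward pass.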

What carries the argument

The Riccati solution operator that takes time-varying system matrices as input and returns the time-dependent positive-semidefinite Riccati trajectory, realized by a tailored DeepONet architecture together with a progressive learning strategy that scales with system dimension.
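The branch/trunk structure of a matrix-valued DeepONet can be sketched in a few lines of NumPy. This is an untrained, randomly initialized forward pass only. The layer sizes, the sensor sampling of the parameter function, and the Cholesky-style output `L @ L.T` (which makes the prediction positive semidefinite by construction) are assumptions chosen for illustration; the paper's tailored architecture and progressive training strategy are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Random (untrained) weights for a small fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    """tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def deeponet_riccati(branch, trunk, u_sensors, t, n, p):
    """Evaluate P_hat(t) = L L^T, where each lower-triangular entry of L is
    the inner product of p branch features (from the sampled parameter
    function) with p trunk features (from the query time t)."""
    n_tri = n * (n + 1) // 2
    b = mlp_apply(branch, u_sensors).reshape(n_tri, p)  # branch features
    tau = mlp_apply(trunk, np.array([t]))               # trunk features
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = b @ tau
    return L @ L.T

# Hypothetical sizes: 3-d system, A(t) sampled at 20 sensor times, 16 basis functions.
n, m, p = 3, 20, 16
branch = mlp_init([m * n * n, 64, (n * (n + 1) // 2) * p])
trunk = mlp_init([1, 64, p])
u_sensors = rng.normal(size=m * n * n)  # stand-in for flattened samples of A(t)
P_hat = deeponet_riccati(branch, trunk, u_sensors, t=0.3, n=n, p=p)
```

Querying the same network at many times `t` with a fixed `u_sensors` yields the whole approximate Riccati trajectory without any time stepping.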

If this is right

  • Operator approximation errors propagate in quantifiable ways to feedback performance, state-trajectory accuracy, and cost suboptimality.
  • Exponential stability of the closed-loop system is retained whenever the learned operator is sufficiently accurate.
  • The method yields substantial computational speedups over classical Riccati solvers while retaining high accuracy on both time-invariant and time-varying instances.
  • The same learned operator generalizes across a wide range of system dimensions and parameter configurations after a single training phase.
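One standard way to see how a Riccati error turns into feedback error and extra cost is the completion-of-squares identity below, written in the usual LQR notation: $K = R^{-1}B^\top P$ is the optimal gain, $\hat K = R^{-1}B^\top \hat P$ the surrogate gain, and $\hat x$ the closed-loop state under $\hat u = -\hat K \hat x$ from the same initial condition. It is a textbook argument offered as a sketch under these notational assumptions; the paper's own bounds and constants may take a different form.

```latex
\begin{aligned}
\hat K(t) - K(t) &= R(t)^{-1} B(t)^\top \big( \hat P(t) - P(t) \big),
\qquad
\| \hat K(t) - K(t) \| \le \| R(t)^{-1} \| \, \| B(t) \| \, \| \hat P(t) - P(t) \|, \\
\frac{d}{dt}\big( \hat x^\top P \hat x \big)
&= -\hat x^\top \big( Q + \hat K^\top R \hat K \big) \hat x
 + \hat x^\top (\hat K - K)^\top R\, (\hat K - K)\, \hat x, \\
J(\hat u) - J(u^*) &= \int_0^T \big\| R(t)^{1/2} \big( \hat K(t) - K(t) \big) \hat x(t) \big\|^2 \, dt \;\ge\; 0 .
\end{aligned}
```

The last line follows by integrating the second over $[0, T]$ and using $J(u^*) = x_0^\top P(0)\, x_0$, so the cost gap is quadratic in the gain error and hence in the Riccati error.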

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same operator-learning pattern succeeds for the Riccati equation of linear-quadratic control, analogous surrogates could accelerate the solution of other parametric differential equations that appear in control design.
  • The separation between offline training and online evaluation opens the possibility of updating the operator incrementally when new parameter regimes are observed during operation.

Load-bearing premise

The training distribution of time-dependent system parameters must be representative of all future instances the operator will encounter, so that the approximation error remains small enough for the stability and performance bounds to hold.

What would settle it

An explicit system-parameter trajectory drawn from the same class as the training data for which the learned feedback produces a closed-loop state that grows exponentially, even though the operator approximation error lies strictly below the threshold stated in the stability theorem, would falsify the preservation claim.
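As a toy illustration of what such a test looks like, consider the scalar time-invariant system ẋ = a x + b u with stationary Riccati value p*, where the stability threshold on the Riccati error can be computed by hand. The example is assumption-laden (scalar, constant coefficients, constant error, infinite-horizon stationary solution) and does not reproduce the paper's theorem or its threshold; the function names are hypothetical.

```python
import math

def stationary_riccati(a, b, q, r):
    """Positive root of 2*a*p - (b**2 / r) * p**2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

def closed_loop_rate(a, b, r, p_hat):
    """Growth rate of x' = (a - b**2 * p_hat / r) * x under the surrogate gain."""
    return a - b * b * p_hat / r

def error_threshold(a, b, q, r):
    """Any error with |p_hat - p*| below this value keeps the rate negative."""
    return r * math.sqrt(a * a + b * b * q / r) / (b * b)

a, b, q, r = 1.0, 1.0, 1.0, 1.0
p_star = stationary_riccati(a, b, q, r)              # 1 + sqrt(2)
eps_max = error_threshold(a, b, q, r)                # sqrt(2)
rate_small = closed_loop_rate(a, b, r, p_star - 0.5)  # error below the threshold
rate_large = closed_loop_rate(a, b, r, p_star - 2.0)  # error above the threshold
```

In this toy case the threshold is exact, so no counterexample exists: the rate is negative for every error below `eps_max`. A falsifying instance for the paper would be the analogous situation in the time-varying setting, with an error below the stated threshold and a growing state.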

Figures

Figures reproduced from arXiv: 2604.18507 by Jun Chen, Junmin Wang, Umberto Biccari.

Figure 1: Eigenvalue distribution for 3-d LQR problems. Left: the true closed-loop matrix A_cl; right: the DeepONet-approximated matrix Â_cl.
Figure 2: Eigenvalue distribution for 10-d LQR problems. Left: the true closed-loop matrix A_cl; right: the DeepONet-approximated matrix Â_cl.
Accompanying table (extracted with the caption): stable samples 197, unstable samples 3, stable eigenvalues 1994, unstable eigenvalues 6, test loss 8.77 × 10^-3.
Figure 3: Comparison of predicted and true trajectories (top) and predicted and true controls (bottom) for a 3-d test system.
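The eigenvalue plots in Figures 1 and 2, and the stable/unstable counts reported alongside Figure 2, amount to checking the sign of the real parts of the closed-loop spectrum for each test sample. A minimal sketch of such a count is given below for time-invariant closed-loop matrices. The function name, the double-integrator example with Q = I and R = 1, and the artificial perturbation of the Riccati matrix are assumptions for illustration, not the paper's test set.

```python
import numpy as np

def count_stability(A_cl_batch):
    """Count samples and eigenvalues with strictly negative real part.

    A_cl_batch: array of shape (N, n, n) holding closed-loop matrices.
    """
    eig = np.linalg.eigvals(A_cl_batch)       # shape (N, n)
    stable_eig = eig.real < 0.0
    stable_samples = stable_eig.all(axis=1)   # a sample is stable if all its eigenvalues are
    return {
        "stable_samples": int(stable_samples.sum()),
        "unstable_samples": int((~stable_samples).sum()),
        "stable_eigenvalues": int(stable_eig.sum()),
        "unstable_eigenvalues": int((~stable_eig).sum()),
    }

# Double integrator with Q = I, R = 1; its stationary Riccati solution is known in closed form.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
P = np.array([[np.sqrt(3.0), 1.0], [1.0, np.sqrt(3.0)]])

# Two surrogate "predictions": the exact P and a deliberately poor one.
P_hats = np.stack([P, P - 2.0 * np.eye(2)])
A_cl = A - B @ B.T @ P_hats                   # broadcasts to shape (2, 2, 2)
counts = count_stability(A_cl)
```

Here the exact Riccati matrix gives a stable closed loop, while the heavily perturbed one gives two eigenvalues with positive real part, so the counts split evenly.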
Original abstract

We propose a computational framework for replacing the repeated numerical solution of differential Riccati equations in finite-horizon Linear Quadratic Regulator (LQR) problems by a learned operator surrogate. Instead of solving a nonlinear matrix-valued differential equation for each new system instance, we construct offline an approximation of the associated solution operator mapping time-dependent system parameters to the Riccati trajectory. The resulting model enables fast online evaluation of approximate optimal feedbacks across a wide class of systems, thereby shifting the computational burden from repeated numerical integration to a one-time learning stage. From a theoretical perspective, we establish control-theoretic guarantees for this operator-based approximation. In particular, we derive bounds quantifying how operator approximation errors propagate to feedback performance, trajectory accuracy, and cost suboptimality, and we prove that exponential stability of the closed-loop system is preserved under sufficiently accurate operator approximation. These results provide a framework to assess the reliability of data-driven approximations in optimal control. On the computational side, we design tailored DeepONet architectures for matrix-valued, time-dependent problems and introduce a progressive learning strategy to address scalability with respect to the system dimension. Numerical experiments on both time-invariant and time-varying LQR problems demonstrate that the proposed approach achieves high accuracy and strong generalization across a wide range of system configurations, while delivering substantial computational speedups compared to classical solvers. The method offers an effective and scalable alternative for parametric and real-time optimal control applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proposes a DeepONet framework to learn the solution operator mapping time-dependent system parameters to the Riccati trajectory for finite-horizon time-varying LQR, replacing repeated numerical integration with fast online evaluation. It derives bounds on how operator approximation errors propagate to feedback performance, state trajectories, and cost suboptimality, and proves that exponential stability of the closed-loop system is preserved provided the approximation error is sufficiently small. Tailored matrix-valued DeepONet architectures and a progressive learning strategy are introduced for scalability, with numerical experiments on time-invariant and time-varying LQR instances demonstrating accuracy, generalization, and computational speedups.

Significance. If the error-propagation bounds and stability-preservation result hold under the stated assumptions, the work supplies a concrete bridge between operator learning and control-theoretic guarantees for parametric optimal control. The explicit quantification of suboptimality and the stability theorem (when the required norm on the operator error is attained) would be a useful addition to the literature on data-driven surrogates for Riccati equations.

major comments (3)
  1. [§3.2, Theorem 3.1] §3.2, Theorem 3.1 (stability preservation): the proof requires the operator approximation error to be small in a topology (sup-norm or weighted L^∞ over the time interval) that controls the time-varying closed-loop state-transition matrix and the uniform exponential decay rate. The manuscript does not show that the L²-type training loss minimized by DeepONet, even with the progressive strategy, guarantees this uniform bound for every admissible parameter trajectory outside the training distribution.
  2. [§3.1] §3.1 (error-propagation bounds): the derivation of the feedback-performance and cost-suboptimality bounds assumes that the learned operator stays inside a ball whose radius is determined by the stability margin of the true Riccati trajectory. No quantitative link is provided between the empirical training loss and the radius of this ball, leaving open whether the numerical examples actually satisfy the hypothesis of the theorem.
  3. [§4] §4 (progressive learning strategy): while the architecture is described, there is no analysis showing that the progressive training procedure preserves the approximation properties needed for the stability theorem when the system dimension increases or when the parameter functions exhibit rapid temporal variations not seen in the training set.
minor comments (3)
  1. [§2] Notation for the time-dependent system matrices (A(t), B(t), Q(t), R(t)) is introduced without an explicit statement of the function space (e.g., C^0 or L^∞) to which they belong; this should be stated once at the beginning of §2.
  2. [Figures 3-4] Figure 3 and Figure 4: the color scales and axis labels for the Riccati-matrix trajectories are difficult to read at the printed size; larger fonts or separate panels for each matrix entry would improve clarity.
  3. [§5] The abstract claims “strong generalization across a wide range of system configurations,” yet the test trajectories in §5 appear to be drawn from the same distribution family as the training set; a clearer statement of the out-of-distribution test protocol would be helpful.
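The gap raised in major comment 1 between an L²-type loss and a sup-norm requirement can be made concrete with a synthetic error trajectory: a narrow spike in time has a small L² norm but a sup-norm of order one. The sketch below uses a made-up Gaussian bump as the error; the width `w`, the grid, and the function name are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sup_and_l2_norms(err, ts):
    """Return (sup_t ||E(t)||_F, L2-in-time norm of ||E(t)||_F).

    err: array of shape (n_t, n, n) with matrix-valued error samples.
    ts:  array of shape (n_t,) with the corresponding times.
    """
    fro = np.linalg.norm(err, axis=(1, 2))    # Frobenius norm at each time
    sup = fro.max()
    # Trapezoidal rule for the time integral of ||E(t)||_F^2.
    sq = fro ** 2
    l2 = np.sqrt(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(ts)))
    return sup, l2

ts = np.linspace(0.0, 1.0, 10001)
w = 1e-3                                       # hypothetical spike width
bump = np.exp(-((ts - 0.5) / w) ** 2)
err = bump[:, None, None] * np.eye(2)          # spike-shaped 2x2 error trajectory
sup_err, l2_err = sup_and_l2_norms(err, ts)
```

Shrinking `w` drives the L² value toward zero while the sup-norm stays fixed, which is why a small training loss alone does not certify the uniform bound that the stability theorem needs.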

Simulated Author's Rebuttal

3 responses · 3 unresolved

We thank the referee for the careful and constructive review. The comments highlight important connections between the learning procedure and the control-theoretic guarantees, which we address point by point below.

  1. Referee: [§3.2, Theorem 3.1] §3.2, Theorem 3.1 (stability preservation): the proof requires the operator approximation error to be small in a topology (sup-norm or weighted L^∞ over the time interval) that controls the time-varying closed-loop state-transition matrix and the uniform exponential decay rate. The manuscript does not show that the L²-type training loss minimized by DeepONet, even with the progressive strategy, guarantees this uniform bound for every admissible parameter trajectory outside the training distribution.

    Authors: We agree that Theorem 3.1 requires the approximation error to be small in the sup-norm (or weighted L^∞) to control the state-transition matrix and preserve the exponential decay rate. The L² training loss does not automatically guarantee this uniform bound for all admissible trajectories, including those outside the training set. In the revision we will add a remark in §3.2 clarifying the distinction between the training norm and the norm required by the theorem, together with additional numerical checks of the sup-norm error on out-of-distribution samples. A rigorous guarantee that the learned operator satisfies the sup-norm condition for arbitrary parameter trajectories lies beyond the present scope.

    revision: partial

  2. Referee: [§3.1] §3.1 (error-propagation bounds): the derivation of the feedback-performance and cost-suboptimality bounds assumes that the learned operator stays inside a ball whose radius is determined by the stability margin of the true Riccati trajectory. No quantitative link is provided between the empirical training loss and the radius of this ball, leaving open whether the numerical examples actually satisfy the hypothesis of the theorem.

    Authors: The error-propagation bounds in §3.1 are indeed conditional on the operator error lying inside a ball whose radius depends on the stability margin. No a priori quantitative map from the empirical L² loss to this radius is given. In the revised manuscript we will insert a paragraph in §3.1 describing how the hypothesis can be verified post-training by evaluating the actual operator error norm on held-out test trajectories and comparing it with the stability-derived radius. The numerical examples already satisfy the condition (observed errors are well below the margin), and we will make this verification explicit.

    revision: yes

  3. Referee: [§4] §4 (progressive learning strategy): while the architecture is described, there is no analysis showing that the progressive training procedure preserves the approximation properties needed for the stability theorem when the system dimension increases or when the parameter functions exhibit rapid temporal variations not seen in the training set.

    Authors: The progressive learning strategy is presented primarily for computational scalability. We do not provide a theoretical analysis proving that it preserves the approximation quality required by the stability theorem for higher dimensions or for parameter functions with rapid temporal variations absent from the training set. In the revision we will expand §4 with a limitations paragraph acknowledging this point and will add further numerical experiments using higher-dimensional systems and parameter trajectories with faster dynamics. A complete theoretical treatment of the progressive strategy’s effect on the required approximation properties is left for future work.

    revision: partial

standing simulated objections not resolved
  • A rigorous proof that the L² training loss (even with progressive training) guarantees the sup-norm bound needed for stability preservation for every admissible parameter trajectory outside the training distribution.
  • An a priori quantitative relationship between the empirical training loss and the ball radius appearing in the error-propagation theorems without post-training verification.
  • A theoretical analysis establishing that the progressive learning strategy preserves the approximation properties required by the stability theorem under increasing system dimension or unseen rapid temporal variations.
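The post-training verification proposed in the second response (measure the operator error on held-out trajectories and compare it with a stability-derived radius) could look roughly like the sketch below. The array layout, the function name, and the value of `radius` are assumptions; in particular, `radius` is a stand-in for whatever margin the paper's theorem prescribes, which is not reproduced here.

```python
import numpy as np

def verify_against_radius(P_true, P_pred, radius):
    """Check the sup-in-time spectral-norm error of each test sample against a radius.

    P_true, P_pred: arrays of shape (n_samples, n_t, n, n).
    radius: scalar or array of shape (n_samples,), assumed given by the theory.
    Returns (sup_err, ok), both of shape (n_samples,).
    """
    # Spectral norm of the error matrix for every (sample, time) pair.
    err = np.linalg.norm(P_pred - P_true, ord=2, axis=(2, 3))
    sup_err = err.max(axis=1)   # worst case over the time grid
    ok = sup_err < radius
    return sup_err, ok

# Synthetic held-out set: two samples, five time points, 2x2 matrices.
P_true = np.tile(np.eye(2), (2, 5, 1, 1))
P_pred = P_true.copy()
P_pred[0] += 0.01 * np.eye(2)       # uniformly small error
P_pred[1, 3] += 0.5 * np.eye(2)     # large error at a single time instant
sup_err, ok = verify_against_radius(P_true, P_pred, radius=0.1)
```

A check of this kind only certifies the sampled test trajectories and time grid; it does not close the gap to arbitrary admissible parameter trajectories noted in the standing objections.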

Circularity Check

0 steps flagged

No significant circularity; theoretical bounds derived independently of learned operator


The paper separates the data-driven DeepONet approximation of the Riccati operator from the subsequent control-theoretic analysis. Error propagation bounds and the exponential stability preservation result are obtained via standard Lyapunov-type estimates on the time-varying closed-loop system, using the exact Riccati trajectory as reference; these steps do not reduce to fitted parameters, self-citations, or ansatzes imported from prior author work. The training distribution assumption is an explicit hypothesis required for the bounds to apply, but it is not smuggled into the derivation itself. No load-bearing step equates a prediction to its own input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard existence results for Riccati solutions and on the universal approximation properties of DeepONet; no new physical entities are introduced.

axioms (1)
  • domain assumption: The differential Riccati equation admits a unique positive-semidefinite solution for every admissible time-varying system in the considered class.
    Invoked to ensure the target operator is well-defined before learning begins.

pith-pipeline@v0.9.0 · 5557 in / 1237 out tokens · 25452 ms · 2026-05-10T04:04:18.859242+00:00 · methodology


Author affiliations: ∗ School of Mathematics and Statistics, Beijing Institute of Technology, 100081 Beijing, China. † Chair of Computational Mathematics, DeustoTech, University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Basque Country, Spain. Email address: chenjun_bc...