Learning the Riccati solution operator for time-varying LQR via Deep Operator Networks
Pith reviewed 2026-05-10 04:04 UTC · model grok-4.3
The pith
A DeepONet learns the Riccati solution operator for finite-horizon time-varying LQR, replacing repeated differential equation solves with fast evaluations while preserving exponential stability under bounded errors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the Riccati solution operator, which maps a time-dependent system-parameter function to the corresponding time-dependent Riccati-matrix solution, admits a practical approximation by a matrix-valued DeepONet. Once trained, this network supplies approximate optimal feedbacks whose deviation from the true optimal feedback is controlled by the operator error; the paper proves that exponential stability of the closed-loop system is preserved as long as this error lies below an explicitly derived threshold, and it quantifies the resulting degradation in trajectory accuracy and cost.
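For orientation, the standard finite-horizon LQR setting behind this claim can be written out explicitly. The block below uses the textbook form of the problem; the horizon $T$, the terminal weight $P_T$, the operator symbol $\mathcal{G}$, and the tilde notation for the learned quantities are assumed here for illustration and need not match the paper's own notation.

```latex
\begin{aligned}
&\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t), \qquad x(0) = x_0,\\
&J(u) = x(T)^\top P_T\, x(T) + \int_0^T \big( x(t)^\top Q(t)\, x(t) + u(t)^\top R(t)\, u(t) \big)\,dt,\\
&-\dot{P}(t) = A(t)^\top P(t) + P(t) A(t) - P(t) B(t) R(t)^{-1} B(t)^\top P(t) + Q(t), \qquad P(T) = P_T,\\
&u^*(t) = -R(t)^{-1} B(t)^\top P(t)\, x(t),\\
&\mathcal{G} : (A, B, Q, R) \mapsto P, \qquad
 \tilde{u}(t) = -R(t)^{-1} B(t)^\top \tilde{P}(t)\, x(t), \quad \tilde{P} \approx \mathcal{G}(A, B, Q, R).
\end{aligned}
```

In this form the gap between the learned and optimal gains is $R^{-1} B^\top (\tilde{P} - P)$, which is why a bound on the operator error translates directly into a bound on the feedback error.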
What carries the argument
The Riccati solution operator that takes time-varying system matrices as input and returns the time-dependent positive-semidefinite Riccati trajectory, realized by a tailored DeepONet architecture together with a progressive learning strategy that scales with system dimension.
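The branch/trunk structure of a DeepONet with a matrix-valued output can be sketched in a few lines. This is a minimal, untrained toy written with the Python standard library only; the class name `MatrixDeepONet`, the layer sizes, and the choice to output the upper-triangular entries and mirror them are assumptions made for illustration, not the architecture described in the paper.

```python
import math
import random

def dense(weights, biases, x, activation=None):
    """Apply one fully connected layer: activation(W x + b)."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def init_layer(rng, n_out, n_in):
    """Random Gaussian weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class MatrixDeepONet:
    """Untrained toy DeepONet returning a symmetric n x n matrix (hypothetical layout)."""

    def __init__(self, n_sensors, n, latent=8, hidden=16, seed=0):
        rng = random.Random(seed)
        self.n, self.latent = n, latent
        self.n_out = n * (n + 1) // 2  # independent entries of a symmetric matrix
        self.branch1 = init_layer(rng, hidden, n_sensors)
        self.branch2 = init_layer(rng, self.n_out * latent, hidden)
        self.trunk1 = init_layer(rng, hidden, 1)
        self.trunk2 = init_layer(rng, latent, hidden)

    def __call__(self, sensor_values, t):
        # Branch net: encodes the input function from its values at the sensor times.
        b = dense(*self.branch2, dense(*self.branch1, sensor_values, math.tanh))
        # Trunk net: encodes the query time t.
        tr = dense(*self.trunk2, dense(*self.trunk1, [t], math.tanh), math.tanh)
        # Each output entry is the inner product of its own branch block with the trunk features.
        entries = [sum(b[k * self.latent + j] * tr[j] for j in range(self.latent))
                   for k in range(self.n_out)]
        # Mirror the upper-triangular entries so the output is symmetric by construction.
        P = [[0.0] * self.n for _ in range(self.n)]
        k = 0
        for i in range(self.n):
            for j in range(i, self.n):
                P[i][j] = P[j][i] = entries[k]
                k += 1
        return P

# Hypothetical input: one scalar parameter function sampled at 10 sensor times.
sensors = [0.5 + 0.5 * math.sin(5.0 * k / 9) for k in range(10)]
net = MatrixDeepONet(n_sensors=10, n=3)
P = net(sensors, t=2.5)
print(P)
```

Symmetry is enforced structurally here, but positive semidefiniteness is not; a real implementation would also stack samples of all system matrices into the branch input and would need training against reference Riccati trajectories.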
If this is right
- Operator approximation errors propagate in quantifiable ways to feedback performance, state-trajectory accuracy, and cost suboptimality.
- Exponential stability of the closed-loop system is retained whenever the learned operator is sufficiently accurate.
- The method yields substantial computational speedups over classical Riccati solvers while retaining high accuracy on both time-invariant and time-varying instances.
- The same learned operator generalizes across a wide range of system dimensions and parameter configurations after a single training phase.
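The first two consequences can be explored numerically on a scalar toy problem. The sketch below solves the scalar differential Riccati equation with a classical Runge-Kutta scheme, then evaluates the closed-loop cost when the feedback uses a perturbed Riccati trajectory `P + eps`. The reduction to one dimension, the function names, the parameter values, and the constant perturbation are all assumptions for illustration; nothing here reproduces the paper's experiments or bounds.

```python
import math

def solve_riccati(a, b, q, r, p_T, T, n):
    """RK4, backward in time, for -P' = 2 a(t) P - (b^2 / r) P^2 + q with P(T) = p_T.
    Returns P on the uniform grid t_k = k T / n."""
    h = T / n
    rhs = lambda t, P: 2.0 * a(t) * P - (b * b / r) * P * P + q  # equals -dP/dt
    P = [0.0] * (n + 1)
    P[n] = p_T
    for k in range(n, 0, -1):
        t = k * h
        k1 = rhs(t, P[k])
        k2 = rhs(t - h / 2, P[k] + h / 2 * k1)
        k3 = rhs(t - h / 2, P[k] + h / 2 * k2)
        k4 = rhs(t - h, P[k] + h * k3)
        P[k - 1] = P[k] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return P

def closed_loop_cost(a, b, q, r, p_T, T, P_fb, x0):
    """Heun's method for x' = (a - b^2 P_fb / r) x, with trapezoidal cost accumulation."""
    n = len(P_fb) - 1
    h = T / n
    x, J = x0, 0.0
    for k in range(n):
        g0, g1 = b * P_fb[k] / r, b * P_fb[k + 1] / r  # feedback gains, u = -g x
        f0 = (a(k * h) - b * g0) * x
        x_pred = x + h * f0
        f1 = (a((k + 1) * h) - b * g1) * x_pred
        x_new = x + h / 2 * (f0 + f1)
        run0 = (q + r * g0 * g0) * x * x
        run1 = (q + r * g1 * g1) * x_new * x_new
        J += h / 2 * (run0 + run1)
        x = x_new
    return J + p_T * x * x

a = lambda t: 0.5 + 0.5 * math.sin(t)  # hypothetical time-varying drift (open-loop unstable)
b, q, r, p_T, T, n, x0 = 1.0, 1.0, 1.0, 0.0, 5.0, 2000, 1.0

P = solve_riccati(a, b, q, r, p_T, T, n)
J_opt = closed_loop_cost(a, b, q, r, p_T, T, P, x0)
print(f"P(0) x0^2 = {P[0] * x0 ** 2:.5f}, simulated optimal cost = {J_opt:.5f}")

excess = {}
for eps in (0.05, 0.1, 0.2):
    J_eps = closed_loop_cost(a, b, q, r, p_T, T, [p + eps for p in P], x0)
    excess[eps] = J_eps - J_opt
    print(f"eps = {eps:.2f}: excess cost = {excess[eps]:.6f}")
```

In this toy setup the simulated optimal cost should agree with `P(0) * x0**2` up to discretization error, and the excess cost should grow as `eps` grows, which is the qualitative behavior the suboptimality bounds are meant to quantify.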
Where Pith is reading between the lines
- If the same operator-learning pattern succeeds for the differential Riccati equation of LQR, analogous surrogates could accelerate the solution of other parametric differential equations that appear in control design.
- The separation between offline training and online evaluation opens the possibility of updating the operator incrementally when new parameter regimes are observed during operation.
Load-bearing premise
The training distribution of time-dependent system parameters must be representative of all future instances the operator will encounter, so that the approximation error remains small enough for the stability and performance bounds to hold.
What would settle it
An explicit system-parameter trajectory drawn from the same class as the training data for which the learned feedback produces a closed-loop state that grows exponentially, even though the operator approximation error lies strictly below the threshold stated in the stability theorem, would falsify the preservation claim.
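To see why a threshold of this kind is plausible, and what a counterexample would have to violate, it helps to look at the scalar case, which can be integrated in closed form. The derivation below is an illustrative sketch and not the paper's theorem: the constants $M$ and $\omega$, the error $e = \tilde{P} - P$, and the simplification that $b$ and $r$ are constant are all assumptions made here.

```latex
\begin{aligned}
&\dot{x} = \Big(a(t) - \tfrac{b^2}{r} P(t)\Big) x,
 \qquad
 \exp\!\Big(\int_s^t \big(a(\tau) - \tfrac{b^2}{r} P(\tau)\big)\,d\tau\Big) \le M e^{-\omega (t-s)},
 \quad 0 \le s \le t \le T,\\
&\dot{\tilde{x}} = \Big(a(t) - \tfrac{b^2}{r} \tilde{P}(t)\Big)\tilde{x},
 \qquad \tilde{P} = P + e, \qquad \tilde{x}(0) = x_0,\\
&|\tilde{x}(t)|
 = |x_0| \exp\!\Big(\int_0^t \big(a(\tau) - \tfrac{b^2}{r} P(\tau)\big)\,d\tau\Big)
   \exp\!\Big(-\tfrac{b^2}{r}\int_0^t e(\tau)\,d\tau\Big)
 \le M\, e^{-\left(\omega - \frac{b^2}{r}\|e\|_{\infty}\right) t}\,|x_0|,\\
&\text{so exponential decay survives whenever } \|e\|_{\infty} < \frac{\omega\, r}{b^2}.
\end{aligned}
```

Note that the bound is driven by the sup-norm of the error over the horizon, so a falsifying example would need growth to occur while this uniform error stays below the stated threshold.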
Original abstract
We propose a computational framework for replacing the repeated numerical solution of differential Riccati equations in finite-horizon Linear Quadratic Regulator (LQR) problems by a learned operator surrogate. Instead of solving a nonlinear matrix-valued differential equation for each new system instance, we construct offline an approximation of the associated solution operator mapping time-dependent system parameters to the Riccati trajectory. The resulting model enables fast online evaluation of approximate optimal feedbacks across a wide class of systems, thereby shifting the computational burden from repeated numerical integration to a one-time learning stage. From a theoretical perspective, we establish control-theoretic guarantees for this operator-based approximation. In particular, we derive bounds quantifying how operator approximation errors propagate to feedback performance, trajectory accuracy, and cost suboptimality, and we prove that exponential stability of the closed-loop system is preserved under sufficiently accurate operator approximation. These results provide a framework to assess the reliability of data-driven approximations in optimal control. On the computational side, we design tailored DeepONet architectures for matrix-valued, time-dependent problems and introduce a progressive learning strategy to address scalability with respect to the system dimension. Numerical experiments on both time-invariant and time-varying LQR problems demonstrate that the proposed approach achieves high accuracy and strong generalization across a wide range of system configurations, while delivering substantial computational speedups compared to classical solvers. The method offers an effective and scalable alternative for parametric and real-time optimal control applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a DeepONet framework to learn the solution operator mapping time-dependent system parameters to the Riccati trajectory for finite-horizon time-varying LQR, replacing repeated numerical integration with fast online evaluation. It derives bounds on how operator approximation errors propagate to feedback performance, state trajectories, and cost suboptimality, and proves that exponential stability of the closed-loop system is preserved provided the approximation error is sufficiently small. Tailored matrix-valued DeepONet architectures and a progressive learning strategy are introduced for scalability, with numerical experiments on time-invariant and time-varying LQR instances demonstrating accuracy, generalization, and computational speedups.
Significance. If the error-propagation bounds and stability-preservation result hold under the stated assumptions, the work supplies a concrete bridge between operator learning and control-theoretic guarantees for parametric optimal control. The explicit quantification of suboptimality and the stability theorem (applicable whenever the operator error is small enough in the required norm) would be a useful addition to the literature on data-driven surrogates for Riccati equations.
major comments (3)
- [§3.2, Theorem 3.1] Stability preservation: the proof requires the operator approximation error to be small in a topology (sup-norm or weighted L^∞ over the time interval) that controls the time-varying closed-loop state-transition matrix and the uniform exponential decay rate. The manuscript does not show that the L²-type training loss minimized by DeepONet, even with the progressive strategy, guarantees this uniform bound for every admissible parameter trajectory outside the training distribution.
- [§3.1] Error-propagation bounds: the derivation of the feedback-performance and cost-suboptimality bounds assumes that the learned operator stays inside a ball whose radius is determined by the stability margin of the true Riccati trajectory. No quantitative link is provided between the empirical training loss and the radius of this ball, leaving open whether the numerical examples actually satisfy the hypothesis of the theorem.
- [§4] Progressive learning strategy: while the architecture is described, there is no analysis showing that the progressive training procedure preserves the approximation properties needed for the stability theorem when the system dimension increases or when the parameter functions exhibit rapid temporal variations not seen in the training set.
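The gap between an L²-type loss and a uniform bound, raised in the first major comment, is easy to exhibit numerically. The sketch below builds a hypothetical error profile (a narrow bump of height 1, chosen purely for illustration) and compares its L² norm with its sup-norm on a uniform grid; the helper names are assumptions, and the profile is not taken from the paper.

```python
import math

def l2_norm(values, h):
    """Trapezoidal approximation of the L2 norm on a uniform grid with spacing h."""
    sq = [v * v for v in values]
    return math.sqrt(h * (sum(sq) - 0.5 * (sq[0] + sq[-1])))

def sup_norm(values):
    """Largest absolute value on the grid (a lower bound on the true sup-norm)."""
    return max(abs(v) for v in values)

T, n = 1.0, 10_000
h = T / n
grid = [k * h for k in range(n + 1)]

# Hypothetical error profile: Gaussian bump of height 1 and width 1e-3 centered at t = 0.5.
width = 1e-3
error = [math.exp(-((t - 0.5) / width) ** 2) for t in grid]

l2_err = l2_norm(error, h)
sup_err = sup_norm(error)
print(f"L2 error : {l2_err:.4f}")
print(f"sup error: {sup_err:.4f}")
```

On this grid the L² error comes out near 0.035 while the sup-norm error is essentially 1, so a small training loss alone does not certify the uniform smallness that a stability argument of this type needs.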
minor comments (3)
- [§2] Notation for the time-dependent system matrices (A(t), B(t), Q(t), R(t)) is introduced without an explicit statement of the function space (e.g., C^0 or L^∞) to which they belong; this should be stated once at the beginning of §2.
- [Figures 3-4] The color scales and axis labels for the Riccati-matrix trajectories are difficult to read at the printed size; larger fonts or separate panels for each matrix entry would improve clarity.
- [§5] The abstract claims “strong generalization across a wide range of system configurations,” yet the test trajectories in §5 appear to be drawn from the same distribution family as the training set; a clearer statement of the out-of-distribution test protocol would be helpful.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight important connections between the learning procedure and the control-theoretic guarantees, which we address point by point below.
- Referee: [§3.2, Theorem 3.1] Stability preservation: the proof requires the operator approximation error to be small in a topology (sup-norm or weighted L^∞ over the time interval) that controls the time-varying closed-loop state-transition matrix and the uniform exponential decay rate. The manuscript does not show that the L²-type training loss minimized by DeepONet, even with the progressive strategy, guarantees this uniform bound for every admissible parameter trajectory outside the training distribution.
  Authors: We agree that Theorem 3.1 requires the approximation error to be small in the sup-norm (or weighted L^∞) to control the state-transition matrix and preserve the exponential decay rate. The L² training loss does not automatically guarantee this uniform bound for all admissible trajectories, including those outside the training set. In the revision we will add a remark in §3.2 clarifying the distinction between the training norm and the norm required by the theorem, together with additional numerical checks of the sup-norm error on out-of-distribution samples. A rigorous guarantee that the learned operator satisfies the sup-norm condition for arbitrary parameter trajectories lies beyond the present scope.
  Revision: partial
- Referee: [§3.1] Error-propagation bounds: the derivation of the feedback-performance and cost-suboptimality bounds assumes that the learned operator stays inside a ball whose radius is determined by the stability margin of the true Riccati trajectory. No quantitative link is provided between the empirical training loss and the radius of this ball, leaving open whether the numerical examples actually satisfy the hypothesis of the theorem.
  Authors: The error-propagation bounds in §3.1 are indeed conditional on the operator error lying inside a ball whose radius depends on the stability margin. No a priori quantitative map from the empirical L² loss to this radius is given. In the revised manuscript we will insert a paragraph in §3.1 describing how the hypothesis can be verified post-training by evaluating the actual operator error norm on held-out test trajectories and comparing it with the stability-derived radius. The numerical examples already satisfy the condition (observed errors are well below the margin), and we will make this verification explicit.
  Revision: yes
- Referee: [§4] Progressive learning strategy: while the architecture is described, there is no analysis showing that the progressive training procedure preserves the approximation properties needed for the stability theorem when the system dimension increases or when the parameter functions exhibit rapid temporal variations not seen in the training set.
  Authors: The progressive learning strategy is presented primarily for computational scalability. We do not provide a theoretical analysis proving that it preserves the approximation quality required by the stability theorem for higher dimensions or for parameter functions with rapid temporal variations absent from the training set. In the revision we will expand §4 with a limitations paragraph acknowledging this point and will add further numerical experiments using higher-dimensional systems and parameter trajectories with faster dynamics. A complete theoretical treatment of the progressive strategy’s effect on the required approximation properties is left for future work.
  Revision: partial
Points the rebuttal leaves open:
- A rigorous proof that the L² training loss (even with progressive training) guarantees the sup-norm bound needed for stability preservation for every admissible parameter trajectory outside the training distribution.
- An a priori quantitative relationship between the empirical training loss and the ball radius appearing in the error-propagation theorems without post-training verification.
- A theoretical analysis establishing that the progressive learning strategy preserves the approximation properties required by the stability theorem under increasing system dimension or unseen rapid temporal variations.
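The post-training verification that the authors promise for §3.1 could look roughly like the following. This is a sketch under stated assumptions: trajectories are scalar and stored as plain lists on a shared time grid, the function names are hypothetical, the numbers are made up for illustration, and `radius` stands in for the stability-derived radius, which the paper would have to supply.

```python
def sup_error(exact, predicted):
    """Largest absolute gap between two scalar trajectories sampled on a shared grid."""
    return max(abs(p - q) for p, q in zip(exact, predicted))

def check_hypothesis(exact_set, predicted_set, radius):
    """Return (worst_error, worst_index, ok); ok means every held-out error is below radius."""
    errors = [sup_error(e, p) for e, p in zip(exact_set, predicted_set)]
    worst_index = max(range(len(errors)), key=errors.__getitem__)
    worst_error = errors[worst_index]
    return worst_error, worst_index, worst_error < radius

# Made-up held-out data: two reference Riccati trajectories and surrogate predictions.
exact_set = [[1.0, 0.8, 0.5], [2.0, 1.5, 1.0]]
predicted_set = [[1.02, 0.79, 0.5], [1.95, 1.5, 1.03]]
radius = 0.1  # placeholder for the stability-derived radius

worst_error, worst_index, ok = check_hypothesis(exact_set, predicted_set, radius)
print(f"worst sup-norm error {worst_error:.3f} on sample {worst_index}; hypothesis holds: {ok}")
```

A check of this kind only certifies the sampled trajectories and the sampled time points. It does not address the open items listed above, which concern every admissible parameter trajectory; for matrix-valued trajectories the absolute value would be replaced by a matrix norm.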
Circularity Check
No significant circularity; theoretical bounds derived independently of learned operator
The paper separates the data-driven DeepONet approximation of the Riccati operator from the subsequent control-theoretic analysis. Error propagation bounds and the exponential stability preservation result are obtained via standard Lyapunov-type estimates on the time-varying closed-loop system, using the exact Riccati trajectory as reference; these steps do not reduce to fitted parameters, self-citations, or ansatzes imported from prior author work. The training distribution assumption is an explicit hypothesis required for the bounds to apply, but it is not smuggled into the derivation itself. No load-bearing step equates a prediction to its own input by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the differential Riccati equation admits a unique positive-semidefinite solution for every admissible time-varying system in the considered class.